
PyTrial: Machine Learning Software and Benchmark for Clinical Trial Applications

Authors: Zifeng Wang, Brandon Theodorou, Tianfan Fu, Cao Xiao, Jimeng Sun

Abstract: Clinical trials are conducted to test the effectiveness and safety of potential drugs in humans for regulatory approval. Machine learning (ML) has recently emerged as a new tool to assist in clinical trials. Despite this progress, there have been few efforts to document and benchmark ML4Trial algorithms available to the ML research community. Additionally, access to clinical trial-related datasets is limited, and there is a lack of well-defined clinical tasks to facilitate the development of new algorithms. To fill this gap, we have developed PyTrial, which provides benchmarks and open-source implementations of a series of ML algorithms for clinical trial design and operations. In this paper, we thoroughly investigate 34 ML algorithms for clinical trials across 6 different tasks, including patient outcome prediction, trial site selection, trial outcome prediction, patient-trial matching, trial similarity search, and synthetic data generation. We have also collected and prepared 23 ML-ready datasets as well as their working examples in Jupyter Notebooks for quick implementation and testing. PyTrial defines each task through a simple four-step process: data loading, model specification, model training, and model evaluation, all achievable with just a few lines of code. Furthermore, our modular API architecture lets practitioners extend the framework with new algorithms and tasks with little effort. The code is available at https://github.com/RyanWangZf/PyTrial.

Submitted 5 October, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

arXiv:2306.01631 [pdf, other]

Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations

Authors: Pengcheng Jiang, Cao Xiao, Tianfan Fu, Jimeng Sun

Abstract: Molecule representation learning is crucial for various downstream applications, such as understanding and predicting molecular properties and side effects. In this paper, we propose a novel method called GODE, which takes into account the two-level structure of individual molecules. We recognize that each molecule has an intrinsic graph structure and is also a node in a larger molecule knowledge graph. GODE integrates graph representations of individual molecules with multidomain biochemical data from knowledge graphs. By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures. This fusion results in a more robust and informative representation, which enhances molecular property prediction by harnessing both chemical and biological information. When fine-tuned across 11 chemical property tasks, our model outperforms existing baselines, registering an average ROC-AUC uplift of 13.8% for classification tasks and an average RMSE/MAE enhancement of 35.1% for regression tasks. It also surpasses the current leading model in molecule property prediction with average advancements of 2.1% in classification and 6.4% in regression tasks.

Submitted 19 January, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

arXiv:2305.18090 [pdf, other]

ChatGPT-powered Conversational Drug Editing Using Retrieval and Domain Feedback

Authors: Shengchao Liu, Jiongxiao Wang, Yi** Yang, Chengpeng Wang, Ling Liu, Hongyu Guo, Chaowei Xiao

Abstract: Recent advancements in conversational large language models (LLMs), such as ChatGPT, have demonstrated remarkable promise in various domains, including drug discovery. However, existing works mainly investigate the capabilities of conversational LLMs on chemical reactions and retrosynthesis, while drug editing, a critical task in the drug discovery pipeline, remains largely unexplored. To bridge this gap, we propose ChatDrug, a framework to facilitate the systematic investigation of drug editing using LLMs. ChatDrug jointly leverages a prompt module, a retrieval and domain feedback (ReDF) module, and a conversation module to streamline effective drug editing. We empirically show that ChatDrug achieves the best performance on 33 out of 39 drug editing tasks, encompassing small molecules, peptides, and proteins. We further demonstrate, through 10 case studies, that ChatDrug can successfully identify the key substructures (e.g., the molecule functional groups, peptide motifs, and protein structures) for manipulation, generating diverse and valid suggestions for drug editing. Promisingly, we also show that ChatDrug can offer insightful explanations from a domain-specific perspective, enhancing interpretability and enabling informed decision-making. This research sheds light on the potential of ChatGPT and conversational LLMs for drug editing. It paves the way for a more efficient and collaborative drug discovery pipeline, contributing to the advancement of pharmaceutical research and development.

Submitted 29 May, 2023; originally announced May 2023.

arXiv:2302.04611 [pdf, other]

A Text-guided Protein Design Framework

Authors: Shengchao Liu, Yan**g Li, Zhuoxinran Li, Anthony Gitter, Yutao Zhu, Jiarui Lu, Zhao Xu, Weili Nie, Arvind Ramanathan, Chaowei Xiao, Jian Tang, Hongyu Guo, Anima Anandkumar

Abstract: Current AI-assisted protein design mainly utilizes protein sequential and structural information. Meanwhile, there exists tremendous knowledge curated by humans in text format describing proteins' high-level functionalities. Yet, whether the incorporation of such text data can help protein design tasks has not been explored. To bridge this gap, we propose ProteinDT, a multi-modal framework that leverages textual descriptions for protein design. ProteinDT consists of three consecutive steps: ProteinCLAP, which aligns the representations of the two modalities; a facilitator that generates the protein representation from the text modality; and a decoder that creates the protein sequences from the representation. To train ProteinDT, we construct a large dataset, SwissProtCLAP, with 441K text and protein pairs. We quantitatively verify the effectiveness of ProteinDT on three challenging tasks: (1) over 90% accuracy for text-guided protein generation; (2) best hit ratio on 10 zero-shot text-guided protein editing tasks; (3) superior performance on four out of six protein property prediction benchmarks.

Submitted 3 December, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

arXiv:2212.10789 [pdf, other]

Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing

Authors: Shengchao Liu, Weili Nie, Chengpeng Wang, Jiarui Lu, Zhuoran Qiao, Ling Liu, Jian Tang, Chaowei Xiao, Anima Anandkumar

Abstract: There is increasing adoption of artificial intelligence in drug discovery. However, existing machine learning studies mainly utilize the chemical structures of molecules and ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions and predict complex biological activities. Here we present a multi-modal molecule structure-text model, MoleculeSTM, which jointly learns molecules' chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct a large multi-modal dataset, namely, PubChemSTM, with over 280,000 chemical structure-text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions, namely structure-text retrieval and molecule editing. MoleculeSTM has two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM achieves state-of-the-art generalization to novel biochemical concepts across various benchmarks.
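To make the contrastive structure-text alignment concrete, here is a minimal sketch assuming a standard symmetric InfoNCE (CLIP-style) loss with cosine similarity and a hypothetical temperature of 0.1. The abstract does not state MoleculeSTM's exact loss, encoders, or hyperparameters, so this illustrates the general technique only; row i of each batch is assumed to be a matching pair, and embeddings are assumed non-zero.

```python
import math

def info_nce(struct_emb, text_emb, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired embeddings (assumed CLIP-style).

    Row i of struct_emb and row i of text_emb form a positive pair; every other
    row in the batch acts as a negative. Embeddings must be non-zero vectors.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    s = [normalize(v) for v in struct_emb]
    t = [normalize(v) for v in text_emb]
    n = len(s)
    # Cosine-similarity logits divided by the temperature.
    logits = [[sum(a * b for a, b in zip(s[i], t[j])) / temperature for j in range(n)]
              for i in range(n)]

    def cross_entropy(rows):
        # Mean negative log-softmax of the diagonal (matching-pair) entries.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    transposed = [list(col) for col in zip(*logits)]
    # Average the structure-to-text and text-to-structure directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(transposed))
```

Correctly paired batches yield a loss near zero, while shuffling the text side so that pairs no longer match drives the loss up, which is the signal that pulls matching structures and descriptions together in the shared embedding space.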

Submitted 29 January, 2024; v1 submitted 21 December, 2022; originally announced December 2022.

arXiv:2208.11126 [pdf, other]

Retrieval-based Controllable Molecule Generation

Authors: Zichao Wang, Weili Nie, Zhuoran Qiao, Chaowei Xiao, Richard Baraniuk, Anima Anandkumar

Abstract: Generating new molecules with specified chemical and biological properties via generative models has emerged as a promising direction for drug discovery. However, existing methods require extensive training/fine-tuning with a large dataset, often unavailable in real-world generation tasks. In this work, we propose a new retrieval-based framework for controllable molecule generation. We use a small set of exemplar molecules, i.e., those that (partially) satisfy the design criteria, to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria. We design a retrieval mechanism that retrieves and fuses the exemplar molecules with the input molecule, which is trained by a new self-supervised objective that predicts the nearest neighbor of the input molecule. We also propose an iterative refinement process to dynamically update the generated molecules and retrieval database for better generalization. Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning. On various tasks ranging from simple design criteria to a challenging real-world scenario for designing lead compounds that bind to the SARS-CoV-2 main protease, we demonstrate our approach extrapolates well beyond the retrieval database, and achieves better performance and wider applicability than previous methods. Code is available at https://github.com/NVlabs/RetMol.
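The exemplar-retrieval idea can be illustrated with a simple stand-in: rank database molecules first by how many design criteria they satisfy (allowing partial satisfaction) and then by Tanimoto similarity to the input molecule. This is an assumption-laden sketch, not RetMol's learned retrieval-and-fusion module; fingerprints are modeled as plain sets of on-bits, and the property names (`qed`, `logp`) and predicates are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def retrieve_exemplars(query_fp, database, criteria, k=2):
    """Return the names of the top-k exemplar molecules for an input molecule.

    database: list of (name, fingerprint_set, properties_dict) tuples (hypothetical format)
    criteria: dict mapping a property name to a predicate on that property's value
    Molecules are ranked by the number of criteria satisfied, then by similarity.
    """
    def score(entry):
        _, fp, props = entry
        satisfied = sum(1 for key, pred in criteria.items() if pred(props[key]))
        return (satisfied, tanimoto(query_fp, fp))

    ranked = sorted(database, key=score, reverse=True)
    return [name for name, _, _ in ranked[:k]]
```

Ranking by criteria first means a molecule that is structurally close to the input but violates a design criterion is retrieved only after molecules that satisfy more of the criteria.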

Submitted 24 April, 2023; v1 submitted 23 August, 2022; originally announced August 2022.

Comments: ICLR 2023

arXiv:2105.01171 [pdf, other]

Machine Learning Applications for Therapeutic Tasks with Genomics Data

Authors: Kexin Huang, Cao Xiao, Lucas M. Glass, Cathy W. Critchlow, Greg Gibson, Jimeng Sun

Abstract: Thanks to the increasing availability of genomics and other biomedical data, many machine learning approaches have been proposed for a wide range of therapeutic discovery and development tasks. In this survey, we review the literature on machine learning applications for genomics through the lens of therapeutic development. We investigate the interplay among genomics, compounds, proteins, electronic health records (EHR), cellular images, and clinical texts. We identify twenty-two applications of machine learning in genomics across the entire therapeutics pipeline, from discovering novel targets, personalized medicine, and developing gene-editing tools all the way to clinical trials and post-market studies. We also pinpoint seven important challenges in this field with opportunities for expansion and impact. This survey overviews recent research at the intersection of machine learning, genomics, and therapeutic development.

Submitted 3 May, 2021; originally announced May 2021.

arXiv:2102.09548 [pdf, other]

Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development

Authors: Kexin Huang, Tianfan Fu, Wenhao Gao, Yue Zhao, Yusuf Roohani, Jure Leskovec, Connor W. Coley, Cao Xiao, Jimeng Sun, Marinka Zitnik

Abstract: Therapeutics machine learning is an emerging field with incredible opportunities for innovation and impact. However, advancement in this field requires formulation of meaningful learning tasks and careful curation of datasets. Here, we introduce Therapeutics Data Commons (TDC), the first unifying platform to systematically access and evaluate machine learning across the entire range of therapeutics. To date, TDC includes 66 AI-ready datasets spread across 22 learning tasks and spanning the discovery and development of safe and effective medicines. TDC also provides an ecosystem of tools and community resources, including 33 data functions and types of meaningful data splits, 23 strategies for systematic model evaluation, 17 molecule generation oracles, and 29 public leaderboards. All resources are integrated and accessible via an open Python library. We carry out extensive experiments on selected datasets, demonstrating that even the strongest algorithms fall short of solving key therapeutics challenges, including real dataset distributional shifts, multi-scale modeling of heterogeneous data, and robust generalization to novel data points. We envision that TDC can facilitate algorithmic and scientific advances and considerably accelerate machine-learning model development, validation and transition into biomedical and clinical implementation. TDC is an open-science initiative available at https://tdcommons.ai.
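One example of a "meaningful data split" is a cold split, where test records involve entities never seen during training, which probes generalization to novel data points. The sketch below is a hypothetical, self-contained illustration for drug-pair records; it is not TDC's own split function or API, and the record format `(drug_1, drug_2, label)` and the function name are assumptions.

```python
import random

def cold_drug_split(records, test_fraction=0.2, seed=0):
    """Hypothetical cold-drug split for (drug_1, drug_2, label) records.

    A random subset of drugs is held out. Any record that involves a held-out
    drug goes to the test set, so held-out drugs never appear in training.
    """
    drugs = sorted({d for a, b, _ in records for d in (a, b)})
    rng = random.Random(seed)  # seeded for a reproducible split
    rng.shuffle(drugs)
    n_test = max(1, int(len(drugs) * test_fraction))
    held_out = set(drugs[:n_test])
    train, test = [], []
    for record in records:
        a, b, _ = record
        if a in held_out or b in held_out:
            test.append(record)
        else:
            train.append(record)
    return train, test
```

Compared with a random split over records, this guarantees that every test record contains at least one drug the model has never seen, which typically makes the benchmark harder and more realistic.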

Submitted 28 August, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

Comments: Published at NeurIPS 2021 Datasets and Benchmarks

arXiv:2012.04747 [pdf, other]

STELAR: Spatio-temporal Tensor Factorization with Latent Epidemiological Regularization

Authors: Nikos Kargas, Cheng Qian, Nicholas D. Sidiropoulos, Cao Xiao, Lucas M. Glass, Jimeng Sun

Abstract: Accurate prediction of the transmission of epidemic diseases such as COVID-19 is crucial for implementing effective mitigation measures. In this work, we develop a tensor method to predict the evolution of epidemic trends for many regions simultaneously. We construct a 3-way spatio-temporal tensor (location, attribute, time) of case counts and propose a nonnegative tensor factorization with latent epidemiological model regularization named STELAR. Unlike standard tensor factorization methods, which cannot forecast future time slabs, STELAR enables long-term prediction by incorporating latent temporal regularization through a system of discrete-time difference equations of a widely adopted epidemiological model. We use latent instead of location/attribute-level epidemiological dynamics to capture common epidemic profile sub-types and improve collaborative learning and prediction. We conduct experiments using both county- and state-level COVID-19 data and show that our model can identify interesting latent patterns of the epidemic. Finally, we evaluate the predictive ability of our method and show superior performance compared to the baselines, achieving up to 21% lower root mean square error and 25% lower mean absolute error for county-level prediction.
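The "discrete-time difference equations of a widely adopted epidemiological model" can be illustrated with the classic SIR model, which we assume here as the example. In STELAR these dynamics regularize latent temporal factors rather than raw case counts; the sketch below only shows the recurrence itself, with hypothetical parameter values in the usage note.

```python
def simulate_sir(s0, i0, r0, beta, gamma, steps):
    """Roll a discrete-time SIR model forward (assumed example model).

    Difference equations, with N = S + I + R held constant:
        S[t+1] = S[t] - beta * S[t] * I[t] / N
        I[t+1] = I[t] + beta * S[t] * I[t] / N - gamma * I[t]
        R[t+1] = R[t] + gamma * I[t]
    Returns a list of (S, I, R) tuples of length steps + 1.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    trajectory = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / n   # transmission term
        new_recoveries = gamma * i          # recovery term
        s, i, r = s - new_infections, i + new_infections - new_recoveries, r + new_recoveries
        trajectory.append((s, i, r))
    return trajectory
```

For example, `simulate_sir(990, 10, 0, beta=0.5, gamma=0.1, steps=5)` moves 4.95 people from S to I and 1.0 from I to R in the first step. Because each step depends only on the previous state, such a recurrence can be rolled forward past the observed data, which is what allows prediction beyond the last observed time slab.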

Submitted 17 March, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

Comments: AAAI 2021

arXiv:2010.03951 [pdf, other]

MolDesigner: Interactive Design of Efficacious Drugs with Deep Learning

Authors: Kexin Huang, Tianfan Fu, Dawood Khan, Ali Abid, Ali Abdalla, Abubakar Abid, Lucas M. Glass, Marinka Zitnik, Cao Xiao, Jimeng Sun

Abstract: The efficacy of a drug depends on its binding affinity to the therapeutic target and on its pharmacokinetics. Deep learning (DL) has demonstrated remarkable progress in predicting drug efficacy. We develop MolDesigner, a human-in-the-loop web user interface (UI), to assist drug developers in leveraging DL predictions to design more effective drugs. A developer can draw a drug molecule in the interface. In the backend, more than 17 state-of-the-art DL models generate predictions on important indices that are crucial for a drug's efficacy. Based on these predictions, drug developers can edit the drug molecule and iterate until satisfied. MolDesigner can make predictions in real time with a latency of less than a second.

Submitted 5 October, 2020; originally announced October 2020.

Comments: NeurIPS 2020 Demonstration Track

arXiv:2010.01450 [pdf, other]

doi 10.1093/bioinformatics/btab207

SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge Graph Summarization

Authors: Yue Yu, Kexin Huang, Chao Zhang, Lucas M. Glass, Jimeng Sun, Cao Xiao

Abstract: Thanks to the increasing availability of drug-drug interaction (DDI) datasets and large biomedical knowledge graphs (KGs), accurate detection of adverse DDI using machine learning models has become possible. However, how to effectively utilize large and noisy biomedical KGs for DDI detection remains largely an open problem. Due to the sheer size and amount of noise in KGs, it is often less beneficial to directly integrate KGs with other smaller but higher-quality data (e.g., experimental data). Most of the existing approaches ignore KGs altogether. Some try to directly integrate KGs with other data via graph neural networks with limited success. Furthermore, most previous works focus on binary DDI prediction, whereas multi-typed DDI pharmacological effect prediction is a more meaningful but harder task. To fill the gaps, we propose a new method SumGNN: knowledge summarization graph neural network, which is enabled by a subgraph extraction module that can efficiently anchor on relevant subgraphs from a KG, a self-attention based subgraph summarization scheme to generate a reasoning path within the subgraph, and a multi-channel knowledge and data integration module that utilizes massive external biomedical knowledge for significantly improved multi-typed DDI predictions. SumGNN outperforms the best baseline by up to 5.54%, and the performance gain is particularly significant in low data relation types. In addition, SumGNN provides interpretable prediction via the generated reasoning paths for each prediction.
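The subgraph-extraction step can be sketched as anchoring on the two query drugs and keeping only the part of the KG near both. This sketch assumes a GraIL-style enclosing subgraph (nodes within k hops of both drugs), treats the KG as an undirected adjacency dict with sortable node names, and ignores relation types; the abstract does not specify SumGNN's exact procedure, and the node names in the usage note are hypothetical.

```python
from collections import deque

def k_hop_nodes(graph, source, k):
    """Nodes reachable from source within k hops, via breadth-first search."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if seen[node] == k:
            continue  # do not expand beyond k hops
        for nbr in graph.get(node, ()):
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return set(seen)

def enclosing_subgraph(graph, drug_u, drug_v, k=2):
    """Undirected edges among nodes lying within k hops of BOTH target drugs."""
    nodes = k_hop_nodes(graph, drug_u, k) & k_hop_nodes(graph, drug_v, k)
    # a < b keeps each undirected edge once.
    return {(a, b) for a in nodes for b in graph.get(a, ()) if b in nodes and a < b}
```

On a toy KG where `drugA` and `drugB` share the neighbor `gene1`, the 2-hop enclosing subgraph keeps only the edges `drugA-gene1` and `drugB-gene1` and prunes genes or diseases attached to just one drug, which is the kind of noise reduction the abstract motivates.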

Submitted 6 May, 2021; v1 submitted 3 October, 2020; originally announced October 2020.

Comments: Published in Bioinformatics 2021

arXiv:2008.04215 [pdf]

STAN: Spatio-Temporal Attention Network for Pandemic Prediction Using Real World Evidence

Authors: Junyi Gao, Rakshith Sharma, Cheng Qian, Lucas M. Glass, Jeffrey Spaeder, Justin Romberg, Jimeng Sun, Cao Xiao

Abstract: Objective: The COVID-19 pandemic has created many challenges that need immediate attention. Various epidemiological and deep learning models have been developed to predict the COVID-19 outbreak, but all have limitations that affect the accuracy and robustness of the predictions. Our method aims to address these limitations and make earlier and more accurate pandemic outbreak predictions by (1) using patients' EHR data from different counties and states that encode local disease status and medical resource utilization; (2) considering demographic similarity and geographical proximity between locations; and (3) integrating pandemic transmission dynamics into deep learning models. Materials and Methods: We proposed a spatio-temporal attention network (STAN) for pandemic prediction. It uses an attention-based graph convolutional network to capture geographical and temporal trends and predict the number of cases for a fixed number of days into the future. We also designed a physical law-based loss term for enhancing long-term prediction. STAN was tested using both massive real-world patient data and open source COVID-19 statistics provided by Johns Hopkins University across all U.S. counties. Results: STAN outperforms epidemiological modeling methods such as SIR and SEIR and deep learning models on both long-term and short-term predictions, achieving up to 87% lower mean squared error compared to the best baseline prediction model.
Conclusions: By using information from real-world patient data and geographical data, STAN can better capture the disease status and medical resource utilization information and thus provides more accurate pandemic modeling. With pandemic transmission law based regularization, STAN also achieves good long-term prediction performance.

Submitted 7 December, 2020; v1 submitted 23 July, 2020; originally announced August 2020.

arXiv:2004.14949 [pdf, other]

SkipGNN: Predicting Molecular Interactions with Skip-Graph Networks

Authors: Kexin Huang, Cao Xiao, Lucas Glass, Marinka Zitnik, Jimeng Sun

Abstract: Molecular interaction networks are powerful resources for discovery. They are increasingly used with machine learning methods to predict biologically meaningful interactions. While deep learning on graphs has dramatically advanced predictive power, current graph neural network (GNN) methods are optimized for prediction on the basis of direct similarity between interacting nodes. In biological networks, however, similarity between nodes that do not directly interact has proved incredibly useful in the last decade across a variety of interaction networks. Here, we present SkipGNN, a graph neural network approach for the prediction of molecular interactions. SkipGNN predicts molecular interactions by aggregating information not only from direct interactions but also from second-order interactions, which we call skip similarity. In contrast to existing GNNs, SkipGNN receives neural messages from two-hop neighbors as well as immediate neighbors in the interaction network and non-linearly transforms the messages to obtain useful information for prediction. To inject skip similarity into a GNN, we construct a modified version of the original network, called the skip graph. We then develop an iterative fusion scheme that optimizes a GNN using both the skip graph and the original graph.
Experiments on four interaction networks, including drug-drug, drug-target, protein-protein, and gene-disease interactions, show that SkipGNN achieves superior and robust performance, outperforming existing methods by up to 28.8% in area under the precision-recall curve (PR-AUC). Furthermore, we show that unlike popular GNNs, SkipGNN learns biologically meaningful embeddings and performs especially well on noisy, incomplete interaction networks.
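The skip graph described above links nodes that are two hops apart in the original interaction network. A minimal sketch, assuming an undirected network given as an adjacency dict of sets: in this version a skip edge is added whenever two distinct nodes share at least one neighbor, even if they are also directly linked. SkipGNN's exact construction may differ in such details.

```python
def skip_graph(adjacency):
    """Build a skip graph: connect every pair of distinct nodes that share a neighbor.

    adjacency: dict mapping each node to the set of its direct neighbors (undirected).
    Returns a dict mapping each node to the set of its two-hop (skip) neighbors.
    """
    skip = {node: set() for node in adjacency}
    for node, neighbors in adjacency.items():
        for mid in neighbors:
            for two_hop in adjacency.get(mid, ()):
                if two_hop != node:  # a node is not its own skip neighbor
                    skip[node].add(two_hop)
    return skip
```

On the path `a - b - c`, the skip graph links `a` and `c` and leaves `b` isolated; in a star, all leaves become mutually connected through the hub. A GNN run on this graph therefore aggregates messages from nodes with similar interaction partners even when they never interact directly.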

Submitted 9 December, 2020; v1 submitted 30 April, 2020; originally announced April 2020.

Comments: Published in Nature Scientific Reports: https://www.nature.com/articles/s41598-020-77766-9

arXiv:2004.11424 [pdf, other]

doi 10.1093/bioinformatics/btaa880

MolTrans: Molecular Interaction Transformer for Drug Target Interaction Prediction

Authors: Kexin Huang, Cao Xiao, Lucas Glass, Jimeng Sun

Abstract: Drug target interaction (DTI) prediction is a foundational task for in silico drug discovery, which is costly and time-consuming due to the need for experimental search over a large drug compound space. Recent years have witnessed promising progress for deep learning in DTI prediction. However, the following challenges are still open: (1) purely data-driven molecular representation learning approaches ignore the sub-structural nature of DTI and thus produce results that are less accurate and difficult to explain; (2) existing methods focus on limited labeled data while ignoring the value of massive unlabeled molecular data. We propose a Molecular Interaction Transformer (MolTrans) to address these limitations via: (1) a knowledge-inspired sub-structural pattern mining algorithm and an interaction modeling module for more accurate and interpretable DTI prediction; (2) an augmented transformer encoder to better extract and capture the semantic relations among substructures extracted from massive unlabeled biomedical data. We evaluate MolTrans on real-world data and show that it improves DTI prediction performance compared to state-of-the-art baselines.

Submitted 23 April, 2020; originally announced April 2020.

Comments: Bioinformatics, 2020

arXiv:2004.08919 [pdf, other]

doi 10.1093/bioinformatics/btaa1005

DeepPurpose: a Deep Learning Library for Drug-Target Interaction Prediction

Authors: Kexin Huang, Tianfan Fu, Lucas Glass, Marinka Zitnik, Cao Xiao, Jimeng Sun

Abstract: Accurate prediction of drug-target interactions (DTI) is crucial for drug discovery. Recently, deep learning (DL) models have shown promising performance for DTI prediction. However, these models can be difficult to use for both computer scientists entering the biomedical field and bioinformaticians with limited DL experience. We present DeepPurpose, a comprehensive and easy-to-use deep learning library for DTI prediction. DeepPurpose supports training of customized DTI prediction models by implementing 15 compound and protein encoders and over 50 neural architectures, along with providing many other useful features. We demonstrate state-of-the-art performance of DeepPurpose on several benchmark datasets.

Submitted 9 December, 2020; v1 submitted 19 April, 2020; originally announced April 2020.

Comments: Published in Bioinformatics (2020)

arXiv:1911.06446 [pdf, other]

CASTER: Predicting Drug Interactions with Chemical Substructure Representation

Authors: Kexin Huang, Cao Xiao, Trong Nghia Hoang, Lucas M. Glass, Jimeng Sun

Abstract: Adverse drug-drug interactions (DDIs) remain a leading cause of morbidity and mortality. Identifying potential DDIs during the drug design process is critical for patients and society. Although several computational models have been proposed for DDI prediction, there are still limitations: (1) specialized design of drug representation for DDI predictions is lacking; (2) predictions are based on limited labelled data and do not generalize well to unseen drugs or DDIs; and (3) models are characterized by a large number of parameters and are thus hard to interpret. In this work, we develop a ChemicAl SubstrucTurE Representation (CASTER) framework that predicts DDIs given chemical structures of drugs. CASTER aims to mitigate these limitations via (1) a sequential pattern mining module rooted in the DDI mechanism to efficiently characterize functional sub-structures of drugs; (2) an auto-encoding module that leverages both labelled and unlabelled chemical structure data to improve predictive accuracy and generalizability; and (3) a dictionary learning module that explains the prediction via a small set of coefficients which measure the relevance of each input sub-structure to the DDI outcome. We evaluated CASTER on two real-world DDI datasets and showed that it performed better than state-of-the-art baselines and provided interpretable predictions.
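Sequential pattern mining over SMILES strings can be illustrated with a byte-pair-encoding-style procedure that repeatedly merges the most frequent adjacent token pair into a new substructure token. We assume this greedy frequent-pair merge as a stand-in; CASTER's actual mining algorithm, frequency thresholds, and tokenization may differ, and character-level tokenization here is a simplification (for example, it splits two-letter atoms such as `Cl`).

```python
from collections import Counter

def mine_substructures(sequences, num_merges):
    """Greedy frequent-pair merging over SMILES strings (BPE-style sketch).

    Each sequence starts as a list of single characters. At each step the most
    frequent adjacent token pair across the corpus is merged into one token.
    Mining stops early once no pair occurs at least twice.
    Returns (list of mined substructures, re-tokenized corpus).
    """
    corpus = [list(seq) for seq in sequences]
    vocabulary = []
    for _ in range(num_merges):
        pairs = Counter()
        for tokens in corpus:
            pairs.update(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < 2:
            break  # no repeated pattern left to mine
        merged = left + right
        vocabulary.append(merged)
        for idx, tokens in enumerate(corpus):
            out, i = [], 0
            while i < len(tokens):
                if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                    out.append(merged)
                    i += 2
                else:
                    out.append(tokens[i])
                    i += 1
            corpus[idx] = out
    return vocabulary, corpus
```

For example, on `["CCO", "CCN", "CCC"]` the pair `C, C` is the most frequent, so `CC` becomes a substructure token and the corpus is re-tokenized as `[["CC", "O"], ["CC", "N"], ["CC", "C"]]`. Each drug can then be represented as a bag of mined substructures, the kind of interpretable unit to which CASTER's dictionary coefficients assign relevance.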

Submitted 19 November, 2019; v1 submitted 14 November, 2019; originally announced November 2019.

Comments: Accepted by AAAI 2020

arXiv:1910.02107 [pdf, other]

GENN: Predicting Correlated Drug-drug Interactions with Graph Energy Neural Networks

Authors: Tengfei Ma, Junyuan Shang, Cao Xiao, Jimeng Sun

Abstract: Gaining more comprehensive knowledge about drug-drug interactions (DDIs) is one of the most important tasks in drug development and medical practice. Recently, graph neural networks have achieved great success in this task by modeling drugs as nodes and drug-drug interactions as links, casting DDI prediction as a link prediction problem. However, correlations between link labels (e.g., DDI types) were rarely considered in existing works. We propose the graph energy neural network (GENN) to explicitly model link type correlations. We formulate the DDI prediction task as a structure prediction problem and introduce a new energy-based model where the energy function is defined by graph neural networks. Experiments on two real-world DDI datasets demonstrated that GENN is superior to many baselines that do not consider link type correlations, achieving 13.77% and 5.01% PR-AUC improvements on the two datasets, respectively. We also present a case study in which GENN captures meaningful DDI correlations better than baseline models.

Submitted 7 October, 2019; v1 submitted 4 October, 2019; originally announced October 2019.

arXiv:1904.00232 [pdf]

Unifying Modular and Core-Periphery Structure in Functional Brain Networks over Development

Authors: Shi Gu, Cedric Huchuan Xia, Rastko Ciric, Tyler M. Moore, Ruben C. Gur, Raquel E. Gur, Theodore D. Satterthwaite, Danielle S. Bassett

Abstract: At rest, human brain functional networks display striking modular architecture in which coherent clusters of brain regions are activated. The modular account of brain function is pervasive, reliable, and reproducible. Yet, a complementary perspective posits a core-periphery or rich-club account of brain function, where hubs are densely interconnected with one another, allowing for integrative processing. Unifying these two perspectives has remained difficult because the methodological tools used to identify modules are entirely distinct from those used to identify core-periphery structure. Here we leverage a recently developed model-based approach, the weighted stochastic block model, that simultaneously uncovers modular and core-periphery structure, and we apply it to fMRI data acquired at rest in 872 youth of the Philadelphia Neurodevelopmental Cohort. We demonstrate that functional brain networks display rich meso-scale organization beyond that sought by modularity maximization techniques. Moreover, we show that this meso-scale organization changes appreciably over the course of neurodevelopment, and that individual differences in this organization predict individual differences in cognition more accurately than module organization alone. Broadly, our study provides a unified assessment of modular and core-periphery structure in functional brain networks, providing novel insights into their development and implications for behavior.

Submitted 4 April, 2019; v1 submitted 30 March, 2019; originally announced April 2019.

arXiv:1703.10451 [pdf]

Walking behavior in a circular arena modified by pulsed light stimulation in Drosophila melanogaster w1118 line

Authors: Shuang Qiu, Chengfeng Xiao

Abstract: The Drosophila melanogaster white-eyed w1118 line serves as a blank control, allowing genetic recombination of any gene of interest along with a readily recognizable marker. w1118 flies display behavioral susceptibility to environmental stimulation such as light. Characterizing the behavioral performance of w1118 flies is important because it provides a baseline from which the effect of the gene of interest can be differentiated. Little work has been done to characterize the walking behavior of adult w1118 flies. Here we show that pulsed light stimulation increased the regularity of walking trajectories of w1118 flies in circular arenas. We statistically modeled the distribution of distances to center and extracted the walking structures of w1118 flies. Pulsed light stimulation redistributed the time proportions for individual walking structures. Specifically, pulsed light stimulation reduced the episodes of crossing over the central region of the arena. The addition of four genomic copies of mini-white, a common marker gene for eye color, mimicked the effect of pulsed light stimulation in reducing crossing in a circular arena. The reducing effect of mini-white was copy-number-dependent. These findings highlight the modifications of walking behavior evoked by rhythmic light stimulation in w1118 flies and an unexpected behavioral consequence of mini-white in transgenic flies carrying the w1118 isogenic background.

Submitted 18 November, 2017; v1 submitted 30 March, 2017; originally announced March 2017.

Comments: 27 pages, 6 figures, research article

arXiv:1606.01358 [pdf, ps, other]

doi 10.1209/0295-5075/114/30001

Firing regulation of fast-spiking interneurons by autaptic inhibition

Authors: Daqing Guo, Mingming Chen, Matjaz Perc, Shengdun Wu, Chuan Xia, Yangsong Zhang, Peng Xu, Yang Xia, Dezhong Yao

Abstract: Fast-spiking (FS) interneurons in the brain are self-innervated by powerful inhibitory GABAergic autaptic connections. Using computational modelling, we investigate how autaptic inhibition regulates the firing response of such interneurons. Our results indicate that autaptic inhibition both raises the current threshold for action potential generation and modulates the input-output gain of FS interneurons. The autaptic transmission delay is identified as a key parameter that controls the firing patterns and determines multistability regions of FS interneurons. Furthermore, we observe that neuronal noise influences the firing regulation of FS interneurons by autaptic inhibition and extends their dynamic range for encoding inputs. Importantly, autaptic inhibition modulates noise-induced irregular firing of FS interneurons, such that coherent firing appears at an optimal autaptic inhibition level. Our results reveal the functional roles of autaptic inhibition in taming the firing dynamics of FS interneurons.

Submitted 4 June, 2016; originally announced June 2016.

Comments: 6 pages, 5 figures

Journal ref: EPL 114 (2016) 30001

arXiv:1605.01102 [pdf, other]

Heterogeneous resource allocation can change social hierarchy in public goods games

Authors: Sandro Meloni, Cheng-Yi Xia, Yamir Moreno

Abstract: Public Goods Games represent one of the most useful tools to study group interactions between individuals. However, even though they can explain the emergence and stability of cooperation in modern societies, they are not able to reproduce some key features observed in social and economic interactions. Two of these features are the typical shape of the wealth distribution, known as the Pareto Law, and the microscopic organization of wealth production. Here, we introduce a modification to the classical formulation of Public Goods Games that allows both of these features to emerge from first principles. Unlike traditional Public Goods Games on networks, where players contribute equally to all the games in which they participate, we allow individuals to redistribute their contribution according to what they earned in previous rounds. Results from numerical simulations show not only that a Pareto distribution of payoffs emerges naturally but also that players who do not invest enough in one round can act as defectors even if they are formally cooperators. Finally, we show that the players self-organize into a highly productive backbone that almost perfectly covers the minimum spanning tree of the underlying interaction network. Our results not only explain the wealth heterogeneity observed in real data but also point to a conceptual change in how cooperation is defined in collective dilemmas.

Submitted 3 May, 2016; originally announced May 2016.

Comments: 8 pages, 5 figures, 55 references

arXiv:1502.07724 [pdf, other]

doi 10.1209/0295-5075/109/58002

Dynamic instability of cooperation due to diverse activity patterns in evolutionary social dilemmas

Authors: Cheng-Yi Xia, Sandro Meloni, Matjaz Perc, Yamir Moreno

Abstract: Individuals might abstain from participating in an instance of an evolutionary game for various reasons, ranging from lack of interest to risk aversion. In order to understand the consequences of such diverse activity patterns on the evolution of cooperation, we study a weak prisoner's dilemma where each player's participation is probabilistic rather than certain. Players that do not participate get a null payoff and are unable to replicate. We show that inactivity introduces cascading failures of cooperation, which are particularly severe on scale-free networks with frequently inactive hubs. The drops in the fraction of cooperators are sudden, while the spatiotemporal reorganization of compact cooperative clusters, and thus the recovery, takes time. Nevertheless, if the activity of players is directly proportional to their degree, or if the interaction network is not strongly heterogeneous, the overall evolution of cooperation is not impaired. This is because inactivity negatively affects the potency of low-degree defectors, who are hence unable to capitalize on their inherent evolutionary advantage. Between cascading failures, the fraction of cooperators is therefore higher than usual, which ultimately balances out the asymmetric dynamic instabilities that emerge due to intermittent blackouts of cooperative hubs.

Submitted 26 February, 2015; originally announced February 2015.

Comments: 6 two-column pages, 6 figures; accepted for publication in Europhysics Letters

Journal ref: EPL 109 (2015) 58002

arXiv:1406.3258 [pdf, other]

Scanning a Poisson Random Field for Local Signals

Authors: Nancy R. Zhang, Benjamin Yakir, Charlie L. Xia, David Siegmund

Abstract: The detection of local genomic signals using high-throughput DNA sequencing data can be cast as a problem of scanning a Poisson random field for local changes in the rate of the process. We propose a likelihood-based framework for such scans, and derive formulas for false positive rate control and power calculations. The framework can also accommodate mixtures of Poisson processes to deal with over-dispersion. As a specific, detailed example, we consider the detection of insertions and deletions by paired-end DNA sequencing. We propose several statistics for this problem, compare their power under current experimental designs, and illustrate their application on an Illumina Platinum Genomes data set.

Submitted 12 June, 2014; originally announced June 2014.

arXiv:1402.4523 [pdf, other]

Dynamics of interacting diseases

Authors: Joaquín Sanz, Cheng-Yi Xia, Sandro Meloni, Yamir Moreno

Abstract: Current modeling of infectious diseases allows for the study of complex and realistic scenarios ranging from the population to the individual level of description. However, most epidemic models assume that the spreading process takes place on a single level (be it a single population, a meta-population system or a network of contacts). In particular, interdependent contagion phenomena can only be addressed if we go beyond the one pathogen-one network scheme. In this paper, we propose a framework that allows describing the spreading dynamics of two concurrent diseases. Specifically, we analytically characterize the epidemic thresholds of the two diseases for different scenarios and also compute the temporal evolution characterizing the unfolding dynamics. Results show that there are regions of the parameter space in which the onset of a disease's outbreak is conditioned on the prevalence levels of the other disease. Moreover, we show, for the SIS scheme, that under certain circumstances, finite and non-vanishing epidemic thresholds are found even in the thermodynamic limit for scale-free networks. For the SIR scenario, the phenomenology is richer and additional interdependencies show up. We also find that the secondary thresholds for the SIS and SIR models are different, which results directly from the interaction between both diseases. Our work thus solves an important problem and paves the way towards a more comprehensive description of the dynamics of interacting diseases.

Submitted 30 July, 2014; v1 submitted 18 February, 2014; originally announced February 2014.

Comments: 24 pages, 9 figures, 4 tables, 3 appendices. Final version accepted for publication in Physical Review X

Journal ref: Phys. Rev. X 4, 041005 (2014)

arXiv:1304.6158 [pdf]

doi 10.4238/2013.April.2.13

Genetic analysis of differentiation of T-helper lymphocytes

Authors: Qixin Wang, Menghui Li, Li Charlie Xia, Ge Wen, Hualong Zu, Mingyi Gao

Abstract: In the human immune system, T-helper cells are able to differentiate into two lymphocyte subsets: Th1 and Th2. The intracellular signaling pathways of differentiation form a dynamic regulation network by secreting distinctive types of cytokines, while differentiation is regulated by two major gene loci: T-bet and GATA-3. We developed a system dynamics model to simulate the differentiation and re-differentiation process of T-helper cells, based on gene expression levels of T-bet and GATA-3 during differentiation of these cells. We identified three final states of the model and concluded that cells retain differentiation potential as long as the system dynamics is at an unstable equilibrium point; T-helper cells lose the potential to differentiate once the model reaches a stable equilibrium point. In addition, the time lag caused by expression of transcription factors can lead to oscillations in the secretion of cytokines during differentiation.

Submitted 22 April, 2013; originally announced April 2013.

Journal ref: Genetics and Molecular Research 2 (2012) 972-987

Showing 1–25 of 25 results for author: Xiao, C