-
SUGAR: Spherical Ultrafast Graph Attention Framework for Cortical Surface Registration
Authors:
Jianxun Ren,
Ning An,
Youjia Zhang,
Danyang Wang,
Zhenyu Sun,
Cong Lin,
Weigang Cui,
Weiwei Wang,
Ying Zhou,
Wei Zhang,
Qingyu Hu,
** Zhang,
Dan Hu,
Danhong Wang,
Hesheng Liu
Abstract:
Cortical surface registration plays a crucial role in aligning cortical functional and anatomical features across individuals. However, conventional registration algorithms are computationally inefficient. Recently, learning-based registration algorithms have emerged as a promising solution, significantly improving processing efficiency. Nonetheless, no learning-based method has yet exceeded state-of-the-art conventional methods simultaneously in computational efficiency, registration accuracy, and distortion control, despite the theoretically greater representational capabilities of deep learning approaches. To address this challenge, we present SUGAR, a unified unsupervised deep-learning framework for both rigid and non-rigid registration. SUGAR incorporates a U-Net-based spherical graph attention network and leverages the Euler angle representation for deformation. In addition to the similarity loss, we introduce fold and multiple distortion losses to preserve topology and minimize various types of distortion. Furthermore, we propose a data augmentation strategy specifically tailored for spherical surface registration, which enhances registration performance. Through extensive evaluation involving over 10,000 scans from 7 diverse datasets, we show that our framework achieves comparable or superior registration performance in accuracy, distortion, and test-retest reliability compared to conventional and learning-based methods. Additionally, SUGAR achieves sub-second processing times: registering 9,000 subjects from the UK Biobank dataset took just 32 minutes, a speed-up of approximately 12,000 times. This combination of high registration performance and accelerated processing time may greatly benefit large-scale neuroimaging studies.
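The abstract states that SUGAR uses an Euler angle representation for deformation. As a rough illustration only, the sketch below assumes a hypothetical per-vertex triple of angles composed in Z-Y-X order; SUGAR's actual angle convention, parameterization, and network output are not given in the abstract. Because a rotation preserves vector length, every deformed vertex stays on the unit sphere, which is one reason an angle-based parameterization suits spherical registration.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def matmul3(a: Mat3, b: Mat3) -> Mat3:
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def euler_to_matrix(alpha: float, beta: float, gamma: float) -> Mat3:
    """R = Rz(alpha) @ Ry(beta) @ Rx(gamma); the Z-Y-X order is an assumption."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    rz = [[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]]
    ry = [[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]]
    rx = [[1.0, 0.0, 0.0], [0.0, cg, -sg], [0.0, sg, cg]]
    return matmul3(rz, matmul3(ry, rx))


def rotate(v: Vec3, angles: Tuple[float, float, float]) -> Vec3:
    """Rotate one vertex; rotations preserve length, so the vertex stays on the sphere."""
    r = euler_to_matrix(*angles)
    x, y, z = (sum(r[i][j] * v[j] for j in range(3)) for i in range(3))
    return (x, y, z)


def deform_sphere(vertices: Sequence[Vec3],
                  angles: Sequence[Tuple[float, float, float]]) -> List[Vec3]:
    """Apply a hypothetical per-vertex Euler-angle field to a spherical mesh."""
    return [rotate(v, a) for v, a in zip(vertices, angles)]
```

For example, `deform_sphere([(1.0, 0.0, 0.0)], [(math.pi / 2, 0.0, 0.0)])` moves the x-axis pole to (approximately) the y-axis pole.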
Submitted 2 July, 2023;
originally announced July 2023.
-
DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations
Authors:
Yuanfeng Ji,
Lu Zhang,
Jiaxiang Wu,
Bingzhe Wu,
Long-Kai Huang,
Tingyang Xu,
Yu Rong,
Lanqing Li,
Jie Ren,
Ding Xue,
Houtim Lai,
Shaoyong Xu,
**g Feng,
Wei Liu,
** Luo,
Shuigeng Zhou,
Junzhou Huang,
Peilin Zhao,
Yatao Bian
Abstract:
AI-aided drug discovery (AIDD) is gaining increasing popularity due to its promise of making the search for new pharmaceuticals quicker, cheaper and more efficient. Despite its extensive use in many areas, such as ADMET prediction, virtual screening, protein folding and generative chemistry, the out-of-distribution (OOD) learning problem with \emph{noise}, which is inevitable in real-world AIDD applications, remains largely unexplored.
In this work, we present DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery, which comes with an open-source Python package that fully automates the data curation and OOD benchmarking processes. We focus on one of the most crucial problems in AIDD: drug-target binding affinity prediction, which involves both macromolecules (protein targets) and small molecules (drug compounds). Rather than providing only fixed datasets, DrugOOD offers an automated dataset curator with user-friendly customization scripts, rich domain annotations aligned with biochemical knowledge, realistic noise annotations, and rigorous benchmarking of state-of-the-art OOD algorithms. Since molecular data are often modeled as irregular graphs using graph neural network (GNN) backbones, DrugOOD also serves as a valuable testbed for \emph{graph OOD learning} problems. Extensive empirical studies show a significant performance gap between in-distribution and out-of-distribution experiments, which highlights the need for better schemes that allow OOD generalization under noise in AIDD.
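DrugOOD's OOD setting rests on splitting data by domain annotations so that test domains never appear in training. The sketch below illustrates that idea with a hypothetical `domain_split` helper and a hypothetical `scaffold` field; it is not the DrugOOD package API, and the greedy largest-domains-first assignment is an assumption. Noise annotations are not modeled.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

Sample = Dict[str, object]


def domain_split(
    samples: List[Sample],
    domain_of: Callable[[Sample], Hashable],
    train_frac: float = 0.6,
) -> Tuple[List[Sample], List[Sample]]:
    """Assign whole domains to train or test so that no domain appears in both.

    Domains are visited from largest to smallest (ties broken by name) and go to
    the training set until it holds at least `train_frac` of all samples.
    """
    groups: Dict[Hashable, List[Sample]] = defaultdict(list)
    for s in samples:
        groups[domain_of(s)].append(s)
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), str(kv[0])))

    train: List[Sample] = []
    test: List[Sample] = []
    for _, members in ordered:
        target = train if len(train) < train_frac * len(samples) else test
        target.extend(members)
    return train, test
```

Keeping each domain whole is the design point: a random per-sample split would leak scaffolds (or assays, or targets) into both sets and hide the in-distribution versus OOD gap the abstract reports.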
Submitted 24 January, 2022;
originally announced January 2022.
-
Competitive Exclusion in a DAE Model for Microbial Electrolysis Cells
Authors:
Harry J. Dudley,
Zhiyong Jason Ren,
David M. Bortz
Abstract:
Microbial electrolysis cells (MECs) employ electroactive bacteria to perform extracellular electron transfer, enabling hydrogen generation from biodegradable substrates. In previous work, we developed and analyzed a differential-algebraic equation (DAE) model for MECs. The model resembles a chemostat with ordinary differential equations (ODEs) for concentrations of substrate, microorganisms, and an extracellular mediator involved in electron transfer. There is also an algebraic constraint for electric current and hydrogen production. Our goal is to determine the outcome of competition between methanogenic archaea and electroactive bacteria, because only the latter contribute to electric current and the resulting hydrogen production. We investigate asymptotic stability in two industrially relevant versions of the model. An important aspect of chemostat models is the principle of competitive exclusion: only the microbes that grow at the lowest substrate concentration survive as $t\to\infty$. We show that if methanogens grow at the lowest substrate concentration, then the equilibrium corresponding to competitive exclusion by methanogens is globally asymptotically stable. The analogous result for electroactive bacteria is not necessarily true. We show that local asymptotic stability of exclusion by electroactive bacteria is not guaranteed, even in a simplified version of the model. In this case, even if electroactive bacteria can grow at the lowest substrate concentration, a few additional conditions are required to guarantee local asymptotic stability. We also provide numerical simulations supporting these arguments. Our results suggest operating conditions that are most conducive to the success of electroactive bacteria and the resulting current and hydrogen production in MECs. This will help identify when methane production or electricity and hydrogen production are favored.
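The competitive exclusion principle quoted above can be made concrete for the classical chemostat. Assuming Monod growth `mu_i(S) = mu_max_i * S / (K_i + S)` and dilution rate `D`, species `i` breaks even at `lambda_i = K_i * D / (mu_max_i - D)`, and the species with the smallest `lambda_i` is predicted to win. This is a sketch of the textbook case only: it ignores the mediator and the algebraic current constraint of the authors' DAE model, and all parameter values are hypothetical.

```python
import math
from typing import Dict, Tuple

Params = Dict[str, Tuple[float, float, float]]  # name -> (mu_max, K, yield Y)


def monod(mu_max: float, k: float, s: float) -> float:
    return mu_max * s / (k + s)


def break_even(mu_max: float, k: float, d: float) -> float:
    """Substrate level where growth balances dilution; inf if the species washes out."""
    return k * d / (mu_max - d) if mu_max > d else math.inf


def predicted_winner(params: Params, d: float) -> str:
    """Competitive exclusion: the lowest break-even concentration wins."""
    return min(params, key=lambda n: break_even(params[n][0], params[n][1], d))


def simulate(params: Params, d: float, s_in: float, t_end: float,
             dt: float = 0.05) -> Dict[str, float]:
    """Forward Euler for S' = D(S_in - S) - sum_i mu_i x_i / Y_i and x_i' = (mu_i - D) x_i."""
    s = s_in
    x = {n: 1.0 for n in params}
    for _ in range(int(t_end / dt)):
        growth = {n: monod(mu, k, s) for n, (mu, k, _y) in params.items()}
        uptake = sum(growth[n] * x[n] / params[n][2] for n in params)
        s = max(s + dt * (d * (s_in - s) - uptake), 0.0)  # substrate cannot go negative
        x = {n: x[n] + dt * (growth[n] - d) * x[n] for n in params}
    return x
```

With `params = {"methanogens": (0.3, 20.0, 1.0), "electroactive": (0.5, 50.0, 1.0)}` and `D = 0.1`, the methanogens break even at `S = 10` versus `12.5` for the electroactive bacteria, so the methanogens are predicted to exclude them, and the simulation agrees, even though the electroactive bacteria grow faster at high substrate levels.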
Submitted 6 July, 2020; v1 submitted 5 June, 2019;
originally announced June 2019.
-
Tumor Microenvironment-based Gene Signatures Divides Novel Immune and Stromal Subgroup Classification of Lung Adenocarcinoma
Authors:
Zihang Zeng,
Jiali Li,
Nannan Zhang,
Xue** Jiang,
Yan** Gao,
Liexi Xu,
Xingyu Liu,
Jiarui Chen,
Yuke Gao,
Linzhi Han,
Jiangbo Ren,
Yan Gong,
Conghua Xie
Abstract:
The tumor microenvironment has complex effects on tumorigenesis and metastasis. However, a comprehensive understanding of the relationships among the molecular and cellular characteristics of the tumor microenvironment, clinical prognosis, and immunotherapy response is still lacking. In this study, the immune and stromal (non-immune) signatures of the tumor microenvironment were integrated to identify novel subgroups of lung adenocarcinoma using eigendecomposition and extraction algorithms from bioinformatics and machine learning, such as non-negative matrix factorization and multitask learning. Tumors were classified into 4 groups according to immune and stromal activation as measured by the novel signatures. The 4 groups had different mutation landscapes, molecular and cellular characteristics, and prognoses, which were validated in 6 independent data sets containing 1551 patients. The high-immune, low-stromal activation group is linked to high immunocyte infiltration, high immunocompetence, low fibroblasts, endothelial cells, collagen, laminin, and tumor mutation burden, and better overall survival. We developed a novel model based on the tumor microenvironment by integrating immune and stromal activation, namely PMBT (prognostic model based on tumor microenvironment). PMBT showed value in predicting overall survival and immunotherapy responses.
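Non-negative matrix factorization is one of the algorithms named above. The sketch below shows NMF with the Lee-Seung multiplicative updates for the squared Frobenius loss, assuming a genes-by-samples matrix `V` factored as `W @ H`, with each sample labeled by its dominant factor in `H`. Rank, iteration count, seed, and the labeling rule are hypothetical choices, the multitask-learning step is not covered, and pure Python is used only to keep the example self-contained (a library such as scikit-learn's `NMF` would be the practical choice).

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def frobenius_sq(v: Matrix, w: Matrix, h: Matrix) -> float:
    wh = matmul(w, h)
    return sum((a - b) ** 2 for vr, whr in zip(v, wh) for a, b in zip(vr, whr))


def nmf(v: Matrix, k: int, iters: int = 300, seed: int = 0,
        eps: float = 1e-12) -> Tuple[Matrix, Matrix, List[float]]:
    """Multiplicative updates minimising ||V - WH||_F^2 subject to W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    history = [frobenius_sq(v, w, h)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)  # H <- H * (W^T V) / (W^T W H)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))  # W <- W * (V H^T) / (W H H^T)
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
        history.append(frobenius_sq(v, w, h))
    return w, h, history


def assign_subgroups(h: Matrix) -> List[int]:
    """Label each sample (a column of H) by the factor with the largest coefficient."""
    return [max(range(len(h)), key=lambda i: h[i][j]) for j in range(len(h[0]))]
```

Multiplicative updates keep every entry non-negative automatically (they only rescale positive numbers), which is why they are a common first choice despite slower convergence than projected-gradient methods.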
Submitted 10 May, 2019;
originally announced May 2019.
-
Identifying viruses from metagenomic data by deep learning
Authors:
Jie Ren,
Kai Song,
Chao Deng,
Nathan A. Ahlgren,
Jed A. Fuhrman,
Yi Li,
Xiaohui Xie,
Fengzhu Sun
Abstract:
The recent development of metagenomic sequencing makes it possible to sequence microbial genomes, including viruses, in an environmental sample. Identifying viral sequences from metagenomic data is critical for downstream virus analyses. Existing reference-based and gene-homology-based methods are not efficient at identifying unknown viruses or short viral sequences. Here we have developed a reference-free and alignment-free machine learning method, DeepVirFinder, for predicting viral sequences in metagenomic data using deep learning techniques. DeepVirFinder was trained on a large number of viral sequences discovered before May 2015. Evaluated on sequences discovered after that date, DeepVirFinder outperformed the state-of-the-art method VirFinder at all contig lengths. Enlarging the training data by adding millions of purified viral sequences from environmental metavirome samples significantly improved the accuracy of predicting under-represented viruses. Applying DeepVirFinder to real human gut metagenomic samples from patients with colorectal carcinoma (CRC) identified 51,138 viral sequences belonging to 175 bins. Ten bins were associated with cancer status, indicating their potential use for non-invasive diagnosis of CRC. In summary, DeepVirFinder greatly improved the precision and recall rates of viral identification, and it will significantly accelerate the discovery rate of viruses.
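Convolutional networks for DNA, the kind of model the abstract describes, typically consume a one-hot encoding and score motifs with convolutional filters followed by max pooling. The sketch below shows only those input-side steps, assuming A/C/G/T channel order, all-zero rows for ambiguous bases, a single hand-set filter, and taking the larger score of the two strands; DeepVirFinder's published architecture, filter sizes, strand handling, and trained weights are not reproduced, and the function names are hypothetical.

```python
from typing import List, Sequence

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def one_hot(seq: str) -> List[List[float]]:
    """L x 4 matrix; ambiguous bases such as N become all-zero rows (assumption)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT.get(b, "N") for b in reversed(seq.upper()))


def conv_max_score(x: List[List[float]], filt: Sequence[Sequence[float]]) -> float:
    """Valid 1-D convolution of one (width x 4) filter over x, then ReLU and global max pooling."""
    width = len(filt)
    best = 0.0  # starting at 0.0 makes the running max act as the ReLU
    for start in range(len(x) - width + 1):
        act = sum(x[start + i][c] * filt[i][c] for i in range(width) for c in range(4))
        best = max(best, act)
    return best


def strand_score(seq: str, filt: Sequence[Sequence[float]]) -> float:
    """Score both strands and keep the larger activation (a hypothetical pooling choice)."""
    return max(conv_max_score(one_hot(seq), filt),
               conv_max_score(one_hot(reverse_complement(seq)), filt))
```

Because the filter slides along the sequence, the score does not depend on where a motif occurs, which is what lets such models work without alignment to a reference.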
Submitted 20 June, 2018;
originally announced June 2018.
-
Alignment-Free Sequence Analysis and Applications
Authors:
Jie Ren,
Xin Bai,
Yang Young Lu,
Ku** Tang,
Ying Wang,
Gesine Reinert,
Fengzhu Sun
Abstract:
Genome and metagenome comparisons based on large amounts of next-generation sequencing (NGS) data pose significant challenges for alignment-based approaches due to the huge data size and the relatively short length of the reads. Alignment-free approaches based on the counts of word patterns in NGS data do not depend on the complete genome and are generally computationally efficient. Thus, they contribute significantly to genome and metagenome comparison. Recently, novel statistical approaches have been developed for the comparison of both long and shotgun sequences. These approaches have been applied to many problems including the comparison of gene regulatory regions, genome sequences, metagenomes, binning contigs in metagenomic data, identification of virus-host interactions, and detection of horizontal gene transfers. We provide an updated review of these applications and other related developments of word-count based approaches for alignment-free sequence analysis.
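The word-count approaches surveyed above start from k-mer (word) counts. A minimal sketch of the simplest such statistic, `D2 = sum_w X_w * Y_w`, together with a cosine-normalized dissimilarity, assuming overlapping counts pooled across reads and no reverse-complement merging. Background-adjusted variants such as D2* and D2S, which subtract expected counts under a Markov model, are not shown; the helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Pool overlapping k-mer counts over all reads (no reverse-complement merging)."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def d2(x: Counter, y: Counter) -> int:
    """D2 = sum over shared words w of X_w * Y_w."""
    return sum(c * y[w] for w, c in x.items() if w in y)


def d2_dissimilarity(x: Counter, y: Counter) -> float:
    """0.5 * (1 - cosine similarity of the count vectors); 0 for identical profiles."""
    norm = math.sqrt(sum(c * c for c in x.values())) * math.sqrt(sum(c * c for c in y.values()))
    return 0.5 * (1.0 - d2(x, y) / norm)
```

Nothing here requires assembling the reads: each read contributes its words independently, which is why these statistics remain usable on short NGS reads.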
Submitted 26 March, 2018;
originally announced March 2018.
-
Sensitivity and Bifurcation Analysis of a DAE Model for a Microbial Electrolysis Cell
Authors:
Harry J. Dudley,
Lu Lu,
Zhiyong Jason Ren,
David M. Bortz
Abstract:
Microbial electrolysis cells (MECs) are a promising new technology for producing hydrogen cheaply, efficiently, and sustainably. However, to scale up this technology, we need a better understanding of the processes in the devices. In this effort, we present a differential-algebraic equation (DAE) model of a microbial electrolysis cell with an algebraic constraint on current. We then perform sensitivity and bifurcation analysis for the DAE system. The model can be applied either to batch-cycle MECs or to continuous-flow MECs. We conduct differential-algebraic sensitivity analysis after fitting simulations to current density data for a batch-cycle MEC. The sensitivity analysis suggests which parameters have the greatest influence on the current density at particular times during the experiment. In particular, growth and consumption parameters for exoelectrogenic bacteria have a strong effect prior to the peak current density. An alternative to maximizing peak current density is to maintain a long-term stable equilibrium with non-zero current density in a continuous-flow MEC. We characterize the minimum dilution rate required for a stable nonzero current equilibrium and demonstrate transcritical bifurcations in the dilution rate parameter that exchange stability between several curves of equilibria. Specifically, increasing the dilution rate transitions the system through three regimes where the stable equilibrium exhibits (i) competitive exclusion by methanogens, (ii) coexistence, and (iii) competitive exclusion by exoelectrogens. Positive long-term current production is only feasible in the final two regimes. These results suggest how to modify system parameters to increase peak current density in a batch-cycle MEC or to increase the long-term equilibrium current density in a continuous-flow MEC.
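Transcritical bifurcations in the dilution rate already appear in the simplest single-species Monod chemostat, which makes it a useful warm-up for the stability exchanges described above. That textbook system (not the authors' DAE, which adds a mediator, a second population, and the current constraint) has a washout equilibrium `(S_in, 0)` and a survival equilibrium `(lambda(D), Y * (S_in - lambda(D)))`; they meet at `D_c = mu_max * S_in / (K + S_in)` and exchange stability there. The parameter values in the sketch are hypothetical.

```python
from typing import Dict, Tuple


def monod(mu_max: float, k: float, s: float) -> float:
    return mu_max * s / (k + s)


def critical_dilution(mu_max: float, k: float, s_in: float) -> float:
    """Dilution rate at which washout and survival equilibria meet (transcritical point)."""
    return monod(mu_max, k, s_in)


def equilibria(mu_max: float, k: float, y: float, s_in: float,
               d: float) -> Dict[str, Tuple[float, float, bool]]:
    """Return {name: (S*, x*, is_stable)} for the single-species Monod chemostat."""
    out: Dict[str, Tuple[float, float, bool]] = {}
    # Washout: eigenvalues are -D and mu(S_in) - D, so stability hinges on the second.
    out["washout"] = (s_in, 0.0, monod(mu_max, k, s_in) - d < 0.0)
    if mu_max > d:
        lam = k * d / (mu_max - d)  # break-even substrate level
        x_star = y * (s_in - lam)
        if x_star > 0.0:
            # Eigenvalues are -D and -mu'(lam) * x*, both negative when x* > 0.
            out["survival"] = (lam, x_star, True)
    return out
```

With `mu_max = 0.5`, `K = 50`, `Y = 1`, and `S_in = 100`, the critical rate is `D_c = 1/3`: at `D = 0.2` washout is unstable and the survival equilibrium `S* = 100/3` is stable, while at `D = 0.4` only washout remains and it is stable.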
Submitted 17 February, 2018;
originally announced February 2018.
-
Discrete Dynamic Causal Modeling and Its Relationship with Directed Information
Authors:
Zhe Wang,
Yu Zheng,
David C. Zhu,
Jian Ren,
Tongtong Li
Abstract:
This paper explores discrete Dynamic Causal Modeling (DDCM) and its relationship with Directed Information (DI). We prove the conditional equivalence between DDCM and DI in characterizing the causal relationship between two brain regions. The theoretical results are demonstrated using fMRI data obtained under both resting-state and stimulus-based conditions. Our numerical analysis is consistent with that reported in a previous study.
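Directed information from X to Y is defined as `I(X^n -> Y^n) = sum_i I(X^i; Y_i | Y^{i-1})`. Estimating it from data requires a finite-memory approximation. The sketch below computes a plug-in estimate of the one-step term `I(X_{t-1}; Y_t | Y_{t-1})` (transfer entropy with history length 1) for discretized signals. The memory length, dropping the instantaneous `X_t` term, and the discretization are all assumptions; this is not the paper's DDCM-DI derivation, and the helper names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple


def cond_mutual_info(a: Sequence[int], b: Sequence[int], c: Sequence[int]) -> float:
    """Plug-in estimate of I(A; B | C) in bits from aligned discrete samples."""
    n = len(a)
    abc = Counter(zip(a, b, c))
    ac = Counter(zip(a, c))
    bc = Counter(zip(b, c))
    cc = Counter(c)
    total = 0.0
    for (ai, bi, ci), nabc in abc.items():
        # p(a,b,c) * log2[p(a,b,c) p(c) / (p(a,c) p(b,c))]; the 1/n factors cancel in the log
        total += (nabc / n) * math.log2(nabc * cc[ci] / (ac[(ai, ci)] * bc[(bi, ci)]))
    return total


def directed_info_rate(x: Sequence[int], y: Sequence[int]) -> float:
    """Memory-1 proxy I(X_{t-1}; Y_t | Y_{t-1}), estimated over t = 1..n-1."""
    return cond_mutual_info(x[:-1], y[1:], y[:-1])


def lagged_copy_example(n: int = 5000, seed: int = 1) -> Tuple[List[int], List[int]]:
    """Toy data: X is i.i.d. fair bits and Y copies X with a one-step lag."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    y = [0] + x[:-1]
    return x, y
```

On the toy data, the estimate from X to Y is close to 1 bit while the reverse direction is close to 0, the kind of asymmetry a causal measure between two regions should expose.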
Submitted 18 September, 2017;
originally announced September 2017.
-
Inference of Markovian Properties of Molecular Sequences from NGS Data and Applications to Comparative Genomics
Authors:
Jie Ren,
Kai Song,
Minghua Deng,
Gesine Reinert,
Charles H. Cannon,
Fengzhu Sun
Abstract:
Next Generation Sequencing (NGS) technologies generate large amounts of short read data for many different organisms. The fact that NGS reads are generally short makes it challenging to assemble the reads and reconstruct the original genome sequence. For clustering genomes using such NGS data, word-count based alignment-free sequence comparison is a promising approach, but for this approach, the underlying expected word counts are essential.
A plausible model for this underlying distribution of word counts is obtained by modelling the DNA sequence as a Markov chain (MC). For single long sequences, efficient statistics are available to estimate the order of an MC and the transition probability matrix of the sequence. Because NGS data do not provide a single long sequence, inference methods for Markovian properties that are based on single long sequences cannot be directly applied to NGS short-read data.
Here we derive a normal approximation for such word counts. We also show that the traditional chi-square statistic has an approximate gamma distribution, using the Lander-Waterman model for physical mapping. We propose several methods to estimate the order of the MC based on NGS reads and evaluate them using simulations. We illustrate the applications of our results by clustering genomic sequences of several vertebrate and tree species based on NGS reads using alignment-free sequence dissimilarity measures. We find that the estimated order of the MC has a considerable effect on the clustering results, and that clustering with an MC of the estimated order gives a plausible clustering of the species.
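Word-count-based comparison needs the expected count of each word under a background Markov chain. A common plug-in estimator for an order-`r` chain and a word `w` of length `k` is `E[X_w] ~ N(w[0:r+1]) * prod_{i=1}^{k-r-1} N(w[i:i+r+1]) / N(w[i:i+r])`, where `N(.)` counts occurrences and, for `r = 0`, the denominator is the total number of letters. The sketch pools counts over all reads and ignores read-boundary effects; the paper's normal approximation and order estimators are not reproduced, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def word_counts(reads: Iterable[str], length: int) -> Counter:
    """Overlapping counts of all words of the given (positive) length, pooled over reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - length + 1):
            counts[read[i:i + length]] += 1
    return counts


def expected_count(word: str, reads: List[str], order: int) -> float:
    """Plug-in expected count of `word` under an order-`order` Markov chain fitted to pooled reads."""
    k = len(word)
    if not 0 <= order < k:
        raise ValueError("need 0 <= order < len(word)")
    longer = word_counts(reads, order + 1)
    shorter = word_counts(reads, order) if order > 0 else None
    total_letters = sum(len(r) for r in reads)

    def n_short(w: str) -> int:
        # For order 0 the "empty word" count is the total number of letters.
        return shorter[w] if shorter is not None else total_letters

    est = float(longer[word[:order + 1]])
    for i in range(1, k - order):
        denom = n_short(word[i:i + order])
        if denom == 0:
            return 0.0
        est *= longer[word[i:i + order + 1]] / denom
    return est
```

Comparing observed counts against these expectations for several candidate orders is the basic ingredient of the order-selection problem the abstract addresses.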
Submitted 3 April, 2015;
originally announced April 2015.
-
Mean Exit Time and Escape Probability for a Tumor Growth System under Non-Gaussian Noise
Authors:
Jian Ren,
Chu** Li,
Ting Gao,
Xingye Kan,
**qiao Duan
Abstract:
Effects of non-Gaussian $\alpha$-stable Lévy noise on the Gompertz tumor growth model are quantified by considering the mean exit time and escape probability of the cancer cell density from inside a safe or benign domain. The mean exit time and escape probability problems are formulated as differential-integral equations with a fractional Laplacian operator. Numerical simulations are conducted to evaluate how the mean exit time and escape probability vary or bifurcate as $\alpha$ changes. Some bifurcation phenomena are observed and their impacts are discussed.
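The paper obtains the mean exit time and escape probability by solving a differential-integral equation with a fractional Laplacian. A different and much cruder technique, Monte Carlo simulation, estimates the same two quantities and can help build intuition. The sketch assumes a Gompertz drift `r * x * ln(K / x)` (one common parameterization; the paper's may differ), additive symmetric alpha-stable noise sampled with the Chambers-Mallows-Stuck method, an Euler scheme whose noise increments scale like `dt ** (1 / alpha)`, "escape" meaning the first exit lands at or above the upper boundary, and paths truncated at `t_max`. All names and parameter values are hypothetical.

```python
import math
import random
from typing import Tuple


def symmetric_stable(alpha: float, rng: random.Random) -> float:
    """Standard symmetric alpha-stable variate (Chambers-Mallows-Stuck), 0 < alpha <= 2."""
    v = rng.uniform(-math.pi / 2, math.pi / 2)
    if alpha == 1.0:
        return math.tan(v)  # Cauchy case
    w = rng.expovariate(1.0)
    return (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))


def exit_statistics(x0: float, low: float, high: float, alpha: float, eps: float,
                    r: float = 1.0, k: float = 1.0, dt: float = 0.01,
                    t_max: float = 20.0, n_paths: int = 200,
                    seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo (mean exit time from (low, high), probability of exiting at or above high).

    Paths still inside at t_max count with exit time ~t_max and as non-escapes,
    so the mean exit time is biased low when t_max is too small.
    """
    if low <= 0.0:
        raise ValueError("the Gompertz drift needs a positive lower boundary")
    rng = random.Random(seed)
    scale = eps * dt ** (1.0 / alpha)  # alpha-stable increments scale like dt^(1/alpha)
    total_time, escapes = 0.0, 0
    for _ in range(n_paths):
        x, t = x0, 0.0
        while low < x < high and t < t_max:
            x += r * x * math.log(k / x) * dt + scale * symmetric_stable(alpha, rng)
            t += dt
        total_time += t
        if x >= high:
            escapes += 1
    return total_time / n_paths, escapes / n_paths
```

Sweeping `alpha` and `eps` with this estimator gives a quick, noisy picture of how exit times change; the paper's deterministic PDE approach avoids the sampling error and truncation bias.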
Submitted 28 November, 2011;
originally announced November 2011.
-
Reverse engineering of complex dynamical networks in the presence of time-delayed interactions based on noisy time series
Authors:
Wen-Xu Wang,
Jie Ren,
Ying-Cheng Lai,
Baowen Li
Abstract:
Reverse engineering of complex dynamical networks is important for a variety of fields where uncovering the full topology of unknown networks and estimating parameters characterizing the network structure and dynamical processes are of interest. We consider complex oscillator networks with time-delayed interactions in a noisy environment, and develop an effective method to infer the full topology of the network and evaluate the amount of time delay based solely on noise-contaminated time series. In particular, we develop an analytic theory establishing that the dynamical correlation matrix, which can be constructed purely from time series, can be manipulated to yield both the network topology and the amount of time delay simultaneously. Extensive numerical support is provided to validate the method. While our method provides a viable solution to the network inverse problem, significant difficulties, limitations, and challenges still remain, and these are discussed thoroughly.
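The central idea, that a correlation matrix built purely from time series encodes the topology, can be seen in the simplest delay-free linear case. The sketch assumes noisy dynamics `dx = -A x dt + sigma dW` with `A = c L + gamma I` (graph Laplacian `L`, plus a hypothetical damping `gamma` that makes `A` invertible). The stationary covariance then satisfies `A C + C A = sigma^2 I`, so `C = (sigma^2 / 2) A^{-1}` and `A ~ (sigma^2 / 2) C^{-1}`, whose negative off-diagonal entries mark links. Time delays, the actual subject of the paper, are not handled, and every name and parameter here is hypothetical.

```python
import random
from typing import List, Set, Tuple

Matrix = List[List[float]]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (for small, well-conditioned matrices)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def simulate(edges: Set[Tuple[int, int]], n: int, c: float = 1.0, gamma: float = 0.5,
             sigma: float = 1.0, dt: float = 0.02, steps: int = 100_000,
             seed: int = 0) -> List[List[float]]:
    """Euler-Maruyama for dx_i = [-c * (L x)_i - gamma * x_i] dt + sigma dW_i."""
    rng = random.Random(seed)
    nbrs = [[j for j in range(n) if (i, j) in edges or (j, i) in edges] for i in range(n)]
    x = [0.0] * n
    series = []
    for _ in range(steps):
        # (L x)_i = sum over neighbours j of (x_i - x_j)
        drift = [-c * sum(x[i] - x[j] for j in nbrs[i]) - gamma * x[i] for i in range(n)]
        x = [x[i] + drift[i] * dt + sigma * dt ** 0.5 * rng.gauss(0.0, 1.0) for i in range(n)]
        series.append(x)
    return series


def infer_links(series: List[List[float]], sigma: float = 1.0,
                threshold: float = 0.5) -> Set[Tuple[int, int]]:
    """Estimate C, form A_hat = (sigma^2 / 2) C^{-1}, and keep pairs with A_hat_ij < -threshold."""
    n, t = len(series[0]), len(series)
    mean = [sum(s[i] for s in series) / t for i in range(n)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in series) / t
            for j in range(n)] for i in range(n)]
    a_hat = [[0.5 * sigma ** 2 * v for v in row] for row in invert(cov)]
    return {(i, j) for i in range(n) for j in range(i + 1, n) if a_hat[i][j] < -threshold}
```

For a 4-node ring with `c = 1`, linked pairs come out near `-1` and unlinked pairs near `0`, so a simple threshold recovers the ring; the time step and record length both bias and blur these values, one of the practical limitations the abstract alludes to.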
Submitted 18 December, 2012; v1 submitted 31 January, 2011;
originally announced January 2011.
-
Thermodynamic stability of small-world oscillator networks: A case study of proteins
Authors:
Jie Ren,
Baowen Li
Abstract:
We study vibrational thermodynamic stability of small-world oscillator networks, by relating the average mean-square displacement $S$ of oscillators to the eigenvalue spectrum of the Laplacian matrix of networks. We show that the cross-links suppress $S$ effectively and there exist two phases on the small-world networks: 1) an unstable phase: when $p\ll1/N$, $S\sim N$; 2) a stable phase: when $p\gg1/N$, $S\sim p^{-1}$, \emph{i.e.}, $S/N\sim E_{cr}^{-1}$. Here, $p$ is the parameter of small-world, $N$ is the number of oscillators, and $E_{cr}=pN$ is the number of cross-links. The results are exemplified by various real protein structures that follow the same scaling behavior $S/N\sim E_{cr}^{-1}$ of the stable phase. We also show that it is the "small-world" property that plays the key role in the thermodynamic stability and is responsible for the universal scaling $S/N\sim E_{cr}^{-1}$, regardless of the model details.
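The link between $S$ and the Laplacian spectrum can be checked numerically. Assuming unit spring constants and unit temperature (a Gaussian-network-style convention, not necessarily the paper's), the mean-square displacement of oscillator $i$ is $[L^+]_{ii}$, so $S = \frac{1}{N}\operatorname{tr} L^+ = \frac{1}{N}\sum_{\lambda_k \neq 0} 1/\lambda_k$. For a connected graph, $\operatorname{tr} L^+ = \operatorname{tr}(L + J/N)^{-1} - 1$, because $J/N$ only adds eigenvalue 1 on the uniform mode. For a pure ring, $S = (N^2 - 1)/(12N)$, which grows linearly in $N$ as in the unstable phase. Cross-links are added here in a Newman-Watts style, about $pN$ of them; that construction and the helper names are assumptions.

```python
import random
from typing import List, Set, Tuple

Matrix = List[List[float]]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def small_world_edges(n: int, p: float, seed: int = 0) -> Set[Tuple[int, int]]:
    """Ring of n nodes plus roughly p*n random cross-links (Newman-Watts style; assumption)."""
    rng = random.Random(seed)
    edges = {tuple(sorted((i, (i + 1) % n))) for i in range(n)}
    for i in range(n):
        if rng.random() < p:
            j = rng.randrange(n - 1)
            j = j + 1 if j >= i else j  # uniform over nodes other than i
            edges.add(tuple(sorted((i, j))))
    return edges


def mean_square_displacement(n: int, edges: Set[Tuple[int, int]]) -> float:
    """S = tr(L^+) / N, computed as (tr((L + J/N)^{-1}) - 1) / N for a connected graph."""
    m = [[1.0 / n] * n for _ in range(n)]
    for i, j in edges:
        m[i][i] += 1.0
        m[j][j] += 1.0
        m[i][j] -= 1.0
        m[j][i] -= 1.0
    inv = invert(m)
    return (sum(inv[i][i] for i in range(n)) - 1.0) / n
```

Adding an edge can only increase $L$ in the positive-semidefinite order, so cross-links can never raise $S$; the sketch makes that suppression easy to observe as $p$ grows.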
Submitted 7 May, 2009;
originally announced May 2009.
-
Randomness enhances cooperation: a resonance type phenomenon in evolutionary games
Authors:
Jie Ren,
Wen-Xu Wang,
Feng Qi
Abstract:
We investigate the effect of randomness in both relationships and decisions on the evolution of cooperation. Simulation results show that, in the presence of such randomness, the system evolves to a state with more frequent cooperation than in its absence. Specifically, there is an optimal amount of randomness that induces the highest level of cooperation. The mechanism by which randomness promotes cooperation resembles coherence resonance, which could be of particular interest in evolutionary game dynamics in economic, biological and social systems.
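Randomness in relationships and in decisions can be given concrete form in a small simulation. The sketch below is one common setup, assumed here rather than taken from the paper: a weak prisoner's dilemma (mutual cooperation pays 1, a defector exploiting a cooperator gets `b`, everything else 0) on a ring, where each interaction link is redirected to a random player with probability `q` (relationship randomness, resampled at every payoff evaluation), and strategies are imitated by the Fermi rule with noise `K` (decision randomness). Sweeping `q` or `K` and recording the cooperator fraction is how one would look for an optimal randomness level; the sketch does not reproduce the paper's results, and all names and values are hypothetical.

```python
import math
import random
from typing import List


def fermi(p_self: float, p_other: float, noise: float) -> float:
    """Probability of copying the other player: 1 / (1 + exp((P_self - P_other) / K))."""
    z = max(-700.0, min(700.0, (p_self - p_other) / noise))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(z))


def cooperation_level(n: int = 200, b: float = 1.2, q: float = 0.1, noise: float = 0.1,
                      sweeps: int = 200, seed: int = 0) -> float:
    """Fraction of cooperators after `sweeps` rounds of n random asynchronous updates."""
    rng = random.Random(seed)
    coop = [rng.random() < 0.5 for _ in range(n)]

    def partners(i: int) -> List[int]:
        # Each of i's two ring links is redirected to a random other player with probability q.
        out = []
        for j in ((i - 1) % n, (i + 1) % n):
            if rng.random() < q:
                j = rng.randrange(n - 1)
                j = j + 1 if j >= i else j
            out.append(j)
        return out

    def payoff(i: int) -> float:
        # Weak PD: C vs C -> 1, D vs C -> b, anything vs D -> 0.
        return sum((1.0 if coop[i] else b) for j in partners(i) if coop[j])

    for _ in range(sweeps * n):
        i = rng.randrange(n)
        j = rng.choice(((i - 1) % n, (i + 1) % n))
        if rng.random() < fermi(payoff(i), payoff(j), noise):
            coop[i] = coop[j]
    return sum(coop) / n
```

The Fermi rule interpolates between near-deterministic imitation of better performers (small `K`) and coin-flip copying (large `K`), so `noise` directly controls decision randomness.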
Submitted 6 April, 2007; v1 submitted 18 July, 2006;
originally announced July 2006.
-
Universal core modules govern the dynamics of cellular regulatory networks
Authors:
**g-**g Li,
Jie Ren,
Zhi Liang,
Meng Xu,
Dan Xie,
De-Shuang Huang,
Bing-Hong Wang,
Liwen Niu,
Huanqing Feng,
Wen-Xu Wang
Abstract:
This paper contains some immature results. It has been withdrawn.
Submitted 13 January, 2007; v1 submitted 10 January, 2006;
originally announced January 2006.