-
Atomas: Hierarchical Alignment on Molecule-Text for Unified Molecule Understanding and Generation
Authors:
Yikun Zhang,
Geyan Ye,
Chaohao Yuan,
Bo Han,
Long-Kai Huang,
Jianhua Yao,
Wei Liu,
Yu Rong
Abstract:
Molecule-and-text cross-modal representation learning has emerged as a promising direction for enhancing the quality of molecular representations, thereby improving performance in various scientific fields, including drug discovery and materials science. Existing studies adopt a global alignment approach to learn knowledge from different modalities. These global alignment approaches fail to capture fine-grained information, such as molecular fragments and their corresponding textual descriptions, which is crucial for downstream tasks. Furthermore, such information cannot be modeled with a similar global alignment strategy, because existing datasets contain few paired annotations of local parts. In this paper, we propose Atomas, a multi-modal molecular representation learning framework that jointly learns representations from SMILES strings and text. We design a Hierarchical Adaptive Alignment model to concurrently learn the fine-grained fragment correspondence between the two modalities and align these fragment representations at three levels. Additionally, Atomas's end-to-end training framework incorporates both molecule understanding and molecule generation tasks, thereby supporting a wider range of downstream tasks. In the retrieval task, Atomas exhibits robust generalization ability and outperforms the baseline by 30.8% in recall@1 on average. In the generation task, Atomas achieves state-of-the-art results on both molecule captioning and molecule generation. Moreover, visualization of the Hierarchical Adaptive Alignment model further confirms the chemical significance of our approach. Our code is available at https://anonymous.4open.science/r/Atomas-03C3.
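Recall@1 in cross-modal retrieval is the fraction of queries whose top-ranked candidate is the true partner. A minimal sketch, assuming a precomputed similarity matrix in which query i's correct match is candidate i (a common convention; the exact retrieval protocol behind the reported 30.8% gain is not described here):

```python
from typing import List

def recall_at_k(sim: List[List[float]], k: int = 1) -> float:
    """Fraction of queries whose true match (candidate i for query i) ranks in the top k.

    sim[i][j] is the similarity between query i (e.g. a SMILES embedding)
    and candidate j (e.g. a text embedding); the diagonal holds the true pairs.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank candidate indices by descending similarity.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(sim)

sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],  # query 1's true match (0.4) is beaten by candidate 2
    [0.1, 0.2, 0.7],
]
print(recall_at_k(sim, k=1))  # 2 of 3 queries hit
print(recall_at_k(sim, k=2))  # 1.0
```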
Submitted 23 April, 2024;
originally announced April 2024.
-
Functional Protein Design with Local Domain Alignment
Authors:
Chaohao Yuan,
Songyou Li,
Geyan Ye,
Yikun Zhang,
Long-Kai Huang,
Wenbing Huang,
Wei Liu,
Jianhua Yao,
Yu Rong
Abstract:
The core challenge of de novo protein design lies in creating proteins with specific functions or properties, guided by certain conditions. Current models generate proteins using structural and evolutionary guidance, which provides only indirect conditions concerning functions and properties. However, textual annotations of proteins, especially annotations of protein domains, which directly describe a protein's high-level functionalities and properties and their correlation with target amino acid sequences, remain unexplored in the context of protein design. In this paper, we propose Protein-Annotation Alignment Generation (PAAG), a multi-modality protein design framework that integrates textual annotations extracted from protein databases for controllable generation in sequence space. Specifically, within a multi-level alignment module, PAAG can explicitly generate proteins containing specific domains conditioned on the corresponding domain annotations, and can even design novel proteins with flexible combinations of different kinds of annotations. Our experimental results underscore the superiority of the aligned protein representations from PAAG across 7 prediction tasks. Furthermore, PAAG demonstrates a nearly sixfold increase in generation success rate (24.7% vs 4.7% in zinc finger, and 54.3% vs 8.7% in the immunoglobulin domain) compared with the existing model.
Submitted 27 May, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
AMPCliff: quantitative definition and benchmarking of activity cliffs in antimicrobial peptides
Authors:
Kewei Li,
Yuqian Wu,
Yutong Guo,
Yinheng Li,
Yusi Fan,
Ruochi Zhang,
Lan Huang,
Fengfeng Zhou
Abstract:
An activity cliff (AC) is a phenomenon in which a pair of similar molecules differ by a small structural alteration but exhibit a large difference in their biochemical activities. The AC of small molecules has been extensively investigated, but little is known about the AC phenomenon in peptides with canonical amino acids. This study introduces AMPCliff, a quantitative definition and benchmarking framework for the AC phenomenon in antimicrobial peptides (AMPs) composed of canonical amino acids. A comprehensive analysis of the existing AMP dataset reveals a significant prevalence of AC within AMPs. AMPCliff quantifies the activities of AMPs by the minimum inhibitory concentration (MIC), and defines 0.9 as the minimum threshold for the normalized BLOSUM62 similarity score between a pair of aligned peptides with at least a two-fold MIC change. This study establishes a benchmark dataset of paired AMPs in Staphylococcus aureus from the publicly available AMP dataset GRAMPA, and conducts a rigorous procedure to evaluate various AMP AC prediction models, including nine machine learning algorithms, four deep learning algorithms, four masked language models, and four generative language models. Our analysis reveals that these models are capable of detecting AMP AC events, and that the pre-trained protein language model ESM2 demonstrates superior performance across the evaluations. The prediction of AMP activity cliffs still needs improvement: ESM2 with 33 layers achieves a Spearman correlation coefficient of only 0.50 for the regression task of the MIC values on the benchmark dataset. Source code and additional resources are available at https://www.healthinformaticslab.org/supp/ or https://github.com/Kewei2023/AMPCliff-generation.
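The AC criterion stated above combines a similarity threshold with a fold-change threshold. A minimal predicate sketch, assuming the normalized BLOSUM62 similarity of the aligned pair has already been computed and scaled to [0, 1] (the normalization itself is not reproduced here, and `is_activity_cliff` is a hypothetical helper name):

```python
def is_activity_cliff(similarity: float, mic_a: float, mic_b: float,
                      sim_threshold: float = 0.9, min_fold: float = 2.0) -> bool:
    """Return True if an aligned peptide pair meets the AC thresholds described above.

    similarity: precomputed normalized BLOSUM62 similarity (assumed in [0, 1]).
    mic_a, mic_b: minimum inhibitory concentrations in the same units, > 0.
    """
    if mic_a <= 0 or mic_b <= 0:
        raise ValueError("MIC values must be positive")
    # Fold change is direction-independent: larger MIC over smaller MIC.
    fold_change = max(mic_a, mic_b) / min(mic_a, mic_b)
    return similarity >= sim_threshold and fold_change >= min_fold

print(is_activity_cliff(0.93, 4.0, 16.0))  # True: similar pair, 4-fold MIC change
print(is_activity_cliff(0.93, 4.0, 6.0))   # False: only a 1.5-fold change
print(is_activity_cliff(0.85, 4.0, 16.0))  # False: below the similarity threshold
```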
Submitted 15 April, 2024;
originally announced April 2024.
-
Messenger RNA Design via Expected Partition Function and Continuous Optimization
Authors:
Ning Dai,
Wei Yu Tang,
Tianshuo Zhou,
David H. Mathews,
Liang Huang
Abstract:
RNA design tasks are discrete optimization problems, and several versions of these problems are NP-hard. As an alternative to commonly used local search methods, we formulate these problems as continuous optimization and develop a general framework for this optimization based on a generalization of the classical partition function, which we call the "expected partition function". The basic idea is to start with a distribution over all possible candidate sequences and to extend the objective function from a single sequence to a distribution. We then use gradient-descent-based optimization methods to improve the extended objective function, and the distribution gradually shrinks towards a one-hot sequence (i.e., a single sequence). As a case study, we consider the important problem of mRNA design, which has wide applications in vaccines and therapeutics. While the recent work LinearDesign can efficiently optimize mRNAs for minimum free energy (MFE), optimizing for ensemble free energy is much harder and likely intractable. Our approach consistently improves over the LinearDesign solution in terms of ensemble free energy, with bigger improvements on longer sequences.
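To make the relax-then-sharpen idea concrete, the toy sketch below optimizes independent per-position nucleotide distributions by gradient ascent on an expected score. It rests on loud assumptions: the hypothetical `scores` table is a separable per-position reward standing in for the real ensemble-free-energy objective (which is minimized rather than maximized), and nothing here computes the expected partition function itself.

```python
import math
from typing import List

NUCS = "ACGU"

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def optimize(scores: List[List[float]], steps: int = 200, lr: float = 1.0) -> List[List[float]]:
    """Gradient ascent on E_{x~p}[sum_i scores[i][x_i]] over per-position distributions.

    scores[i][c] is a hypothetical reward for nucleotide NUCS[c] at position i.
    """
    logits = [[0.0] * 4 for _ in scores]  # start from the uniform distribution
    for _ in range(steps):
        for i, row in enumerate(logits):
            p = softmax(row)
            expected = sum(pc * sc for pc, sc in zip(p, scores[i]))
            # For a softmax-parameterized position: dE/dlogit_c = p_c * (score_c - E_i).
            for c in range(4):
                row[c] += lr * p[c] * (scores[i][c] - expected)
    return [softmax(row) for row in logits]

def decode(dist: List[List[float]]) -> str:
    """Read off the most likely sequence, i.e. the one-hot limit of the distribution."""
    return "".join(NUCS[max(range(4), key=lambda c: p[c])] for p in dist)

scores = [[1.0, 0.0, 0.2, 0.1], [0.0, 0.5, 1.0, 0.0], [0.1, 1.0, 0.0, 0.3]]
dist = optimize(scores)
print(decode(dist))                      # "AGC"
print([round(max(p), 3) for p in dist])  # grows toward 1 as the distribution sharpens
```

Because every sequence is reachable from the relaxed distribution, gradient steps can move probability mass smoothly instead of flipping one nucleotide at a time as local search does.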
Submitted 1 March, 2024; v1 submitted 29 December, 2023;
originally announced January 2024.
-
Undesignable RNA Structure Identification via Rival Structure Generation and Structure Decomposition
Authors:
Tianshuo Zhou,
Wei Yu Tang,
David H. Mathews,
Liang Huang
Abstract:
RNA design is the search for a sequence or set of sequences that will fold into predefined structures, also known as the inverse problem of RNA folding. While numerous RNA design methods have been developed to find sequences capable of folding into a target structure, little attention has been given to identifying structures that are undesignable under the minimum free energy (MFE) criterion of the Turner model. In this paper, we address this gap by first introducing mathematical theorems that outline sufficient conditions for recognizing undesignable structures, and then proposing efficient algorithms, guided by these theorems, to verify the undesignability of RNA structures. By applying these theorems and algorithms to the Eterna100 puzzles, we efficiently establish that 15 of the puzzles are indeed undesignable. In addition, we provide specific insights from the study of undesignability, in the hope that they will deepen understanding of RNA folding and RNA design.
Submitted 26 February, 2024; v1 submitted 14 November, 2023;
originally announced November 2023.
-
LinearSankoff: Linear-time Simultaneous Folding and Alignment of RNA Homologs
Authors:
Sizhen Li,
Ning Dai,
He Zhang,
Apoorv Malik,
David H. Mathews,
Liang Huang
Abstract:
The classical Sankoff algorithm for the simultaneous folding and alignment of homologous RNA sequences is highly influential, but it suffers from two major limitations in efficiency and modeling power. First, it takes $O(n^6)$ time for two sequences, where $n$ is the average sequence length. Most implementations and variations reduce the runtime to $O(n^3)$ by restricting the alignment search space, but this is still too slow for long sequences such as full-length viral genomes. Second, the Sankoff algorithm and all its existing implementations use a rather simplistic alignment model, which can result in poor alignment accuracy. To address these problems, we propose LinearSankoff, which seamlessly integrates the original Sankoff algorithm with a powerful Hidden Markov Model-based alignment module. This extension substantially improves alignment quality, which in turn benefits secondary structure prediction quality, as confirmed over a diverse set of RNA families. LinearSankoff also applies beam search heuristics and an A$^\star$-like algorithm so that its runtime scales linearly with sequence length. LinearSankoff is the first linear-time algorithm for simultaneous folding and alignment, and the first such algorithm to scale to coronavirus genomes ($n \approx 30{,}000$ nt). It takes only 10 minutes for a pair of SARS-CoV-2 and SARS-related genomes, and outperforms previous work at identifying crucial conserved structures between the two genomes.
Submitted 18 July, 2023;
originally announced July 2023.
-
DiffDTM: A conditional structure-free framework for bioactive molecules generation targeted for dual proteins
Authors:
Lei Huang,
Zheng Yuan,
Huihui Yan,
Rong Sheng,
Lin**g Liu,
Fuzhou Wang,
Weidun Xie,
Nanjun Chen,
Fei Huang,
Songfang Huang,
Ka-Chun Wong,
Yaoyun Zhang
Abstract:
Advances in deep generative models shed light on de novo molecule generation with desired properties. However, molecule generation for dual protein targets still faces formidable challenges, including the need for protein 3D structure data for model training, auto-regressive sampling, and model generalization to unseen targets. Here, we propose DiffDTM, a novel conditional structure-free deep generative model based on a diffusion model for dual-target molecule generation that addresses the above issues. Specifically, DiffDTM receives protein sequences and molecular graphs as inputs instead of protein and molecular conformations, and incorporates an information fusion module to achieve conditional generation in a one-shot manner. We have conducted comprehensive multi-view experiments demonstrating that DiffDTM can generate drug-like, synthetically accessible, novel, and high-binding-affinity molecules targeting specific dual proteins, outperforming state-of-the-art (SOTA) models on multiple evaluation metrics. Furthermore, we used DiffDTM to generate molecules targeting dopamine receptor D2 and 5-hydroxytryptamine receptor 1A as new antipsychotics. The experimental results indicate that DiffDTM can be easily applied to unseen dual targets to generate bioactive molecules, addressing both the scarcity of active molecule data for training and the need to retrain when encountering new targets.
Submitted 24 June, 2023;
originally announced June 2023.
-
Deep recurrent spiking neural networks capture both static and dynamic representations of the visual cortex under movie stimuli
Authors:
Liwei Huang,
ZhengYu Ma,
Huihui Zhou,
Yonghong Tian
Abstract:
In the real world, visual stimuli received by the biological visual system are predominantly dynamic rather than static. A better understanding of how the visual cortex represents movie stimuli could provide deeper insight into the information processing mechanisms of the visual system. Although some progress has been made in modeling neural responses to natural movies with deep neural networks, the visual representations of static and dynamic information under such time-series visual stimuli remain to be further explored. In this work, considering the abundant recurrent connections in the mouse visual system, we design a recurrent module based on the hierarchy of the mouse cortex and add it to deep spiking neural networks (SNNs), which have been demonstrated to be a more compelling computational model of the visual cortex. Using Time-Series Representational Similarity Analysis, we measure the representational similarity between networks and mouse cortical regions under natural movie stimuli. We then compare the representational similarity across recurrent/feedforward networks and image/video training tasks. Trained on the video action recognition task, the recurrent SNN achieves the highest representational similarity, significantly outperforming the feedforward SNN trained on the same task by 15% and the recurrent SNN trained on the image classification task by 8%. We investigate how the static and dynamic representations of SNNs influence the similarity, as a way to explain the importance of these two forms of representation in biological neural coding. Taken together, our work is the first to apply deep recurrent SNNs to model the mouse visual cortex under movie stimuli, establishes that these networks can capture both static and dynamic representations, and contributes to understanding the movie information processing mechanisms of the visual cortex.
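Representational similarity analysis compares two systems by correlating their stimulus-by-stimulus dissimilarity matrices (RDMs). The sketch below shows only the classic static form, assuming 1 minus Pearson correlation as the dissimilarity and Spearman correlation between the RDMs' upper triangles; it is not the time-series RSA variant used in this work, whose details are not given here.

```python
import math
from typing import List

def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(v: List[float]) -> List[float]:
    """1-based ranks, with tied values given their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def rdm_upper(responses: List[List[float]]) -> List[float]:
    """Upper triangle of the RDM: 1 - Pearson(stimulus i, stimulus j). Rows are stimuli."""
    n = len(responses)
    return [1 - pearson(responses[i], responses[j])
            for i in range(n) for j in range(i + 1, n)]

def rsa_similarity(model: List[List[float]], brain: List[List[float]]) -> float:
    """Spearman correlation between the model RDM and the brain RDM."""
    return pearson(ranks(rdm_upper(model)), ranks(rdm_upper(brain)))

model = [[1, 2, 3], [2, 1, 0], [0, 5, 1], [3, 3, 4]]   # 4 stimuli x 3 units
brain = [[2 * v + 1 for v in row] for row in model]     # same geometry, rescaled
print(round(rsa_similarity(model, brain), 6))           # 1.0
```

Comparing RDMs rather than raw responses lets a network layer and a cortical region be compared even when they have different numbers of units.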
Submitted 2 June, 2023;
originally announced June 2023.
-
Deep Spiking Neural Networks with High Representation Similarity Model Visual Pathways of Macaque and Mouse
Authors:
Liwei Huang,
Zhengyu Ma,
Liutao Yu,
Huihui Zhou,
Yonghong Tian
Abstract:
Deep artificial neural networks (ANNs) play a major role in modeling the visual pathways of primates and rodents. However, they greatly simplify the computational properties of neurons compared to their biological counterparts. In contrast, Spiking Neural Networks (SNNs) are more biologically plausible models, since spiking neurons encode information with time sequences of spikes, just as biological neurons do. However, few studies have modeled visual pathways with deep SNNs. In this study, we model the visual cortex with deep SNNs for the first time, and also with a wide range of state-of-the-art deep CNNs and ViTs for comparison. Using three similarity metrics, we conduct neural representation similarity experiments on three neural datasets collected from two species under three types of stimuli. Based on extensive similarity analyses, we further investigate the functional hierarchy and mechanisms across species. Almost all similarity scores of SNNs are higher than those of their CNN counterparts, by 6.6% on average. The depths of the layers with the highest similarity scores differ little across mouse cortical regions but vary significantly across macaque regions, suggesting that the visual processing structure of mice is more regionally homogeneous than that of macaques. In addition, the multi-branch structures observed in some top mouse brain-like neural networks provide computational evidence of parallel processing streams in mice, and the differing performance in fitting macaque neural representations under different stimuli reflects the functional specialization of information processing in macaques. Taken together, our study demonstrates that SNNs could serve as promising candidates to better model and explain the functional hierarchy and mechanisms of the visual system.
Submitted 22 May, 2023; v1 submitted 9 March, 2023;
originally announced March 2023.
-
LinearCoFold and LinearCoPartition: Linear-Time Algorithms for Secondary Structure Prediction of Interacting RNA molecules
Authors:
He Zhang,
Sizhen Li,
Liang Zhang,
David H. Mathews,
Liang Huang
Abstract:
Many ncRNAs function through RNA-RNA interactions, so fast and reliable RNA structure prediction that accounts for RNA-RNA interaction is useful. Some existing tools are less accurate because they omit the competition between intermolecular and intramolecular base pairs, or they focus on predicting the binding region rather than the complete secondary structure of two interacting strands. Vienna RNAcofold, which reduces the problem to classical single-sequence folding by concatenating the two strands, scales cubically with the combined sequence length and is slow for long sequences. To address these issues, we present LinearCoFold, which predicts the complete minimum free energy structure of two strands in linear runtime, and LinearCoPartition, which calculates the cofolding partition function and base pairing probabilities in linear runtime. LinearCoFold and LinearCoPartition follow the concatenation strategy of RNAcofold but are orders of magnitude faster. For example, on a sequence pair with a combined length of 26,190 nt, LinearCoFold is 86.8x faster than RNAcofold MFE mode (0.6 minutes vs. 52.1 minutes), and LinearCoPartition is 642.3x faster than RNAcofold partition function mode (1.8 minutes vs. 1156.2 minutes). Unlike local algorithms, LinearCoFold and LinearCoPartition are global cofolding algorithms with no restriction on base pair length. Surprisingly, the predictions of LinearCoFold and LinearCoPartition have higher PPV and sensitivity for intermolecular base pairs. Furthermore, we apply LinearCoFold to predict the RNA-RNA interaction between SARS-CoV-2 gRNA and human U4 snRNA, which has been experimentally studied, and observe that LinearCoFold's prediction correlates better with the wet-lab results.
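Under the concatenation strategy, a base pair (i, j) is intermolecular exactly when it spans the cut point between the two strands, which is how intermolecular PPV and sensitivity can be scored. A minimal sketch, assuming 0-based indices into the concatenated sequence; it only tracks the cut point for evaluation and does not model the special energy treatment of the strand junction (RNAcofold itself takes the two strands separated by `&`). All helper names are hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]

def concatenate(seq_a: str, seq_b: str) -> Tuple[str, int]:
    """Join two strands into one sequence and return the cut point (start of strand B)."""
    return seq_a + seq_b, len(seq_a)

def intermolecular(pairs: Set[Pair], cut: int) -> Set[Pair]:
    """Keep pairs with one index in each strand, normalized as (smaller, larger)."""
    return {(min(i, j), max(i, j)) for i, j in pairs if min(i, j) < cut <= max(i, j)}

def ppv_sensitivity(pred: Set[Pair], ref: Set[Pair], cut: int) -> Tuple[float, float]:
    """PPV and sensitivity computed over intermolecular base pairs only."""
    p, r = intermolecular(pred, cut), intermolecular(ref, cut)
    tp = len(p & r)
    ppv = tp / len(p) if p else 0.0
    sens = tp / len(r) if r else 0.0
    return ppv, sens

seq, cut = concatenate("GGGAAAU", "UUUCCC")  # cut == 7
pred = {(0, 12), (1, 11), (2, 6)}            # (2, 6) is intramolecular, so it is ignored
ref = {(0, 12), (1, 11), (2, 10)}
print(ppv_sensitivity(pred, ref, cut))       # PPV 1.0, sensitivity 2/3
```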
Submitted 26 October, 2022;
originally announced October 2022.
-
Can Pre-trained Models Really Learn Better Molecular Representations for AI-aided Drug Discovery?
Authors:
Ziqiao Zhang,
Yatao Bian,
Ailin Xie,
Pengju Han,
Long-Kai Huang,
Shuigeng Zhou
Abstract:
Self-supervised pre-training is gaining popularity in AI-aided drug discovery, leading to a growing number of pre-trained models that promise to extract better feature representations for molecules. Yet the quality of the learned representations has not been fully explored. In this work, inspired by two phenomena in traditional Quantitative Structure-Activity Relationship (QSAR) analysis, Activity Cliffs (ACs) and Scaffold Hopping (SH), we propose a method named Representation-Property Relationship Analysis (RePRA) to evaluate the quality of the representations extracted by pre-trained models and to visualize the relationship between representations and properties. The concepts of ACs and SH are generalized from the structure-activity context to the representation-property context, and the underlying principles of RePRA are analyzed theoretically. Two scores are designed to measure the generalized ACs and SH detected by RePRA, so that the quality of representations can be evaluated. In experiments, representations of molecules from 10 target tasks generated by 7 pre-trained models are analyzed. The results indicate that state-of-the-art pre-trained models can overcome some shortcomings of canonical Extended-Connectivity FingerPrints (ECFP), while the correlation between the basis of the representation space and specific molecular substructures is not explicit. Thus, some representations could be even worse than the canonical fingerprints. Our method enables researchers to evaluate the quality of molecular representations generated by their proposed self-supervised pre-trained models, and our findings can guide the community to develop better pre-training techniques that regularize the occurrence of ACs and SH.
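The generalized ACs and SH can be pictured as two kinds of suspicious molecule pairs: close in representation space but far apart in property (AC-like), or far apart in representation space but close in property (SH-like). The sketch below simply counts such pairs with four hypothetical thresholds and Euclidean distance; it illustrates the idea only and is not RePRA's published scoring method.

```python
import math
from itertools import combinations
from typing import List, Tuple

def count_ac_sh(reps: List[List[float]], props: List[float],
                rep_close: float, rep_far: float,
                prop_close: float, prop_far: float) -> Tuple[int, int]:
    """Count AC-like and SH-like pairs among molecules.

    AC-like: representations within rep_close but properties differing by >= prop_far.
    SH-like: representations at least rep_far apart but properties within prop_close.
    All four thresholds are hypothetical knobs chosen by the caller.
    """
    ac = sh = 0
    for i, j in combinations(range(len(reps)), 2):
        d_rep = math.dist(reps[i], reps[j])  # Euclidean distance (Python 3.8+)
        d_prop = abs(props[i] - props[j])
        if d_rep <= rep_close and d_prop >= prop_far:
            ac += 1
        elif d_rep >= rep_far and d_prop <= prop_close:
            sh += 1
    return ac, sh

reps = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]]
props = [1.0, 3.0, 1.1, 1.05]
print(count_ac_sh(reps, props, rep_close=0.5, rep_far=3.0,
                  prop_close=0.2, prop_far=1.5))  # (1, 2)
```

A representation with many AC-like pairs places molecules with very different properties next to each other, which is the kind of weakness the text attributes to some pre-trained models.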
Submitted 21 August, 2022;
originally announced September 2022.
-
MDM: Molecular Diffusion Model for 3D Molecule Generation
Authors:
Lei Huang,
Hengtong Zhang,
Tingyang Xu,
Ka-Chun Wong
Abstract:
Molecule generation, especially generating 3D molecular geometries from scratch (i.e., 3D de novo generation), has become a fundamental task in drug design. Existing diffusion-based 3D molecule generation methods can suffer from unsatisfactory performance, especially when generating large molecules. At the same time, the generated molecules lack sufficient diversity. This paper proposes a novel diffusion model to address these two challenges.
First, interatomic relations are not explicitly present in molecules' 3D point cloud representations. Thus, it is difficult for existing generative models to capture the potential interatomic forces and abundant local constraints. To tackle this challenge, we propose to augment the potential interatomic forces and further employ dual equivariant encoders to encode interatomic forces of different strengths. Second, existing diffusion-based models essentially shift elements in geometry along the gradient of data density. Such a process lacks sufficient exploration in the intermediate steps of the Langevin dynamics. To address this issue, we introduce a distributional controlling variable in each diffusion/reverse step to enforce thorough exploration and further improve generation diversity.
Extensive experiments on multiple benchmarks demonstrate that the proposed model significantly outperforms existing methods for both unconditional and conditional generation tasks. We also conduct case studies to help understand the physicochemical properties of the generated molecules.
Submitted 12 September, 2022;
originally announced September 2022.
-
LinearAlifold: Linear-Time Consensus Structure Prediction for RNA Alignments
Authors:
Liang Zhang,
Sizhen Li,
He Zhang,
David H. Mathews,
Liang Huang
Abstract:
Predicting the consensus structure of a set of aligned RNA homologs is a convenient method to find conserved structures in an RNA genome, which has applications in SARS-CoV-2 diagnostics and therapeutics. However, the state-of-the-art algorithm for this task, RNAalifold, is prohibitively slow for long sequences, due to cubic scaling with the sequence length, and even slower when analyzing many such sequences, due to superlinear scaling with the number of homologs, taking 4 days on 200 SARS-CoV variants. We present LinearAlifold, an efficient algorithm for folding aligned RNA homologs that scales linearly with both the sequence length and the number of sequences, based on our recent work LinearFold, which folds a single RNA in linear time. Our work is orders of magnitude faster than RNAalifold (e.g., 0.5 hours on the above 200 sequences, a 316-fold speedup) and achieves comparable accuracy when evaluated against a database of known structures. More interestingly, LinearAlifold's prediction on SARS-CoV-2 correlates well with experimentally determined structures, outperforming RNAalifold. Finally, LinearAlifold supports three modes: minimum free energy (MFE), partition function, and stochastic sampling, each of which takes under an hour for hundreds of SARS-CoV variants, whereas only the MFE mode of RNAalifold works for them, taking days or weeks.
Submitted 29 June, 2022;
originally announced June 2022.
-
Prediction and Control of Focal Seizure Spread: Random Walk with Restart on Heterogeneous Brain Networks
Authors:
Chen Wang,
Sida Chen,
Liang Huang,
Lianchun Yu
Abstract:
Whole-brain models offer a promising method of predicting seizure spread, which is critical for successful surgical treatment of focal epilepsy. Existing methods are largely based on the structural connectome, which ignores the effects of heterogeneity in regional excitability of the brain. In this study, we used a whole-brain model to show that heterogeneity in nodal excitability had a significant impact on seizure propagation in the networks and compromised the accuracy of predictions based on structural connections. We then addressed this problem with an algorithm based on random walk with restart on graphs. We demonstrated that, by establishing a relationship between the restarting probability and the excitability of each node, this algorithm could significantly improve seizure spread prediction accuracy in heterogeneous networks and was more robust to the extent of heterogeneity. We also framed surgical seizure control as a process of identifying and removing the key nodes (connections) responsible for the early spread of seizures from the focal region. Compared to strategies based on structural connections, virtual surgery with a strategy based on mRWER generated outcomes with a high success rate while limiting damage to the brain by removing fewer anatomical connections. These findings may have potential applications in developing personalized surgery strategies for epilepsy.
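Random walk with restart (RWR) scores how likely a walker starting at the seizure focus is to be found at each node in the long run. A minimal power-iteration sketch, assuming a node-specific restart probability supplied by the caller: the excitability-to-restart mapping behind mRWER is not specified here, so `restart` is a hypothetical input, and the four-node chain is a toy graph.

```python
from typing import List

def rwr(adj: List[List[float]], restart: List[float], seed: int,
        tol: float = 1e-10, max_iter: int = 10_000) -> List[float]:
    """Random walk with restart using a per-node restart probability.

    adj[i][j]: connection weight from node i to node j (every row needs a nonzero sum).
    restart[i]: probability that a walker at node i jumps back to the seed node.
    Returns the stationary visiting probability of each node.
    """
    n = len(adj)
    # Row-normalize the weights into transition probabilities.
    trans = [[w / sum(row) for w in row] for row in adj]
    p = [0.0] * n
    p[seed] = 1.0
    for _ in range(max_iter):
        new = [0.0] * n
        back = 0.0
        for i in range(n):
            back += p[i] * restart[i]         # mass that restarts at the seed
            stay = p[i] * (1.0 - restart[i])  # mass that keeps walking
            for j in range(n):
                new[j] += stay * trans[i][j]
        new[seed] += back
        if sum(abs(a - b) for a, b in zip(new, p)) < tol:
            return new
        p = new
    return p

chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
uniform = rwr(chain, [0.3] * 4, seed=0)
gated = rwr(chain, [0.3, 0.9, 0.3, 0.3], seed=0)  # node 1 sends most walkers home
print([round(x, 3) for x in uniform])  # visiting probability decays with distance from node 0
print(gated[3] < uniform[3])           # True: a high restart at node 1 limits spread past it
```

The second call shows why node-specific restart matters: raising one node's restart probability sharply reduces how much probability reaches nodes beyond it, which a purely structural walk cannot express.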
Submitted 14 April, 2022;
originally announced April 2022.
-
DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations
Authors:
Yuanfeng Ji,
Lu Zhang,
Jiaxiang Wu,
Bingzhe Wu,
Long-Kai Huang,
Tingyang Xu,
Yu Rong,
Lanqing Li,
Jie Ren,
Ding Xue,
Houtim Lai,
Shaoyong Xu,
**g Feng,
Wei Liu,
** Luo,
Shuigeng Zhou,
Junzhou Huang,
Peilin Zhao,
Yatao Bian
Abstract:
AI-aided drug discovery (AIDD) is gaining increasing popularity due to its promise of making the search for new pharmaceuticals quicker, cheaper and more efficient. In spite of its extensive use in many fields, such as ADMET prediction, virtual screening, protein folding and generative chemistry, little has been explored in terms of the out-of-distribution (OOD) learning problem with \emph{noise}, which is inevitable in real world AIDD applications.
In this work, we present DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery, which comes with an open-source Python package that fully automates the data curation and OOD benchmarking processes. We focus on one of the most crucial problems in AIDD: drug target binding affinity prediction, which involves both macromolecules (protein targets) and small molecules (drug compounds). Rather than only providing fixed datasets, DrugOOD offers an automated dataset curator with user-friendly customization scripts, rich domain annotations aligned with biochemistry knowledge, realistic noise annotations, and rigorous benchmarking of state-of-the-art OOD algorithms. Since molecular data are often modeled as irregular graphs using graph neural network (GNN) backbones, DrugOOD also serves as a valuable testbed for \emph{graph OOD learning} problems. Extensive empirical studies show a significant performance gap between in-distribution and out-of-distribution experiments, which highlights the need to develop better schemes that allow OOD generalization under noise for AIDD.
Submitted 24 January, 2022;
originally announced January 2022.
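An OOD split rests on one idea: no domain appears in both training and test data. A domain might be a scaffold, an assay, or a protein target annotation. The helper below is hypothetical and is not the DrugOOD package API. It assumes each sample carries a domain label, and it sends the smallest domains to the test set, which is one common heuristic.

```python
from collections import defaultdict

def domain_split(samples, domain_of, test_fraction=0.2):
    """Split samples into (train, test) so that no domain label occurs in both."""
    groups = defaultdict(list)
    for s in samples:
        groups[domain_of(s)].append(s)
    # Smallest domains first; ties broken by label so the split is deterministic.
    ordered = sorted(groups.items(), key=lambda kv: (len(kv[1]), str(kv[0])))
    target = round(test_fraction * len(samples))
    train, test = [], []
    for _, items in ordered:
        # Each whole domain goes to exactly one side.
        (test if len(test) < target else train).extend(items)
    return train, test
```

Because whole domains are assigned at once, the test set can overshoot `test_fraction` when domains are large. This trade-off is inherent to domain-level splitting.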
-
Rapid and Accurate Detection of SARS-CoV-2 Mutations using a Cas12a-based Sensing Platform
Authors:
C He,
C Lin,
G Mo,
B Xi,
A Li,
D Huang,
Y Wan,
F Chen,
Y Liang,
Q Zuo,
W Xu,
D Feng,
G Zhang,
L Han,
C Ke,
H Du,
L Huang
Abstract:
The increasing prevalence of SARS-CoV-2 variants with spike mutations has raised concerns owing to higher transmission rates, disease severity, and escape from neutralizing antibodies. Rapid and accurate detection of SARS-CoV-2 variants provides crucial information concerning the outbreaks of SARS-CoV-2 variants and possible lines of transmission. This information is vital for infection prevention and control. We used a Cas12a-based RT-PCR combined with CRISPR on-site rapid detection system (RT-CORDS) platform to detect key mutations in SARS-CoV-2 variants, such as the 69/70 deletion, N501Y, and D614G. We used type-specific CRISPR RNAs (crRNAs) to identify wild-type (crRNA-W) and mutant (crRNA-M) sequences of SARS-CoV-2. We successfully differentiated mutant variants from wild-type SARS-CoV-2 with a sensitivity of $10^{-17}$ M (approximately 6 copies/μL). After a simple RT-PCR, the Cas12a/crRNA reaction took just 10 min using a fluorescence reporting system. In addition, a sensitivity of $10^{-16}$ M could be achieved when lateral flow strips were used as readouts. RT-CORDS results for SARS-CoV-2 variant detection were 100% consistent with the sequencing data. In conclusion, using the RT-CORDS platform, we accurately, sensitively, specifically, and rapidly detected SARS-CoV-2 variants. This method may be used in clinical diagnosis.
Submitted 25 October, 2021;
originally announced October 2021.
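The two detection limits quoted in the abstract can be cross-checked by converting molar concentration to copies per microlitre with Avogadro's constant. The conversion assumes one target molecule counts as one copy.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)

def molar_to_copies_per_ul(molar):
    """Convert a concentration in mol/L to molecules (copies) per microlitre."""
    return molar * AVOGADRO / 1e6  # 1 L = 1,000,000 uL

for conc in (1e-17, 1e-16):
    print(f"{conc:.0e} M -> {molar_to_copies_per_ul(conc):.1f} copies/uL")
```

This gives about 6 copies/μL at $10^{-17}$ M, matching the abstract, and about 60 copies/μL at the $10^{-16}$ M lateral-flow limit.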
-
Quantum Crystallography N-Representability
Authors:
Cherif F. Matta,
Lulu Huang,
Lou Massa
Abstract:
Linus Pauling's contributions span structural biology, chemistry in its broadest definition, quantum mechanical theory, valence bond theory, and even nuclear physics. Principal tools developed and used by Pauling are X-ray and electron diffraction. One possible extension of Pauling's oeuvre could be the marriage of crystallography and quantum mechanics. Such an effort dates back to the 1960s and has since flourished into an entire subfield termed quantum crystallography. Quantum crystallography can be achieved by applying the Clinton equations to obtain N-representable density matrices consistent with experimental data. The implementation of the Clinton equations is qualitatively different for small and large systems: for a small system, quantum mechanics is extracted from X-ray data, while for a large system, quantum mechanics is injected into the system. In both cases, N-representability is imposed through the Clinton equations.
Submitted 6 October, 2021;
originally announced October 2021.
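The Clinton equations are commonly written as an iteration built on two parts. The first is the McWeeny purification step P ← 3P² − 2P³. The second is a set of Lagrange-multiplier terms that tie P to measured X-ray structure factors. As a rough sketch, the code below shows only the purification step. It assumes a symmetric one-particle density matrix in an orthonormal basis, with eigenvalues reasonably close to 0 or 1. Because the experimental-constraint terms are omitted, the step drives P toward idempotency (P² = P, the single-determinant N-representability condition) but fits no data.

```python
def matmul(a, b):
    """Plain-Python product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mcweeny_purify(p, tol=1e-12, max_iter=100):
    """Iterate P <- 3P^2 - 2P^3 until P stops changing (then P^2 = P)."""
    n = len(p)
    for _ in range(max_iter):
        p2 = matmul(p, p)
        p3 = matmul(p2, p)
        nxt = [[3 * p2[i][j] - 2 * p3[i][j] for j in range(n)] for i in range(n)]
        change = max(abs(nxt[i][j] - p[i][j]) for i in range(n) for j in range(n))
        p = nxt
        if change < tol:
            break
    return p

# Hypothetical 2x2 trial density matrix with eigenvalues 0.9 and 0.2.
P = mcweeny_purify([[0.55, 0.35], [0.35, 0.55]])
```

Each eigenvalue above 1/2 is pushed to 1 and each below 1/2 to 0, so here P converges to the projector onto the dominant orbital.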
-
Structure-aware Interactive Graph Neural Networks for the Prediction of Protein-Ligand Binding Affinity
Authors:
Shuangli Li,
**gbo Zhou,
Tong Xu,
Liang Huang,
Fan Wang,
Haoyi Xiong,
Weili Huang,
De**g Dou,
Hui Xiong
Abstract:
Drug discovery often relies on the successful prediction of protein-ligand binding affinity. Recent advances have shown great promise in applying graph neural networks (GNNs) for better affinity prediction by learning the representations of protein-ligand complexes. However, existing solutions usually treat protein-ligand complexes as topological graph data, so the biomolecular structural information is not fully utilized. The essential long-range interactions among atoms are also neglected in GNN models. To this end, we propose a structure-aware interactive graph neural network (SIGN) which consists of two components: polar-inspired graph attention layers (PGAL) and pairwise interactive pooling (PiPool). Specifically, PGAL iteratively performs the node-edge aggregation process to update embeddings of nodes and edges while preserving the distance and angle information among atoms. Then, PiPool is adopted to gather interactive edges with a subsequent reconstruction loss to reflect the global interactions. Extensive experiments on two benchmarks demonstrate the advantages of SIGN.
Submitted 20 July, 2021;
originally announced July 2021.
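SIGN's polar-inspired layers are described as preserving distance and angle information among atoms. For illustration only, the sketch below computes those raw geometric quantities for the directed edges of a 3D molecular graph. For each edge it returns the edge's length and the angles it forms with the other edges leaving the same atom. The feature layout is a hypothetical choice, not SIGN's implementation, and no attention or pooling is shown.

```python
import math

def bond_angle(center, a, b):
    """Angle (radians) at `center` between the vectors center->a and center->b."""
    u = [x - c for x, c in zip(a, center)]
    v = [x - c for x, c in zip(b, center)]
    cos = sum(p * q for p, q in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error

def polar_edge_features(coords, edges):
    """Map each directed edge (i, j) to (length, [angles to other edges leaving i])."""
    feats = {}
    for i, j in edges:
        angles = [bond_angle(coords[i], coords[j], coords[k])
                  for (s, k) in edges if s == i and k != j]
        feats[(i, j)] = (math.dist(coords[i], coords[j]), angles)
    return feats

# Hypothetical three-atom fragment (coordinates in angstroms).
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
features = polar_edge_features(coords, [(0, 1), (0, 2)])
```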
-
Distance-aware Molecule Graph Attention Network for Drug-Target Binding Affinity Prediction
Authors:
**gbo Zhou,
Shuangli Li,
Liang Huang,
Haoyi Xiong,
Fan Wang,
Tong Xu,
Hui Xiong,
De**g Dou
Abstract:
Accurately predicting the binding affinity between drugs and proteins is an essential step in computational drug discovery. Since graph neural networks (GNNs) have demonstrated remarkable success in various graph-related tasks, they have recently been considered a promising tool for improving binding affinity prediction. However, most existing GNN architectures can only encode the topological graph structure of drugs and proteins without considering the relative spatial information among their atoms. Unlike other graph datasets such as social networks and commonsense knowledge graphs, the relative spatial positions of atoms and the chemical bonds among them have significant impacts on binding affinity. To this end, we propose a diStance-aware Molecule graph Attention Network (S-MAN) tailored to drug-target binding affinity prediction. Specifically, we first propose a position encoding mechanism to integrate the topological structure and spatial position information into the constructed pocket-ligand graph. Moreover, we propose a novel edge-node hierarchical attentive aggregation structure that performs edge-level and node-level aggregation. The hierarchical attentive aggregation can capture spatial dependencies among atoms and fuse the position-enhanced information while discriminating multiple spatial relations among atoms. Finally, we conduct extensive experiments on two standard datasets to demonstrate the effectiveness of S-MAN.
Submitted 17 December, 2020;
originally announced December 2020.
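S-MAN operates on a constructed pocket-ligand graph. One common way to build such a graph, sketched here under stated assumptions, is to connect any two atoms within a distance cutoff. Each edge then keeps the distance and the interaction type as attributes. The 5.0 Å default cutoff and the "LL"/"LP"/"PP" edge labels are hypothetical choices. S-MAN's position encoding and hierarchical attention are not reproduced.

```python
import math
from itertools import combinations

def build_pocket_ligand_graph(ligand, pocket, cutoff=5.0):
    """ligand, pocket: lists of (x, y, z) coordinates in angstroms.

    Atoms are indexed ligand first, then pocket. Returns a list of
    (i, j, distance, kind) for every atom pair closer than `cutoff`.
    """
    atoms = [(xyz, "L") for xyz in ligand] + [(xyz, "P") for xyz in pocket]
    edges = []
    for i, j in combinations(range(len(atoms)), 2):
        (pos_i, kind_i), (pos_j, kind_j) = atoms[i], atoms[j]
        d = math.dist(pos_i, pos_j)
        if d <= cutoff:
            kind = "".join(sorted(kind_i + kind_j))  # "LL", "LP" or "PP"
            edges.append((i, j, d, kind))
    return edges
```

The all-pairs loop is quadratic in the number of atoms, which is acceptable for a binding pocket. For larger systems, a spatial index such as a k-d tree would be the usual replacement.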
-
Healthcare Utilization and Perceived Health Status from Falun Gong Practitioners in Taiwan: A Pilot SF-36 Survey
Authors:
Yu-Whuei Hu,
Li-Shan Huang,
Eric J. Yeh,
Mai He
Abstract:
Objective: Falun Gong (FLG) is a practice of mind and body focusing on moral character improvement along with meditative exercises. This 2002 pilot study explored perceived health status, medical resource utilization and related factors among Taiwanese FLG practitioners, compared to the general Taiwanese norm estimated by the 2001 National Health Interview Survey (NHIS). Methods: This cross-sectional, observational study was based on a voluntary, paper-based survey conducted from October 2002 to February 2003 using the same Taiwanese SF-36 instrument employed by the NHIS. Primary outcomes included eight SF-36 domain scores and the number of medical visits. One-sample t-tests, one-way ANOVA and multivariate linear regression analyses were performed. Results: The response rate was 75.6% (1,210/1,600). Compared to the norm, the study cohort had significantly higher scores in six of eight SF-36 domains across gender and age (p<0.05). Among those with chronic diseases, 70% to 89% reported that their conditions had improved or been cured. Quit rates were 74.2% for alcohol drinking, 79.2% for smoking, 83.3% for betel nut chewing, and 85.6% for gambling. 62.7% reported a reduced number of medical visits (mean=13.53 before; mean=5.87 after). Conclusions: In this subject cohort, practicing FLG led to higher perceived health scores and reduced health resource utilization compared to the norm.
Submitted 29 July, 2020;
originally announced July 2020.
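The comparison against the NHIS norm rests on one-sample t-tests. Below is a minimal standard-library sketch of the t statistic, using made-up numbers rather than study data. Turning t into a p-value additionally needs the Student t distribution; if SciPy is available, `scipy.stats.ttest_1samp` returns both.

```python
import math
import statistics

def one_sample_t(sample, population_mean):
    """t statistic and degrees of freedom for H0: mean(sample) == population_mean."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - population_mean) / standard_error, n - 1

# Hypothetical SF-36 domain scores compared with a hypothetical norm of 70.
t_stat, dof = one_sample_t([72, 80, 75, 68, 90], 70)
```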
-
Towards Increased Reliability, Transparency and Accessibility in Crosslinking Mass Spectrometry
Authors:
Alexander Leitner,
Alexandre M. J. J. Bonvin,
Christoph H. Borchers,
Robert J. Chalkley,
Julia Chamot-Rooke,
Colin W. Combe,
Jürgen Cox,
Meng-Qiu Dong,
Lutz Fischer,
Michael Götze,
Fabio C. Gozzo,
Albert J. R. Heck,
Michael R. Hoopmann,
Lan Huang,
Yasushi Ishihama,
Andrew R. Jones,
Nir Kalisman,
Oliver Kohlbacher,
Karl Mechtler,
Robert L. Moritz,
Eugen Netz,
Petr Novak,
Evgeniy Petrotchenko,
Andrej Sali,
Richard A. Scheltema
, et al. (11 additional authors not shown)
Abstract:
Crosslinking mass spectrometry (Crosslinking MS) has substantially matured as a method over the last two decades through parallel development in multiple labs, demonstrating its applicability for protein structure determination, conformation analysis and mapping protein interactions in complex mixtures. Crosslinking MS has become a much-appreciated and routinely applied tool, especially in structural biology. Therefore, it is timely that the community commits to the development of methodological and reporting standards. This white paper builds on an open process comprising a number of events at community conferences since 2015 and identifies aspects of Crosslinking MS for which guidelines should be developed as part of a Crosslinking MS standards initiative.
Submitted 1 July, 2020;
originally announced July 2020.
-
Algorithm for Optimized mRNA Design Improves Stability and Immunogenicity
Authors:
He Zhang,
Liang Zhang,
Ang Lin,
Congcong Xu,
Ziyu Li,
Kaibo Liu,
Boxiang Liu,
Xiaopin Ma,
Fanfan Zhao,
Weiguo Yao,
Hangwen Li,
David H. Mathews,
Yujian Zhang,
Liang Huang
Abstract:
Messenger RNA (mRNA) vaccines are being used for COVID-19 but still suffer from the critical issue of mRNA instability and degradation, which is a major obstacle to the storage, distribution, and efficacy of the vaccine. Previous work showed that optimizing secondary structure stability lengthens mRNA half-life, which, together with optimal codons, increases protein expression. Therefore, a principled mRNA design algorithm must optimize both structural stability and codon usage to improve mRNA efficiency. However, due to synonymous codons, the mRNA design space is prohibitively large, e.g., there are $\sim\!10^{632}$ mRNAs for the SARS-CoV-2 Spike protein, which poses insurmountable challenges to previous methods. Here we provide a surprisingly simple solution to this hard problem by reducing it to a classical problem in computational linguistics, where finding the optimal mRNA is akin to finding the most likely sentence among similar-sounding alternatives. Our algorithm, named LinearDesign, takes only 11 minutes for the Spike protein and can jointly optimize stability and codon usage. Experimentally, without chemical modification, our designs substantially improve mRNA half-life and protein expression in vitro, and dramatically increase antibody response by up to 23$\times$ in vivo, compared to the codon-optimized benchmark. Our work enables the exploration of highly stable and efficient designs that were previously unreachable and is a timely tool not only for vaccines but also for mRNA medicine encoding all therapeutic proteins (e.g., monoclonal antibodies and anti-cancer drugs).
Submitted 17 March, 2022; v1 submitted 21 April, 2020;
originally announced April 2020.
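The size of the design space quoted above comes from synonymous codons. Each amino acid is encoded by one to six codons in the standard genetic code, so the number of candidate coding sequences is the product of the per-residue codon counts. The sketch below ignores the stop codon and uses a made-up peptide, not the Spike protein. Working in log10 keeps the numbers manageable for long proteins.

```python
import math

# Synonymous codon counts per amino acid in the standard genetic code (61 sense codons).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}

def design_space_size(protein):
    """Exact number of mRNA coding sequences that translate to `protein`."""
    return math.prod(CODON_DEGENERACY[aa] for aa in protein)

def log10_design_space(protein):
    """log10 of the same count, usable for proteins with thousands of residues."""
    return sum(math.log10(CODON_DEGENERACY[aa]) for aa in protein)

print(design_space_size("MKV"), log10_design_space("MKV"))
```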
-
The architecture of co-culture spheroids regulates tumor invasion within a 3D extracellular matrix
Authors:
Yu Ling Huang,
Carina Shiau,
Cindy Wu,
Jeffrey E. Segall,
Mingming Wu
Abstract:
Tumor invasion, the process by which tumor cells break away from their primary tumor and gain access to vascular systems, is an important step in cancer metastasis. Most current 3D tumor invasion assays consist of single tumor cells embedded within an extracellular matrix (ECM). These assays have taught us much of what we know today about how key biophysical (e.g., ECM stiffness) and biochemical (e.g., cytokine gradients) parameters within the tumor microenvironment guide and regulate tumor invasion. One limitation of the single tumor cell invasion assay is that it does not account for cell-cell adhesion within the tumor. In this article, we developed a micrometer-scale 3D co-culture spheroid invasion assay that is compatible with microscopic imaging. Micrometer-scale co-culture spheroids (1:1 ratio of metastatic breast cancer MDA-MB-231 and non-tumorigenic epithelial MCF-10A cells) were made using an array of microwells and then embedded within a collagen matrix in a microfluidic platform. Real-time imaging of tumor spheroid invasion revealed that the spatial distribution of the two cell types within the tumor spheroid critically regulated tumor invasion. This work links tumor architecture with tumor invasion and highlights the importance of biophysical cues within the bulk of the tumor in tumor invasion.
Submitted 9 February, 2020;
originally announced February 2020.
-
LinearFold: linear-time approximate RNA folding by 5'-to-3' dynamic programming and beam search
Authors:
Liang Huang,
He Zhang,
Dezhong Deng,
Kai Zhao,
Kaibo Liu,
David A. Hendrix,
David H. Mathews
Abstract:
Motivation: Predicting the secondary structure of an RNA sequence is useful in many applications. Existing algorithms (based on dynamic programming) suffer from a major limitation: their runtimes scale cubically with the RNA length, and this slowness limits their use in genome-wide applications.
Results: We present a novel alternative $O(n^3)$-time dynamic programming algorithm for RNA folding that is amenable to heuristics that make it run in $O(n)$ time and $O(n)$ space, while producing a high-quality approximation to the optimal solution. Inspired by incremental parsing for context-free grammars in computational linguistics, our alternative dynamic programming algorithm scans the sequence in a left-to-right (5'-to-3') direction rather than in a bottom-up fashion, which allows us to employ the effective beam pruning heuristic. Our work, though inexact, is the first RNA folding algorithm to achieve linear runtime (and linear space) without imposing constraints on the output structure. Surprisingly, our approximate search results in even higher overall accuracy on a diverse database of sequences with known structures. More interestingly, it leads to significantly more accurate predictions on the longest sequence families in that database (16S and 23S Ribosomal RNAs), as well as improved accuracies for long-range base pairs (500+ nucleotides apart), both of which are well known to be challenging for the current models.
Availability: Our source code is available at https://github.com/LinearFold/LinearFold, and our webserver is at http://linearfold.org (sequence limit: 100,000 nt).
Submitted 21 December, 2019;
originally announced January 2020.
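Nussinov base-pair maximization is the simplest classical bottom-up folding DP and shows where the cubic bottleneck comes from. It fills a table over all spans and tries every split point, giving O(n^3) time. It is not LinearFold, which replaces this bottom-up order with a left-to-right (5'-to-3') scan plus beam pruning and uses a real energy model. The canonical-plus-wobble pair set and the minimum hairpin loop of 3 are standard simplifications assumed here.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=3):
    """Maximum number of nested base pairs in `seq` (classical O(n^3) DP)."""
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]          # best[i][j]: optimum for seq[i..j]
    for span in range(min_loop + 1, n):         # span = j - i, shortest spans first
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]              # case 1: j is unpaired
            for k in range(i, j - min_loop):    # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

The three nested loops over `span`, `i` and `k` are exactly the cubic cost that left-to-right beam search avoids.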
-
LinearPartition: Linear-Time Approximation of RNA Folding Partition Function and Base Pairing Probabilities
Authors:
He Zhang,
Liang Zhang,
David H. Mathews,
Liang Huang
Abstract:
RNA secondary structure prediction is widely used to understand RNA function. Recently, there has been a shift away from the classical minimum free energy (MFE) methods to partition function-based methods that account for folding ensembles and can therefore estimate structure and base pair probabilities. However, the classical partition function algorithm scales cubically with sequence length and is therefore slow for long sequences. This slowness is even more severe than for cubic-time MFE-based methods because of a larger constant factor in runtime. Inspired by the success of our recently proposed LinearFold algorithm, which predicts the approximate MFE structure in linear time, we design a similar linear-time heuristic algorithm, LinearPartition, to approximate the partition function and base pairing probabilities. LinearPartition is orders of magnitude faster than Vienna RNAfold and CONTRAfold (e.g., 2.5 days versus 1.3 minutes for LinearPartition on a sequence of length 32,753 nt). More interestingly, the resulting base pairing probabilities are even better correlated with the ground-truth structures. LinearPartition also leads to a small accuracy improvement when used for downstream structure prediction on families with the longest sequences (16S and 23S rRNA), as well as a substantial improvement on long-distance base pairs (500+ nt apart).
Submitted 1 February, 2020; v1 submitted 31 December, 2019;
originally announced December 2019.
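A partition function replaces the max in a folding DP with a sum over structures weighted by Boltzmann factors. The sketch below is exact and runs in O(n^3) time under a toy model. In that model every allowed base pair contributes the same weight w, standing in for exp(-ΔG/RT). Real engines such as Vienna RNAfold and CONTRAfold use far richer loop-based models, and LinearPartition's linear-time beam approximation is not shown. With w = 1 the result simply counts secondary structures.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def partition_function(seq, w=1.0, min_loop=3):
    """Sum over all nested secondary structures of w ** (number of base pairs)."""
    n = len(seq)
    # Z[i][j] is the partition function of the half-open substring seq[i:j];
    # the empty substring (i == j) contributes 1.
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = Z[i][j - 1]                       # last base seq[j-1] unpaired
            for k in range(i, j - 1 - min_loop):      # seq[k] pairs with seq[j-1]
                if (seq[k], seq[j - 1]) in PAIRS:
                    total += Z[i][k] * w * Z[k + 1][j - 1]
            Z[i][j] = total
    return Z[0][n]
```

Base-pairing probabilities would additionally need an outside pass, which is omitted to keep the sketch short.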
-
ThreshKnot: Thresholded ProbKnot for Improved RNA Secondary Structure Prediction
Authors:
Liang Zhang,
He Zhang,
David H. Mathews,
Liang Huang
Abstract:
RNA structure prediction is a challenging problem, especially with pseudoknots. Recently, there has been a shift from the classical minimum free energy (MFE)-based methods to partition function-based ones that assemble structures using base-pairing probabilities. Two examples of the latter group are the popular maximum expected accuracy (MEA) method and the ProbKnot method. ProbKnot is a fast heuristic that pairs nucleotides that are reciprocally most probable pairing partners and, unlike MEA, can also predict structures with pseudoknots. However, ProbKnot's full potential has been largely overlooked. In particular, when introduced, it lacked an MEA-like hyperparameter to balance positive predictive value (PPV) against sensitivity. We show that a simple thresholded version of ProbKnot, which we call ThreshKnot, leads to more accurate overall predictions by filtering out unlikely pairs whose probabilities fall below a given threshold. We also show that with three widely used folding engines (RNAstructure, Vienna RNAfold, and CONTRAfold), ThreshKnot always outperforms the much more involved MEA algorithm in three respects: (1) higher structure prediction accuracy, (2) the ability to predict pseudoknots, and (3) faster runtime and easier implementation. This suggests that ThreshKnot should replace MEA as the default partition function-based structure prediction algorithm. ThreshKnot is already available in the widely used RNAstructure software package version 6.2 (released November 27, 2019): https://rna.urmc.rochester.edu/RNAstructure.html
Submitted 8 January, 2020; v1 submitted 29 December, 2019;
originally announced December 2019.
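Given base-pairing probabilities from any partition function engine, the thresholded ProbKnot rule described above is short. Keep pair (i, j) only if i and j are each other's most probable partner and the probability reaches the threshold. The sketch assumes probabilities arrive as a dict keyed by (i, j) with i < j. The 0.3 default threshold is an illustrative value, and ties go to whichever pair is seen first. Crossing pairs are allowed, which is how pseudoknots can appear.

```python
def threshknot(probs, threshold=0.3):
    """Select reciprocal most-probable pairs whose probability is >= threshold.

    probs: dict {(i, j): probability} with i < j (0-based positions).
    Returns a sorted list of selected (i, j) pairs.
    """
    best = {}  # nucleotide -> (highest probability seen, partner)
    for (i, j), p in probs.items():
        if p > best.get(i, (0.0, None))[0]:
            best[i] = (p, j)
        if p > best.get(j, (0.0, None))[0]:
            best[j] = (p, i)
    return sorted(
        (i, j) for (i, j), p in probs.items()
        if p >= threshold and best[i][1] == j and best[j][1] == i
    )
```

Because each nucleotide keeps exactly one best partner, the output never assigns a nucleotide to two pairs.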
-
Population Pharmacokinetic Study of Tacrolimus in Pediatric Patients with Primary Nephrotic Syndrome: A Comparison of Linear and Nonlinear Michaelis Menten Pharmacokinetic Model
Authors:
Lingfei Huang,
Yixi Liu,
Zheng Jiao,
Junyan Wang,
Luo Fang,
Jianhua Mao
Abstract:
Background Little is known about the population pharmacokinetics (PPK) of tacrolimus (TAC) in pediatric primary nephrotic syndrome (PNS). This study aimed to compare the predictive performance of nonlinear and linear PK models and to identify factors that significantly influence TAC PK in pediatric PNS. Methods Data were obtained from 71 pediatric patients with PNS, along with 525 TAC trough concentrations at steady state. Demographic, medical, and treatment details were collected, and genetic polymorphisms were analyzed. The PPK models were developed using nonlinear mixed-effects modeling software. Two modeling strategies, linear compartmental and nonlinear Michaelis-Menten (MM) models, were evaluated and compared. Results Body weight, age, daily dose of TAC, co-therapy drugs (including azole antifungal agents and diltiazem), and CYP3A5*3 genotype were important factors in the final linear model (one-compartment model), whereas only body weight, co-therapy drugs, and CYP3A5*3 genotype were important factors in the nonlinear MM model. Apparent clearance and volume of distribution in the final linear model were 7.13 L/h and 142 L, respectively. The maximal dose rate (Vmax) of the nonlinear MM model was 1.92 mg/day, and the average steady-state concentration at half-Vmax (Km) was 1.98 ng/mL. The nonlinear model described the data better than the linear model. Dosing regimens were proposed based on the nonlinear PK model. Conclusion Our findings demonstrate that the nonlinear MM model showed better predictive performance than the linear compartmental model, providing reliable support for optimizing TAC dosing and adjustment in children with PNS.
Submitted 26 February, 2020; v1 submitted 3 December, 2019;
originally announced December 2019.
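Under the nonlinear model, the steady-state relation DR = Vmax·Css/(Km + Css) can be inverted to Css = Km·DR/(Vmax − DR). This gives the concentration expected for a daily dose rate DR. A steady state only exists while DR < Vmax. The linear model instead gives Css = DR/(CL/F). The sketch plugs in the typical values quoted above and ignores the body-weight, co-medication, and CYP3A5 covariates. It illustrates the shape of each model and is not a dosing tool.

```python
VMAX_MG_PER_DAY = 1.92   # typical Vmax quoted in the abstract
KM_NG_PER_ML = 1.98      # typical Km quoted in the abstract
CL_L_PER_H = 7.13        # apparent clearance of the linear model

def css_michaelis_menten(dose_mg_per_day, vmax=VMAX_MG_PER_DAY, km=KM_NG_PER_ML):
    """Steady-state concentration (ng/mL) from DR = Vmax * Css / (Km + Css)."""
    if dose_mg_per_day >= vmax:
        raise ValueError("dose rate must stay below Vmax: no steady state exists otherwise")
    return km * dose_mg_per_day / (vmax - dose_mg_per_day)

def css_linear(dose_mg_per_day, cl_l_per_h=CL_L_PER_H):
    """Average steady-state concentration (ng/mL) for a linear model: Css = DR / CL."""
    mg_per_litre = dose_mg_per_day / (cl_l_per_h * 24.0)  # clearance converted to L/day
    return mg_per_litre * 1000.0                           # 1 mg/L = 1000 ng/mL
```

Under the MM model the predicted concentration rises ever more steeply as the dose rate approaches Vmax. As a result, equal dose increments produce larger concentration changes at higher doses, unlike the proportional response of the linear model.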
-
A network approach to quantifying radiotherapy effect on cancer: Radiosensitive gene group centrality
Authors:
Yu-Xiang Yao,
Zhi-Tong Bing,
Liang Huang,
Zi-Gang Huang,
Ying-Cheng Lai
Abstract:
Radiotherapy plays a vital role in cancer treatment, for which accurate prognosis is important for guiding sequential treatment and improving the curative effect for patients. An issue of great significance in radiotherapy is to assess tumor radiosensitivity for devising the optimal treatment strategy. Previous studies focused on gene expression in cells closely associated with radiosensitivity, but factors such as a cancer patient's response to irradiation and the patient's survival time have been largely ignored. For clinical cancer treatment, a specific pre-treatment indicator that takes into account cancer cell type and patient radiosensitivity would be of great value, but such an indicator has been lacking. Here, we propose an effective indicator of radiosensitivity: radiosensitive gene group centrality (RSGGC), which characterizes the importance of the group of radiosensitive genes in the whole gene correlation network. We demonstrate, using both clinical patient data and experimental cancer cell lines, that RSGGC can provide a quantitative estimate of the effect of radiotherapy, fully taking into account factors such as patient survival time and the surviving fraction of cancer cell lines under radiotherapy. Our main finding is that, for patients with a higher RSGGC score before radiotherapy, cancer treatment tends to be more effective. RSGGC could have significant applications in clinical prognosis, serving as a key measure for classifying radiosensitive and radioresistant patients.
Submitted 6 December, 2018;
originally announced December 2018.
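The abstract does not give the formula for RSGGC, so the sketch below rests on assumptions. It builds a gene correlation network by thresholding absolute Pearson correlation across samples. It then scores a gene group with standard group degree centrality: the fraction of genes outside the group adjacent to at least one group member. The 0.8 threshold and the gene names are hypothetical. Genes with constant expression make the correlation undefined and are not handled.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_network(expr, threshold=0.8):
    """expr: dict gene -> expression vector. Edge when |r| >= threshold."""
    adj = {g: set() for g in expr}
    for g, h in combinations(expr, 2):
        if abs(pearson(expr[g], expr[h])) >= threshold:
            adj[g].add(h)
            adj[h].add(g)
    return adj

def group_degree_centrality(adj, group):
    """Fraction of genes outside `group` linked to at least one group member."""
    group = set(group)
    outside = set(adj) - group
    reached = {v for g in group for v in adj[g]} - group
    return len(reached) / len(outside) if outside else 0.0

# Hypothetical expression of four genes across four samples.
expr = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1], "d": [3, 1, 4, 1]}
network = correlation_network(expr)
```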
-
Detecting causality in Plant electrical signal by a hybrid causal analysis approach
Authors:
Yang Chen,
Dong-Jie Zhao,
Chao Song,
Wei-He Liu,
Zi-Yang Wang,
Zhong-Yi Wang,
Guiliang Tang,
Lan Huang
Abstract:
At present, the multi-electrode array (MEA) approach and optical recording allow plant electrical activity to be acquired with higher spatiotemporal resolution. To understand the dynamic information flow of the electrical signaling system and estimate its effective connectivity, we proposed combining two causality analysis approaches, Granger causality and transfer entropy, which complement each other in measuring the dynamic effective connectivity of complex systems. Applied to three qualitatively different levels of plant bioelectrical activity, the two causal analysis approaches revealed the direction of information flow and dynamic, complex causal connectivity; in particular, they indicated that information flows not only along the longitudinal section but also across the transverse section.
Submitted 21 March, 2017;
originally announced March 2017.
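Of the two approaches combined here, transfer entropy is the less familiar. Below is a plug-in estimator for discrete (for example, binarized) signals with a history length of one sample. It measures, in bits, how much the past of x reduces uncertainty about the next value of y beyond what y's own past already explains. The discretization, history length, and estimator are assumptions, since the abstract does not specify the paper's choices. The Granger causality half is not shown.

```python
import math
import random
from collections import Counter

def transfer_entropy(x, y):
    """Transfer entropy from x to y in bits (history length 1, plug-in estimate)."""
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))   # (y_next, y_now, x_now)
    pairs_yy = Counter(zip(y[1:], y[:-1]))          # (y_next, y_now)
    pairs_yx = Counter(zip(y[:-1], x[:-1]))         # (y_now, x_now)
    singles = Counter(y[:-1])                       # y_now
    n = len(y) - 1
    te = 0.0
    for (y1, y0, x0), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_yx[(y0, x0)]          # p(y_next | y_now, x_now)
        p_given_self = pairs_yy[(y1, y0)] / singles[y0]    # p(y_next | y_now)
        te += p_joint * math.log2(p_given_both / p_given_self)
    return te

if __name__ == "__main__":
    # Hypothetical signals in which y copies x with a one-step delay.
    random.seed(0)
    x = [random.randint(0, 1) for _ in range(5000)]
    y = [0] + x[:-1]
    print(transfer_entropy(x, y), transfer_entropy(y, x))
```

In this toy case the estimate from x to y is close to 1 bit and the reverse is close to 0. This asymmetry is what lets transfer entropy indicate the direction of information flow.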