-
Brant-2: Foundation Model for Brain Signals
Authors:
Zhizhang Yuan,
Daoze Zhang,
Junru Chen,
Gefei Gu,
Yang Yang
Abstract:
Foundational models benefit from pre-training on large amounts of unlabeled data and enable strong performance in a wide variety of applications with a small amount of labeled data. Such models can be particularly effective in analyzing brain signals, as this field encompasses numerous application scenarios, and it is costly to perform large-scale annotation. In this work, we present the largest foundation model in brain signals, Brant-2. Compared to Brant, a foundation model designed for intracranial neural signals, Brant-2 not only exhibits robustness towards data variations and modeling scales but also can be applied to a broader range of brain neural data. By experimenting on an extensive range of tasks, we demonstrate that Brant-2 is adaptive to various application scenarios in brain signals. Further analyses reveal the scalability of Brant-2, validate each component's effectiveness, and showcase our model's ability to maintain performance in scenarios with scarce labels.
Submitted 28 March, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
-
MorphGrower: A Synchronized Layer-by-layer Growing Approach for Plausible Neuronal Morphology Generation
Authors:
Nianzu Yang,
Kaipeng Zeng,
Haotian Lu,
Yexin Wu,
Zexin Yuan,
Danni Chen,
Shengdian Jiang,
Jiaxiang Wu,
Yimin Wang,
Junchi Yan
Abstract:
Neuronal morphology is essential for studying brain functioning and understanding neurodegenerative disorders. As acquiring real-world morphology data is expensive, computational approaches for morphology generation have been studied. Traditional methods heavily rely on expert-set rules and parameter tuning, making it difficult to generalize across different types of morphologies. Recently, MorphVAE was introduced as the sole learning-based method, but its generated morphologies lack plausibility, i.e., they do not appear realistic enough and most of the generated samples are topologically invalid. To fill this gap, this paper proposes MorphGrower, which mimics the natural growth mechanism of neurons for generation. Specifically, MorphGrower generates morphologies layer by layer, with each subsequent layer conditioned on the previously generated structure. During each layer generation, MorphGrower utilizes a pair of sibling branches as the basic generation block and generates branch pairs synchronously. This approach ensures topological validity and allows for fine-grained generation, thereby enhancing the realism of the final generated morphologies. Results on four real-world datasets demonstrate that MorphGrower outperforms MorphVAE by a notable margin. Importantly, the electrophysiological response simulation demonstrates the plausibility of our generated samples from a neuroscience perspective. Our code is available at https://github.com/Thinklab-SJTU/MorphGrower.
Submitted 27 May, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.
-
DiffDTM: A conditional structure-free framework for bioactive molecules generation targeted for dual proteins
Authors:
Lei Huang,
Zheng Yuan,
Huihui Yan,
Rong Sheng,
Lin**g Liu,
Fuzhou Wang,
Weidun Xie,
Nanjun Chen,
Fei Huang,
Songfang Huang,
Ka-Chun Wong,
Yaoyun Zhang
Abstract:
Advances in deep generative models shed light on de novo molecule generation with desired properties. However, molecule generation targeted for dual protein targets still faces formidable challenges including protein 3D structure data requisition for model training, auto-regressive sampling, and model generalization for unseen targets. Here, we propose DiffDTM, a novel conditional structure-free deep generative model based on a diffusion model for dual-target-based molecule generation to address the above issues. Specifically, DiffDTM receives protein sequences and molecular graphs as inputs instead of protein and molecular conformations and incorporates an information fusion module to achieve conditional generation in a one-shot manner. We have conducted comprehensive multi-view experiments to demonstrate that DiffDTM can generate drug-like, synthesis-accessible, novel, and high-binding-affinity molecules targeting specific dual proteins, outperforming the state-of-the-art (SOTA) models in terms of multiple evaluation metrics. Furthermore, we utilized DiffDTM to generate molecules towards dopamine receptor D2 and 5-hydroxytryptamine receptor 1A as new antipsychotics. The experimental results indicate that DiffDTM can be easily plugged into unseen dual targets to generate bioactive molecules, addressing both the shortage of active molecule data for training and the need to retrain when encountering new targets.
Submitted 24 June, 2023;
originally announced June 2023.
-
AF2-Mutation: Adversarial Sequence Mutations against AlphaFold2 on Protein Tertiary Structure Prediction
Authors:
Zhongju Yuan,
Tao Shen,
Sheng Xu,
Leiye Yu,
Ruobing Ren,
Siqi Sun
Abstract:
Deep learning-based approaches, such as AlphaFold2 (AF2), have significantly advanced protein tertiary structure prediction, achieving results comparable to real biological experimental methods. While AF2 has shown limitations in predicting the effects of mutations, its robustness against sequence mutations remains to be determined. Starting with the wild-type (WT) sequence, we investigate adversarial sequences generated via an evolutionary approach, which AF2 predicts to be substantially different from WT. Our experiments on CASP14 reveal that by modifying merely three residues in the protein sequence using a combination of replacement, deletion, and insertion strategies, the alteration in AF2's predictions, as measured by the Local Distance Difference Test (lDDT), reaches 46.61. Moreover, when applied to a specific protein, SPNS2, our proposed algorithm successfully identifies biologically meaningful residues critical to protein structure determination and potentially indicates alternative conformations, thus significantly expediting the experimental process.
Submitted 15 May, 2023;
originally announced May 2023.
-
Molecular Geometry-aware Transformer for accurate 3D Atomic System modeling
Authors:
Zheng Yuan,
Yaoyun Zhang,
Chuanqi Tan,
Wei Wang,
Fei Huang,
Songfang Huang
Abstract:
Molecular dynamics simulations are important in computational physics, chemistry, materials science, and biology. Machine learning-based methods have shown strong abilities in predicting molecular energy and properties and are much faster than DFT calculations. Molecular energy is at least related to atoms, bonds, bond angles, torsion angles, and nonbonding atom pairs. Previous Transformer models only use atoms as inputs and therefore lack explicit modeling of the aforementioned factors. To alleviate this limitation, we propose Moleformer, a novel Transformer architecture that takes nodes (atoms) and edges (bonds and nonbonding atom pairs) as inputs and models the interactions among them using rotational and translational invariant geometry-aware spatial encoding. The proposed spatial encoding calculates relative position information, including distances and angles among nodes and edges. We benchmark Moleformer on the OC20 and QM9 datasets. Our model achieves state-of-the-art performance on the initial state to relaxed energy prediction of OC20 and is very competitive on QM9 in predicting quantum chemical properties compared to other Transformer and Graph Neural Network (GNN) methods, which demonstrates the effectiveness of the proposed geometry-aware spatial encoding in Moleformer.
Submitted 1 February, 2023;
originally announced February 2023.
-
Advanced Graph and Sequence Neural Networks for Molecular Property Prediction and Drug Discovery
Authors:
Zhengyang Wang,
Meng Liu,
Youzhi Luo,
Zhao Xu,
Yaochen Xie,
Limei Wang,
Lei Cai,
Qi Qi,
Zhuoning Yuan,
Tianbao Yang,
Shuiwang Ji
Abstract:
Properties of molecules are indicative of their functions and thus are useful in many applications. With the advances of deep learning methods, computational approaches for predicting molecular properties are gaining increasing momentum. However, customized and advanced methods and comprehensive tools for this task are currently lacking. Here we develop a suite of comprehensive machine learning methods and tools spanning different computational models, molecular representations, and loss functions for molecular property prediction and drug discovery. Specifically, we represent molecules as both graphs and sequences. Built on these representations, we develop novel deep models for learning from molecular graphs and sequences. In order to learn effectively from highly imbalanced datasets, we develop advanced loss functions that optimize areas under precision-recall curves. Altogether, our work not only serves as a comprehensive tool, but also contributes towards developing novel and advanced graph and sequence learning methodologies. Results on both online and offline antibiotics discovery and molecular property prediction tasks show that our methods achieve consistent improvements over prior methods. In particular, our methods achieve #1 ranking in terms of both ROC-AUC and PRC-AUC on the AI Cures Open Challenge for drug discovery related to COVID-19. Our software is released as part of the MoleculeX library under AdvProp.
Submitted 6 July, 2021; v1 submitted 1 December, 2020;
originally announced December 2020.
-
Review of Machine-Learning Methods for RNA Secondary Structure Prediction
Authors:
Qi Zhao,
Zheng Zhao,
Xiaoya Fan,
Zhengwei Yuan,
Qian Mao,
Yudong Yao
Abstract:
Secondary structure plays an important role in determining the function of non-coding RNAs. Hence, identifying RNA secondary structures is of great value to research. Computational prediction is a mainstream approach for predicting RNA secondary structure. Unfortunately, even though new methods have been proposed over the past 40 years, the performance of computational prediction methods has stagnated in the last decade. Recently, with the increasing availability of RNA structure data, new methods based on machine-learning technologies, especially deep learning, have alleviated the issue. In this review, we provide a comprehensive overview of RNA secondary structure prediction methods based on machine-learning technologies and a tabularized summary of the most important methods in this field. The current pending issues in the field of RNA secondary structure prediction and future trends are also discussed.
Submitted 31 August, 2020;
originally announced September 2020.
-
Deep Learning for Automatic Spleen Length Measurement in Sickle Cell Disease Patients
Authors:
Zhen Yuan,
Esther Puyol-Anton,
Haran Jogeesvaran,
Catriona Reid,
Baba Inusa,
Andrew P. King
Abstract:
Sickle Cell Disease (SCD) is one of the most common genetic diseases in the world. Splenomegaly (abnormal enlargement of the spleen) is frequent among children with SCD. If left untreated, splenomegaly can be life-threatening. The current workflow to measure spleen size includes palpation, possibly followed by manual length measurement in 2D ultrasound imaging. However, this manual measurement is dependent on operator expertise and is subject to intra- and inter-observer variability. We investigate the use of deep learning to perform automatic estimation of spleen length from ultrasound images. We investigate two types of approach, one segmentation-based and one based on direct length estimation, and compare the results against measurements made by human experts. Our best model (segmentation-based) achieved a percentage length error of 7.42%, which is approaching the level of inter-observer variability (5.47%-6.34%). To the best of our knowledge, this is the first attempt to measure spleen size in a fully automated way from ultrasound images.
Submitted 6 September, 2020;
originally announced September 2020.
-
Geometric Characteristics of Dynamic Correlations for Combinatorial Regulation in Gene Expression Noise
Authors:
Jiajun Zhang,
Zhanjiang Yuan,
Tianshou Zhou
Abstract:
Knowing which mode of combinatorial regulation (typically, AND or OR logic operation) a gene employs is important for determining its function in regulatory networks. Here, we introduce a dynamic cross-correlation function between the output of a gene and its upstream regulator concentrations for signatures of combinatorial regulation in gene expression noise. We find that the correlation function is always upwards convex for the AND operation and downwards convex for the OR operation, regardless of the source of noise (intrinsic, extrinsic, or both). In turn, this fact implies a means for inferring regulatory synergies from available experimental data. The extensions and applications are discussed.
Submitted 17 April, 2009; v1 submitted 17 April, 2009;
originally announced April 2009.
-
Synchronization and clustering of synthetic genetic networks: A role for cis-regulatory modules
Authors:
Jiajun Zhang,
Zhanjiang Yuan,
Tianshou Zhou
Abstract:
The effect of signal integration through cis-regulatory modules (CRMs) on synchronization and clustering of populations of two-component genetic oscillators coupled by quorum sensing is investigated in detail. We find that the CRMs play an important role in achieving synchronization and clustering. For this, we investigate 6 possible cis-regulatory input functions (CRIFs) with AND, OR, ANDN, ORN, XOR, and EQU types of responses in two possible kinds of cell-to-cell communications: activator-regulated communication (i.e., the autoinducer regulates the activator) and repressor-regulated communication (i.e., the autoinducer regulates the repressor). Both theoretical analysis and numerical simulation show that different CRMs drive fundamentally different cellular patterns, such as complete synchronization, various cluster-balanced states and several cluster-nonbalanced states.
Submitted 2 April, 2009;
originally announced April 2009.
-
Cis-Regulatory Modules Drive Dynamic Patterns of a Multicellular System
Authors:
Jiajun Zhang,
Zhanjiang Yuan,
Tianshou Zhou
Abstract:
How intracellular and extracellular signals are integrated by transcription factors is essential for understanding complex cellular patterns at the population level. In this Letter, by using a synthetic genetic oscillator coupled to a quorum-sensing apparatus, we propose an experimentally feasible cis-regulatory module (CRM) which performs four possible logic operations (ANDN, ORN, NOR and NAND) of input signals. We show both numerically and theoretically that these different CRMs drive fundamentally different dynamic patterns, such as synchronization, clustering and splay state.
Submitted 22 March, 2009;
originally announced March 2009.