-
Clustering for Protein Representation Learning
Authors:
Ruijie Quan,
Wenguan Wang,
Fan Ma,
Hehe Fan,
Yi Yang
Abstract:
Protein representation learning is a challenging task that aims to capture the structure and function of proteins from their amino acid sequences. Previous methods largely ignored the fact that not all amino acids are equally important for protein folding and activity. In this article, we propose a neural clustering framework that can automatically discover the critical components of a protein by considering both its primary and tertiary structure information. Our framework treats a protein as a graph, where each node represents an amino acid and each edge represents a spatial or sequential connection between amino acids. We then apply an iterative clustering strategy to group the nodes into clusters based on their 1D and 3D positions and assign scores to each cluster. We select the highest-scoring clusters and use their medoid nodes for the next iteration of clustering, until we obtain a hierarchical and informative representation of the protein. We evaluate on four protein-related tasks: protein fold classification, enzyme reaction classification, gene ontology term prediction, and enzyme commission number prediction. Experimental results demonstrate that our method achieves state-of-the-art performance.
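One round of the group, score, select, and take-medoids loop described in this abstract can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `cluster_once`, the parameters `radius`, `seq_window`, and `keep`, and the rule that a residue joins a cluster if it is close in sequence or in space are all hypothetical, and cluster size stands in for the scoring used in the paper, which the abstract does not specify.

```python
import math


def cluster_once(coords, centers, radius, seq_window, keep):
    """Run one illustrative clustering round and return the medoid indices.

    coords: list of (x, y, z) tuples, one per residue, in sequence order.
    centers: indices of the residues currently acting as cluster centers.
    radius, seq_window, keep: hypothetical knobs chosen for this sketch.
    """
    clusters = []
    for c in centers:
        # A residue joins the cluster if it is near the center in
        # sequence (1D) or in space (3D); this rule is an assumption.
        members = [
            i for i in range(len(coords))
            if abs(i - c) <= seq_window
            or math.dist(coords[i], coords[c]) <= radius
        ]
        clusters.append(members)

    # Placeholder score: cluster size. The paper's scoring is not
    # reproduced here.
    ranked = sorted(clusters, key=len, reverse=True)[:keep]

    # The medoid is the member with the smallest total distance to
    # all other members of its cluster.
    medoids = []
    for members in ranked:
        medoid = min(
            members,
            key=lambda i: sum(math.dist(coords[i], coords[j]) for j in members),
        )
        medoids.append(medoid)
    return medoids
```

Feeding the returned medoids back in as `centers` for the next call gives the iterative, hierarchical behaviour the abstract describes, again only under the assumptions above.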
Submitted 30 March, 2024;
originally announced April 2024.
-
ProtChatGPT: Towards Understanding Proteins with Large Language Models
Authors:
Chao Wang,
Hehe Fan,
Ruijie Quan,
Yi Yang
Abstract:
Protein research is crucial in various fundamental disciplines, but understanding the intricate structure-function relationships of proteins remains challenging. Recent Large Language Models (LLMs) have made significant strides in comprehending task-specific knowledge, suggesting the potential for ChatGPT-like systems specialized in proteins to facilitate basic research. In this work, we introduce ProtChatGPT, which aims at learning and understanding protein structures via natural language. ProtChatGPT enables users to upload proteins, ask questions, and engage in interactive conversations to produce comprehensive answers. The system comprises protein encoders, a Protein-Language Pertaining Transformer (PLP-former), a projection adapter, and an LLM. The protein first passes through the protein encoders and the PLP-former to produce protein embeddings, which are then projected by the adapter to conform with the LLM. The LLM finally combines user questions with the projected embeddings to generate informative answers. Experiments show that ProtChatGPT can produce promising responses to proteins and their corresponding questions. We hope that ProtChatGPT could form the basis for further exploration and application in protein research. Code and our pre-trained model will be publicly available.
Submitted 14 February, 2024;
originally announced February 2024.
-
Breathing cluster in complex neuron-astrocyte networks
Authors:
Ya Wang,
Liang Wang,
Huawei Fan,
Jun Ma,
Hui Cao,
Xingang Wang
Abstract:
Brain activity is characterized by spatially distributed neural clusters that fire coherently and by spontaneous switching of these clusters between synchronous and asynchronous states. Evidence from in vivo experiments suggests that astrocytes, a type of glial cell previously regarded as providing only structural and metabolic support to neurons, participate actively in brain functions and play a crucial role in regulating neural firing activity, yet the mechanism remains unknown. Treating the astrocyte as a reservoir of the glutamate released from neuronal synapses, we propose a model of a complex neuron-astrocyte network and use it to explore the role of the astrocyte in regulating the synchronization behavior of networked neurons. We find that a fraction of the neurons on the network can be synchronized as a cluster, while the remaining neurons stay desynchronized. Moreover, as the network evolves, the cluster switches intermittently between the synchronous and asynchronous states, hence the name "breathing cluster". Using symmetry-based analysis, we investigate theoretically the stability of the cluster and the mechanism that generates the breathing activity. The analysis reveals that the membership of the cluster is determined by the network symmetry and that the breathing activity arises from the interplay between the neural network and the astrocyte. The breathing phenomenon is demonstrated in network models with different structures and neural dynamics. The study gives insight into the cellular mechanism by which astrocytes regulate neural activity, and sheds light on the spontaneous state switching of the neocortex.
Submitted 26 January, 2023;
originally announced February 2023.
-
Criticality in Reservoir Computer of Coupled Phase Oscillators
Authors:
Liang Wang,
Huawei Fan,
Jinghua Xiao,
Yueheng Lan,
Xingang Wang
Abstract:
Accumulating evidence shows that the cerebral cortex operates near a critical state characterized by a power-law size distribution of neural avalanches, yet evidence of this critical state in artificial neural networks mimicking the cerebral cortex is lacking. Here we design an artificial neural network of coupled phase oscillators and, using the technique of reservoir computing in machine learning, train it to predict chaos. We find that when the machine is properly trained, oscillators in the reservoir synchronize into clusters whose sizes follow a power-law distribution. This feature is absent when the machine is poorly trained. Additionally, we find that regardless of the synchronization degree of the original network, once properly trained, the reservoir network always develops to the same critical state, exemplifying the "attractor" nature of this state in machine learning. The generality of the results is verified with different reservoir models and different target systems. The scaling exponent of the distribution is found to be independent of the reservoir details and of the bifurcation parameter of the target system, but it changes when the dynamics of the target system is changed to a different type. The findings shed light on the nature of machine learning and are helpful for the design of high-performance machines in physical systems.
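To make the power-law claim concrete, a scaling exponent can be estimated from a list of observed cluster sizes. The sketch below uses the standard continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(s_i / s_min)); the abstract does not say how the authors fitted their exponent, so the estimator, the function name `power_law_exponent`, and the cutoff parameter `s_min` are assumptions made for illustration.

```python
import math


def power_law_exponent(sizes, s_min=1.0):
    """Estimate alpha for p(s) ~ s**(-alpha) over the tail s >= s_min.

    Uses the continuous maximum-likelihood estimator. For integer
    cluster sizes this is only an approximation.
    """
    tail = [s for s in sizes if s >= s_min]
    if not tail:
        raise ValueError("no sizes at or above s_min")
    log_sum = sum(math.log(s / s_min) for s in tail)
    if log_sum == 0:
        raise ValueError("all sizes equal s_min; exponent is undefined")
    return 1.0 + len(tail) / log_sum
```

In practice the choice of `s_min` strongly affects the estimate, so it is usually worth checking how stable the exponent is across several cutoffs.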
Submitted 22 July, 2021;
originally announced August 2021.
-
Synchronization within synchronization: transients and intermittency in ecological networks
Authors:
Huawei Fan,
Ling-Wei Kong,
Xingang Wang,
Alan Hastings,
Ying-Cheng Lai
Abstract:
Transients are fundamental to ecological systems, with significant implications for management, conservation, and biological control. We uncover a type of transient synchronization behavior in spatial ecological networks whose local dynamics are of the chaotic, predator-prey type. In the parameter regime where there is phase synchronization among all the patches, complete synchronization (i.e., synchronization in both phase and amplitude) can arise in certain pairs of patches as determined by the network symmetry, hence the name "synchronization within synchronization." Distinct patterns of complete synchronization coexist but, due to intrinsic instability or noise, each pattern is a transient and there is random, intermittent switching among the patterns in the course of time evolution. The probability distribution of the transient time is found to follow an algebraic scaling law with a divergent average transient lifetime. Based on symmetry considerations, we develop a stability analysis to understand these phenomena. The general principle of symmetry can also be exploited to explain previously discovered, counterintuitive synchronization behaviors in ecological networks.
Submitted 20 November, 2020;
originally announced November 2020.
-
How network temporal dynamics shape a mutualistic system with invasive species?
Authors:
Andrzej Jarynowski,
Fco. Alejandro Lopez-Nunez,
Han Fan
Abstract:
Ecological networks allow us to study the structure and function of ecosystems and to gain insight into species resilience and stability. Studies of these networks are usually snapshots restricted to a limited range of space and time, which prevents us from perceiving the real dynamics of ecological processes. By definition, an alien species has ecological strategies and traits that let it compete better than the native species (e.g. absence of predators, a different bloom period, a high growth rate). Plant-pollinator networks provide valuable services to whole ecosystems, and the introduction of an alien species may affect the native network in different ways (competitive facilitation, native species extinction, etc.). While scientists acknowledge the significance of network connectivity in driving ecosystem services, the inclusion of temporal networks in ecological models is still in its infancy. We propose to use existing data on seasonality to develop a simulation platform that shows the relationship between the temporality of networks and invasion traits. Our aim is only to choose a simple model showing that, in theory, the temporal aspect plays a role (different extinction patterns), in order to encourage ecologists to engage with temporal networks. Moreover, the derived simulations could be extended and adjusted to other ecological questions.
Submitted 16 July, 2014;
originally announced July 2014.
-
Digital PCR provides sensitive and absolute calibration for high throughput sequencing
Authors:
Richard A. White III,
Paul Blainey,
H. Christina Fan,
Stephen R. Quake
Abstract:
Several of the next generation sequencers are limited in their sample preparation process by the need to make an absolute measurement of the number of template molecules in the library to be sequenced. As currently practiced, the practical effects of this requirement compromise sequencing performance, both by requiring large amounts of sample DNA and by requiring extra sequencing runs to be performed. We used digital PCR to quantitate sequencing libraries, and demonstrated its sensitivity and robustness by preparing and sequencing libraries from subnanogram amounts of bacterial and human DNA on the 454 and Solexa sequencing platforms. This assay allows absolute quantitation and eliminates uncertainties associated with the construction and application of standard curves. The digital PCR platform consumes subfemtogram amounts of the sequencing library and gives highly accurate results, allowing the optimal DNA concentration to be used in setting up sequencing runs without costly and time-consuming titration techniques. This approach also reduces the input sample requirement more than 1000-fold: from micrograms of DNA to less than a nanogram.
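The abstract does not spell out how partition counts become an absolute molecule count. A common approach in digital PCR, assumed here rather than taken from the paper, is a Poisson correction: if a fraction p of partitions is positive, the mean number of template copies per partition is -ln(1 - p). The function names below are hypothetical and the sketch ignores partition volume and dilution factors.

```python
import math


def copies_per_partition(positive, total):
    """Mean template copies per partition from the positive fraction.

    Assumes templates are distributed across partitions as a Poisson
    process, so P(partition is empty) = exp(-lambda).
    """
    if total <= 0 or not 0 <= positive < total:
        # With every partition positive the estimate diverges.
        raise ValueError("need 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def total_copies(positive, total):
    """Estimated number of template molecules loaded across all partitions."""
    return total * copies_per_partition(positive, total)
```

Because the count comes directly from the positive fraction, no standard curve is needed, which matches the advantage the abstract highlights.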
Submitted 15 August, 2008;
originally announced August 2008.
-
Detection of Aneuploidy with Digital PCR
Authors:
H. Christina Fan,
Stephen R. Quake
Abstract:
The widespread use of genetic testing in high risk pregnancies has created strong interest in rapid and accurate molecular diagnostics for common chromosomal aneuploidies. We show here that digital polymerase chain reaction (dPCR) can be used for accurate measurement of trisomy 21 (Down's Syndrome), the most common human aneuploidy. dPCR is generally applicable to any aneuploidy, does not depend on allelic distribution or gender, and is able to detect signals in the presence of mosaics or contaminating maternal DNA.
Submitted 8 May, 2007;
originally announced May 2007.