Search | arXiv e-print repository

A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding

Authors: Yiqing Shen, Zan Chen, Michail Mamalakis, Luhan He, Haiyang Xia, Tianbin Li, Yanzhou Su, Junjun He, Yu Guang Wang

Abstract: The parallels between protein sequences and natural language in their sequential structures have inspired the application of large language models (LLMs) to protein understanding. Despite the success of LLMs in NLP, their effectiveness in comprehending protein sequences remains an open question, largely due to the absence of datasets linking protein sequences to descriptive text. Researchers have then attempted to adapt LLMs for protein understanding by integrating a protein sequence encoder with a pre-trained LLM. However, this adaptation raises a fundamental question: "Can LLMs, originally designed for NLP, effectively comprehend protein sequences as a form of language?" Current datasets fall short in addressing this question due to the lack of a direct correlation between protein sequences and corresponding text descriptions, limiting the ability to train and evaluate LLMs for protein understanding effectively. To bridge this gap, we introduce ProteinLMDataset, a dataset specifically designed for further self-supervised pretraining and supervised fine-tuning (SFT) of LLMs to enhance their capability for protein sequence comprehension. Specifically, ProteinLMDataset includes 17.46 billion tokens for pretraining and 893,000 instructions for SFT. Additionally, we present ProteinLMBench, the first benchmark dataset consisting of 944 manually verified multiple-choice questions for assessing the protein understanding capabilities of LLMs. ProteinLMBench incorporates protein-related details and sequences in multiple languages, establishing a new standard for evaluating LLMs' abilities in protein comprehension. The large language model InternLM2-7B, pretrained and fine-tuned on the ProteinLMDataset, outperforms GPT-4 on ProteinLMBench, achieving the highest accuracy score. The dataset and the benchmark are available at https://huggingface.co/datasets/tsynbio/ProteinLMBench.

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2405.12144 [pdf]

Alterations of electrocortical activity during hand movements induced by motor cortex glioma

Authors: Yihan Wu, Tao Chang, Siliang Chen, Xiaodong Niu, Yu Li, Yuan Fang, Lei Yang, Yixuan Zong, Yaoxin Yang, Yuehua Li, Mengsong Wang, Wen Yang, Yixuan Wu, Chen Fu, Xia Fang, Yuxin Quan, Xilin Peng, Qiang Sun, Marc M. Van Hulle, Yanhui Liu, Ning Jiang, Dario Farina, Yuan Yang, Jiayuan He, Qing Mao

Abstract: Glioma cells can reshape functional neuronal networks by hijacking neuronal synapses, leading to partial or complete neurological dysfunction. These mechanisms have been previously explored for language functions. However, the impact of glioma on sensorimotor functions is still unknown. Therefore, we recruited a control group of patients with unaffected motor cortex and a group of patients with glioma-infiltrated motor cortex, and recorded high-density electrocortical signals during finger movement tasks. The results showed that glioma suppresses task-related synchronization in the high-gamma band and reduces the power across all frequency bands. The resulting atypical motor information transmission model with discrete signaling pathways and delayed responses disrupts the stability of neuronal encoding patterns for finger movement kinematics across various temporal-spatial scales. These findings demonstrate that gliomas functionally invade neural circuits within the motor cortex. This result advances our understanding of motor function processing in chronic disease states, which is important to advance the surgical strategies and neurorehabilitation approaches for patients with malignant gliomas.

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2404.13902 [pdf]

Evaluating experiences in a digital nutrition education program for people with multiple sclerosis: a qualitative study

Authors: RD Russell, J He, LJ Black, A Begley

Abstract: Background: Multiple sclerosis (MS) is a complex immune-mediated disease with no currently known cure. There is growing evidence to support the role of diet in reducing some of the symptoms and disease progression in MS, and we previously developed and tested the feasibility of a digital nutrition education program for people with MS. Objective: The aim of this study was to explore factors that influenced engagement in the digital nutrition education program, including features influencing capability, opportunity, and motivation to change their dietary behaviours. Methods: Semi-structured interviews were conducted with people with MS who completed some or all of the program, until data saturation was reached. Interviews were analysed inductively using thematic analysis. Themes were deductively mapped against the COM-B behaviour change model. Results: 16 interviews were conducted with participants who completed all (n=10) or some of the program (n=6). Four themes emerged: 1) Acquiring and validating nutrition knowledge; 2) Influence of time and social support; 3) Getting in early to improve health; and 4) Accounting for food literacy experiences. Discussion: This is the first online nutrition program with suitable behavioural supports for people with MS. It highlights the importance of disease-specific and evidence-based nutrition education to support people with MS to make dietary changes. Acquiring nutrition knowledge, coupled with practical support mechanisms such as recipe booklets and goal-setting, emerged as crucial for facilitating engagement with the program. Conclusions: When designing education programs for people with MS and other neurological conditions, healthcare professionals and program designers should consider flexible delivery and building peer support to address the needs and challenges faced by participants.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2311.16666 [pdf, other]

MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures

Authors: Zhuoyuan Wang, Jiacong Mi, Shan Lu, Jieyue He

Abstract: The quest for accurate prediction of drug molecule properties poses a fundamental challenge in the realm of Artificial Intelligence Drug Discovery (AIDD). An effective representation of drug molecules emerges as a pivotal component in this pursuit. Contemporary leading-edge research predominantly resorts to self-supervised learning (SSL) techniques to extract meaningful structural representations from large-scale, unlabeled molecular data, subsequently fine-tuning these representations for an array of downstream tasks. However, an inherent shortcoming of these studies lies in their singular reliance on one modality of molecular information, such as molecule image or SMILES representations, thus neglecting the potential complementarity of various molecular modalities. In response to this limitation, we propose MolIG, a novel MultiModaL molecular pre-training framework for predicting molecular properties based on Image and Graph structures. MolIG model innovatively leverages the coherence and correlation between molecule graph and molecule image to execute self-supervised tasks, effectively amalgamating the strengths of both molecular representation forms. This holistic approach allows for the capture of pivotal molecular structural characteristics and high-level semantic information. Upon completion of pre-training, Graph Neural Network (GNN) Encoder is used for the prediction of downstream tasks. In comparison to advanced baseline models, MolIG exhibits enhanced performance in downstream tasks pertaining to molecular property prediction within benchmark groups such as MoleculeNet Benchmark Group and ADMET Benchmark Group.

Submitted 19 April, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: 8 pages

arXiv:2310.13913 [pdf, other]

Pre-Training on Large-Scale Generated Docking Conformations with HelixDock to Unlock the Potential of Protein-ligand Structure Prediction Models

Authors: Lihang Liu, Shanzhuo Zhang, Donglong He, Xianbin Ye, Jingbo Zhou, Xiaonan Zhang, Yaoyao Jiang, Weiming Diao, Hang Yin, Hua Chai, Fan Wang, Jingzhou He, Liang Zheng, Yonghui Li, Xiaomin Fang

Abstract: Protein-ligand structure prediction is an essential task in drug discovery, predicting the binding interactions between small molecules (ligands) and target proteins (receptors). Recent advances have incorporated deep learning techniques to improve the accuracy of protein-ligand structure prediction. Nevertheless, the experimental validation of docking conformations remains costly, which raises concerns regarding the generalizability of these deep learning-based methods due to the limited training data. In this work, we show that by pre-training on large-scale docking conformations generated by traditional physics-based docking tools and then fine-tuning with a limited set of experimentally validated receptor-ligand complexes, we can obtain a protein-ligand structure prediction model with outstanding performance. Specifically, this process involved the generation of 100 million docking conformations for protein-ligand pairings, an endeavor consuming roughly 1 million CPU core days. The proposed model, HelixDock, aims to acquire the physical knowledge encapsulated by the physics-based docking tools during the pre-training phase. HelixDock has been rigorously benchmarked against both physics-based and deep learning-based baselines, demonstrating its exceptional precision and robust transferability in predicting binding conformation. In addition, our investigation reveals the scaling laws governing pre-trained protein-ligand structure prediction models, indicating a consistent enhancement in performance with increases in model parameters and the volume of pre-training data. Moreover, we applied HelixDock to several drug discovery-related tasks to validate its practical utility. HelixDock demonstrates outstanding capabilities on both cross-docking and structure-based virtual screening benchmarks.

Submitted 22 May, 2024; v1 submitted 21 October, 2023; originally announced October 2023.

arXiv:2307.12996 [pdf, other]

Extracting Molecular Properties from Natural Language with Multimodal Contrastive Learning

Authors: Romain Lacombe, Andrew Gaut, Jeff He, David Lüdeke, Kateryna Pistunova

Abstract: Deep learning in computational biochemistry has traditionally focused on neural representations of molecular graphs; however, recent advances in language models highlight how much scientific knowledge is encoded in text. To bridge these two modalities, we investigate how molecular property information can be transferred from natural language to graph representations. We study property prediction performance gains after using contrastive learning to align neural graph representations with representations of textual descriptions of their characteristics. We implement neural relevance scoring strategies to improve text retrieval, introduce a novel chemically-valid molecular graph augmentation strategy inspired by organic reactions, and demonstrate improved performance on downstream MoleculeNet property classification tasks. We achieve a +4.26% AUROC gain versus models pre-trained on the graph modality alone, and a +1.54% gain compared to the recently proposed molecular graph/text contrastively trained MoMu model (Su et al. 2022).

Submitted 22 July, 2023; originally announced July 2023.

Comments: 2023 ICML Workshop on Computational Biology

arXiv:2306.05445 [pdf, other]

Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning

Authors: Shuxin Zheng, Jiyan He, Chang Liu, Yu Shi, Ziheng Lu, Weitao Feng, Fusong Ju, Jiaxi Wang, Jianwei Zhu, Yaosen Min, He Zhang, Shidi Tang, Hongxia Hao, Peiran Jin, Chi Chen, Frank Noé, Haiguang Liu, Tie-Yan Liu

Abstract: Advances in deep learning have greatly improved structure prediction of molecules. However, many macroscopic observations that are important for real-world applications are not functions of a single molecular structure, but rather determined from the equilibrium distribution of structures. Traditional methods for obtaining these distributions, such as molecular dynamics simulation, are computationally expensive and often intractable. In this paper, we introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems. Inspired by the annealing process in thermodynamics, DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system, such as a chemical graph or a protein sequence. This framework enables efficient generation of diverse conformations and provides estimations of state densities. We demonstrate the performance of DiG on several molecular tasks, including protein conformation sampling, ligand structure sampling, catalyst-adsorbate sampling, and property-guided structure generation. DiG presents a significant advancement in methodology for statistically understanding molecular systems, opening up new research opportunities in molecular science.

Submitted 8 June, 2023; originally announced June 2023.

Comments: 80 pages, 11 figures

arXiv:2211.08119 [pdf]

DeepRGVP: A Novel Microstructure-Informed Supervised Contrastive Learning Framework for Automated Identification Of The Retinogeniculate Pathway Using dMRI Tractography

Authors: Sipei Li, Jianzhong He, Tengfei Xue, Guoqiang Xie, Shun Yao, Yuqian Chen, Erickson F. Torio, Yuanjing Feng, Dhiego CA Bastos, Yogesh Rathi, Nikos Makris, Ron Kikinis, Wenya Linda Bi, Alexandra J Golby, Lauren J O'Donnell, Fan Zhang

Abstract: The retinogeniculate pathway (RGVP) is responsible for carrying visual information from the retina to the lateral geniculate nucleus. Identification and visualization of the RGVP are important in studying the anatomy of the visual system and can inform treatment of related brain diseases. Diffusion MRI (dMRI) tractography is an advanced imaging method that uniquely enables in vivo mapping of the 3D trajectory of the RGVP. Currently, identification of the RGVP from tractography data relies on expert (manual) selection of tractography streamlines, which is time-consuming, has high clinical and expert labor costs, and is affected by inter-observer variability. In this paper, we present what we believe is the first deep learning framework, namely DeepRGVP, to enable fast and accurate identification of the RGVP from dMRI tractography data. We design a novel microstructure-informed supervised contrastive learning method that leverages both streamline label and tissue microstructure information to determine positive and negative pairs. We propose a simple and successful streamline-level data augmentation method to address highly imbalanced training data, where the number of RGVP streamlines is much lower than that of non-RGVP streamlines. We perform comparisons with several state-of-the-art deep learning methods that were designed for tractography parcellation, and we show superior RGVP identification results using DeepRGVP.

Submitted 15 November, 2022; originally announced November 2022.

Comments: 5 pages, 2 figures, 2 tables

arXiv:2208.05863 [pdf, other]

GEM-2: Next Generation Molecular Property Prediction Network by Modeling Full-range Many-body Interactions

Authors: Lihang Liu, Donglong He, Xiaomin Fang, Shanzhuo Zhang, Fan Wang, Jingzhou He, Hua Wu

Abstract: Molecular property prediction is a fundamental task in the drug and material industries. Physically, the properties of a molecule are determined by its own electronic structure, which is a quantum many-body system and can be exactly described by the Schrödinger equation. Full-range many-body interactions between electrons have been proven effective in obtaining an accurate solution of the Schrödinger equation by classical computational chemistry methods, although modeling such interactions consumes an expensive computational cost. Meanwhile, deep learning methods have also demonstrated their competence in molecular property prediction tasks. Inspired by the classical computational chemistry methods, we design a novel method, namely GEM-2, which comprehensively considers full-range many-body interactions in molecules. Multiple tracks are utilized to model the full-range interactions between the many-bodies with different orders, and a novel axial attention mechanism is designed to approximate the full-range interaction modeling with much lower computational cost. Extensive experiments demonstrate the overwhelming superiority of GEM-2 over multiple baseline methods in quantum chemistry and drug discovery tasks. The ablation studies also verify the effectiveness of the full-range many-body interactions.

Submitted 20 October, 2022; v1 submitted 11 August, 2022; originally announced August 2022.

arXiv:2207.13921 [pdf, other]

doi 10.1038/s42256-023-00721-6

HelixFold-Single: MSA-free Protein Structure Prediction by Using Protein Language Model as an Alternative

Authors: Xiaomin Fang, Fan Wang, Lihang Liu, Jingzhou He, Dayong Lin, Yingfei Xiang, Xiaonan Zhang, Hua Wu, Hui Li, Le Song

Abstract: AI-based protein structure prediction pipelines, such as AlphaFold2, have achieved near-experimental accuracy. These advanced pipelines mainly rely on Multiple Sequence Alignments (MSAs) as inputs to learn the co-evolution information from the homologous sequences. Nonetheless, searching MSAs from protein databases is time-consuming, usually taking dozens of minutes. Consequently, we attempt to explore the limits of fast protein structure prediction by using only primary sequences of proteins. HelixFold-Single is proposed to combine a large-scale protein language model with the superior geometric learning capability of AlphaFold2. Our proposed method, HelixFold-Single, first pre-trains a large-scale protein language model (PLM) with thousands of millions of primary sequences utilizing the self-supervised learning paradigm, which will be used as an alternative to MSAs for learning the co-evolution information. Then, by combining the pre-trained PLM and the essential components of AlphaFold2, we obtain an end-to-end differentiable model to predict the 3D coordinates of atoms from only the primary sequence. HelixFold-Single is validated in datasets CASP14 and CAMEO, achieving competitive accuracy with the MSA-based methods on the targets with large homologous families. Furthermore, HelixFold-Single consumes much less time than the mainstream pipelines for protein structure prediction, demonstrating its potential in tasks requiring many predictions. The code of HelixFold-Single is available at https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/helixfold-single, and we also provide stable web services on https://paddlehelix.baidu.com/app/drug/protein-single/forecast.

Submitted 21 February, 2023; v1 submitted 28 July, 2022; originally announced July 2022.

Journal ref: Nature Machine Intelligence, 2023

arXiv:2207.11695 [pdf, other]

doi 10.1103/PhysRevE.107.024402

Calcium oscillation on homogeneous and heterogeneous networks of ryanodine receptor

Authors: Zhong-Xue Gao, Tian-Tian Li, Han-Yu Jiang, Jun He

Abstract: Calcium oscillation is an important form of calcium homeostasis, imbalance of which is the key mechanism of initiation and progression of many major diseases. The formation and maintenance of calcium homeostasis are closely related to the spatial distribution of calcium channels. In the current paper, a theoretical framework is established by abstracting the spatial distribution of the calcium channels as a nonlinear biological complex network with calcium channels as nodes and Ca$^{2+}$ as edges. A dynamical model for a RyR is adopted to investigate the effect of spatial distribution on calcium oscillation. The mean-field model can be well reproduced from the complete graph and dense Erdős-Rényi network. The synchronization of RyRs is found to be important for generating a global calcium oscillation. The clique graph with a cluster structure cannot produce a global oscillation due to the failure of synchronization between clusters. A more realistic geometric network is constructed in a two-dimensional plane based on the experimental information about the RyR arrangement of clusters and the frequency distribution of cluster sizes. Different from the clique graph, the global oscillation can be generated with reasonable parameters on the geometric network. The simulation also suggests that the existence of small clusters and rogue RyRs plays an important role in the maintenance of global calcium oscillation through keeping synchronization between large clusters. Such results support the heterogeneous distribution of RyRs with different-size clusters, which is helpful to understand recent observations with super-resolution nanoscale imaging techniques. The current theoretical framework can also be extended to investigate other phenomena in calcium signal transduction.

Submitted 1 February, 2023; v1 submitted 24 July, 2022; originally announced July 2022.

Comments: 14 pages, 8 figures, to be published in Phys. Rev. E

arXiv:2106.07267 [pdf, ps, other]

doi 10.1088/1674-1056/ac21bf

Nonlinear signal transduction network with multistate

Authors: Han-Yu Jiang, Jun He

Abstract: Signal transduction is an important and basic mechanism in cell life activities. The stochastic state transition of a receptor induces the release of signaling molecules, which triggers the state transition of other receptors. This constructs a nonlinear signaling network and leads to robust switchlike properties which are critical to biological function. Network architectures and state transitions of the receptor affect the performance of this biological network. In this work, we perform a study of nonlinear signaling on a biological polymorphic network by analyzing network dynamics of the Ca$^{2+}$ induced Ca$^{2+}$ release mechanism, where fast and slow processes are involved and the receptor has four conformational states. Three types of networks, the Erdős-Rényi network, the Watts-Strogatz network and the Barabási-Albert network, are considered with different parameters. The dynamics of the biological networks exhibit different patterns at different time scales. At short time scale, the second open state is essential to reproduce the quasi-bistable regime, which emerges at a critical strength of connection for all three states involved in the fast processes and disappears at another critical point. The pattern at short time scale is not sensitive to the network architecture. At long time scale, only a monostable regime is observed, and differences in network architecture affect the results more strongly. Our finding identifies features of nonlinear signaling networks with multistate that may underlie their biological function.

Submitted 25 October, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

Comments: 12 pages, 7 figures

Journal ref: Chin. Phys. B 30 (2021) 118703

arXiv:2104.03139 [pdf]

doi 10.1186/s12870-022-03607-8

Functional annotation of creeping bentgrass protein sequences based on convolutional neural network

Authors: Han-Yu Jiang, Jun He

Abstract: Background: Creeping bentgrass (Agrostis stolonifera) is a perennial grass of Gramineae, belonging to cold season turfgrass, but has poor disease resistance. Up to now, little is known about the induced systemic resistance (ISR) mechanism, especially the relevant functional proteins, which is important to disease resistance of turfgrass. Obtaining more information on the proteins of infected creeping bentgrass is helpful to understand the ISR mechanism. Results: With BDO treatment, creeping bentgrass seedlings were grown, and the ISR response was induced by infection with Rhizoctonia solani. High-quality protein sequences of creeping bentgrass seedlings were obtained. Some of the protein sequences were functionally annotated according to the database alignment, while a large part of the obtained protein sequences was left non-annotated. To treat the non-annotated sequences, a prediction model based on a convolutional neural network was established with the dataset from the UniProt database in three domains to acquire good performance, especially the higher false positive control rate. With the established model, the non-annotated protein sequences of creeping bentgrass were analyzed to annotate proteins relevant to disease-resistance response and signal transduction. Conclusions: The prediction model based on a convolutional neural network was successfully applied to select good candidates of the proteins with functions relevant to the ISR mechanism from the protein sequences which cannot be annotated by database alignment. The waste of sequence data can be avoided, and research time and labor will be saved in further research on the proteins of creeping bentgrass by molecular biology technology. It also provides a reference for other sequence analysis in turfgrass disease-resistance research.

Submitted 24 May, 2022; v1 submitted 7 April, 2021; originally announced April 2021.

Comments: 12 pages, 3 figures. Accepted by BMC Plant Biology

Journal ref: BMC Plant Biol 22, 227 (2022)

arXiv:2006.04018

Projecting and comparing non-pharmaceutical interventions to contain COVID-19 in major economies

Authors: Jingjing He, Xuefei Guan, Xiaochang Duan, Tian Shen, Jing Lin

Abstract: Non-pharmaceutical interventions (NPIs) such as quarantine, self-isolation, social distancing, and virus-contact tracing can greatly reduce the spread of the virus during a pandemic. In the wave of the COVID-19 pandemic, many countries have implemented various NPIs for infection control and mitigation. However, the stringency of the NPIs and the resulting impact among different countries remain unclear due to the lack of quantitative factors. In this study we took a further step to incorporate the effect of the NPIs into the pandemic dynamics model using the concept of policy intensity factor (PIF). This idea enables us to characterize the transition rates as time-varying quantities instead of constant values, and thus to capture the dynamical behavior of the basic reproduction number variation in the pandemic. By leveraging a great amount of data reported by the governments and the World Health Organization, we projected the dynamics of the pandemic for the major economies in the world, including the numbers of infected, susceptible, and recovered cases, as well as the pandemic durations. It is observed that the proposed variable-rate susceptible-exposed-infected-recovered (VR-SEIR) model fits and projects the pandemic dynamics very well. We further showed that the resulting PIFs correlate with the stringency of NPIs, which allows us to project the final affected numbers of people in those countries when their current NPIs have been imposed for 90, 180, 360 days. It provides a quantitative insight into the effectiveness of the implemented NPIs, and sheds new light on minimizing both the number of people affected by COVID-19 and the economic impact.

Submitted 14 December, 2020; v1 submitted 6 June, 2020; originally announced June 2020.

Comments: The results in this study project that the pandemic will end in one half to one year, which apparently is meaningless. Therefore, it is considered not accurate. To avoid unnecessary ambiguity, the authors would like to withdraw this draft. Thank you

arXiv:2005.08203 [pdf, ps, other]

doi 10.1088/1572-9494/abc7af

Three-dimensional cytoplasmic calcium propagation with boundaries

Authors: Han-Yu Jiang, Jun He

Abstract: Ca$^{2+}$ plays an important role in cell signal transduction. Its intracellular propagation is the most basic process of Ca$^{2+}$ signaling, such as calcium wave and double messenger system. In this work, with both numerical simulation and mean field ansatz, the 3-dimensional probability distribution of Ca$^{2+}$, which is read out by phosphorylation, is studied in two scenarios with boundaries. The coverage of the distribution of Ca$^{2+}$ is found to be of the order of magnitude of $μ$m, which is consistent with experimentally observed calcium spikes and waves. Our results suggest that the double messenger system may occur in the ER-PM junction to acquire great efficiency. The buffer effect of kinase is also discussed by calculating the average position of phosphorylations and free Ca$^{2+}$. The results are helpful for understanding the mechanism of Ca$^{2+}$ signaling.

Submitted 10 November, 2020; v1 submitted 17 May, 2020; originally announced May 2020.

Comments: 8 pages, 7 figures

Journal ref: Communications in Theoretical Physics [ Vol. 73, No. 1, (2021) 015601 (12pp) ]

arXiv:1912.01505 [pdf, ps, other]

An integrated heterogeneous Poisson model for neuron functions in hand movement during reaching and grasp

Authors: Shu-Chuan Chen, Lung-An Li, Ji** He

Abstract: To understand the potential encoding mechanism of motor cortical neurons for control commands during reach-to-grasp movements, experiments to record neuronal activities from primary motor cortical regions have been conducted in many research laboratories (for example, (7), (17)). The most popular approach in the neuroscience community is to fit the Analysis of Variance (ANOVA) model using the firing rates of individual neurons. To consider not only neural firing counts but also temporal intervals, (5) proposed applying the Analysis of Covariance (ANCOVA) model. Due to the nature of the data, in this paper we propose to apply an integrated method, called the heterogeneous Poisson regression model, to categorize different neural activities. Three scenarios are discussed to show that the proposed heterogeneous Poisson regression model can overcome some disadvantages of the traditional Poisson regression model.

Submitted 27 November, 2019; originally announced December 2019.

arXiv:1901.06864 [pdf, other]

Can all-atom protein dynamics be reconstructed from the knowledge of C-alpha time evolution?

Authors: Jiaojiao Liu, ** Dai, Jianfeng He, Xubiao Peng, Antti J. Niemi

Abstract: We inquire to what extent protein peptide plane and side chain dynamics can be reconstructed from knowledge of C-alpha dynamics. Due to lack of experimental data we analyze all atom molecular dynamics trajectories from the Anton supercomputer, and for clarity we limit our attention to the peptide plane O atoms and side chain C-beta atoms. We try to reconstruct their dynamics using four different approaches. Three of these are the publicly available reconstruction programs Pulchra, Remo and Scwrl4. The fourth, Statistical Method, builds entirely on statistical analysis of Protein Data Bank (PDB) structures. All four methods place the O and C-beta atoms accurately along the Anton trajectories. However, the Statistical Method performs best. The results suggest that under physiological conditions, the all atom dynamics is slaved to that of C-alpha atoms. The results can help improve all atom force fields, and advance reconstruction and refinement methods for reduced protein structures. The results provide impetus for development of effective coarse grained force fields in terms of reduced coordinates.

Submitted 21 January, 2019; originally announced January 2019.

Comments: 24 figures

arXiv:1809.09553 [pdf]

Prediction of Coronary Heart Disease Using Routine Blood Tests

Authors: Ning Meng, Peng Zhang, Junfeng Li, Jun He, ** Zhu

Abstract: Background --The objective of this study was to examine the association of routine blood test results with coronary heart disease (CHD) risk, to incorporate them into coronary prediction models and to compare the discrimination properties of this approach with other prediction functions. Methods and Results --This work was designed as a retrospective, single-center study of a hospital-based cohort. The 5060 CHD patients (2365 men and 2695 women) were 1 to 97 years old at baseline with 8 years (2009-2017) of medical records, 5051 health check-ups and 5075 cases of other diseases. We developed a two-layer Gradient Boosting Decision Tree (GBDT) model based on routine blood data to predict the risk of coronary heart disease, which could identify 86% of people with coronary heart disease. We built a dataset with 15,000 routine blood test results. Using this dataset, we trained the two-layer GBDT model to classify healthy status, coronary heart disease and other diseases. As a result of the classification after machine learning, we found that the sensitivity of detecting the health data was approximately 93% for all data, and the sensitivity of detecting CHD was 93% for disease data that included coronary heart disease. On this basis, we further visualized the correlation between routine blood results and related data items, and there was an obvious pattern in health and coronary heart disease in all data presentations, which can be used for clinical reference. Finally, we briefly analyzed the results above from the perspective of pathophysiology. Conclusions --Routine blood data provides more information about CHD than what we already know through the correlation between test results and related data items. A simple coronary disease prediction model was developed using a GBDT algorithm, which will allow physicians to predict CHD risk in patients without overt CHD.

Submitted 11 September, 2018; originally announced September 2018.

arXiv:1807.00094 [pdf, other]

Classification of lung nodules in CT images based on Wasserstein distance in differential geometry

Authors: Min Zhang, Qianli Ma, Chengfeng Wen, Hai Chen, Deruo Liu, Xianfeng Gu, Jie He, Xiaoyin Xu

Abstract: Lung nodules are commonly detected in screening for patients with a risk for lung cancer. Though the status of large nodules can be easily diagnosed by fine needle biopsy or bronchoscopy, small nodules are often difficult to classify on computed tomography (CT). Recent works have shown that shape analysis of lung nodules can be used to differentiate benign lesions from malignant ones, though existing methods are limited in their sensitivity and specificity. In this work we introduced a new 3D shape analysis within the framework of differential geometry to calculate the Wasserstein distance between benign and malignant lung nodules to derive an accurate classification scheme. The Wasserstein distance between the nodules is calculated based on our new spherical optimal mass transport; this new algorithm works directly on the sphere by using the spherical metric, which is much more accurate and efficient than previous methods. In the process of deformation, the area-distortion factor gives a probability measure on the unit sphere, which forms the Wasserstein space. From known cases of benign and malignant lung nodules, we can calculate a unique optimal mass transport map between their correspondingly deformed Wasserstein spaces. This transportation cost defines the Wasserstein distance between them and can be used to classify new lung nodules into either the benign or malignant class. To the best of our knowledge, this is the first work that utilizes Wasserstein distance for lung nodule classification. The advantage of Wasserstein distance is that it is invariant under rigid motions and scalings; thus it intrinsically measures shape distance even when the underlying shapes are of high complexity, making it well suited to classify lung nodules as they have different sizes, orientations, and appearances.

Submitted 29 June, 2018; originally announced July 2018.

arXiv:1805.09999 [pdf]

Subdivisions of the posteromedial cortex in disorders of consciousness

Authors: Yue Cui, Ming Song, Darren M. Lipnicki, Yi Yang, Chuyang Ye, Lingzhong Fan, **g Sui, Tianzi Jiang, Jianghong He

Abstract: Evidence suggests that disruptions of the posteromedial cortex (PMC) and posteromedial corticothalamic connectivity contribute to disorders of consciousness (DOCs). While most previous studies treated the PMC as a whole, this structure is functionally heterogeneous. The present study investigated whether particular subdivisions of the PMC are specifically associated with DOCs. Participants were DOC patients, 21 vegetative state/unresponsive wakefulness syndrome (VS/UWS), 12 minimally conscious state (MCS), and 29 healthy controls. Individual PMC and thalamus were divided into distinct subdivisions by their fiber tractography to each other and default mode regions, and white matter integrity and brain activity between/within subdivisions were assessed. The thalamus was represented mainly in the dorsal and posterior portions of the PMC, and the white matter tracts connecting these subdivisions to the thalamus had less integrity in VS/UWS patients than in MCS patients and healthy controls, as well as in patients who did not recover after 12 months than in patients who did. The structural substrates were validated by finding impaired functional fluctuations within this PMC subdivision. This study is the first to show that tracts from dorsal and posterior subdivisions of the PMC to the thalamus contribute to DOCs.

Submitted 25 May, 2018; originally announced May 2018.

arXiv:1801.03268 [pdf]

Prognostication of chronic disorders of consciousness using brain functional networks and clinical characteristics

Authors: Ming Song, Yi Yang, Jianghong He, Zhengyi Yang, Shan Yu, Qiuyou Xie, Xiaoyu Xia, Yuanyuan Dang, Qiang Zhang, Xinhuai Wu, Yue Cui, Bing Hou, Ronghao Yu, Ruxiang Xu, Tianzi Jiang

Abstract: Disorders of consciousness are a heterogeneous mixture of different diseases or injuries. Although some indicators and models have been proposed for prognostication, any single method when used alone carries a high risk of false prediction. This study aimed to develop a multidomain prognostic model that combines resting state functional MRI with three clinical characteristics to predict one year outcomes at the single-subject level. The model discriminated between patients who would later recover consciousness and those who would not with an accuracy of around 90% on three datasets from two medical centers. It was also able to identify the prognostic importance of different predictors, including brain functions and clinical characteristics. To our knowledge, this is the first implementation reported of a multidomain prognostic model based on resting state functional MRI and clinical characteristics in chronic disorders of consciousness. We therefore suggest that this novel prognostic model is accurate, robust, and interpretable.

Submitted 6 September, 2018; v1 submitted 10 January, 2018; originally announced January 2018.

Comments: Although some prognostic indicators and models have been proposed for disorders of consciousness, each single method when used alone carries risks of false prediction. Song et al. report that a model combining resting state functional MRI with clinical characteristics provided accurate, robust, and interpretable prognostications. 52 pages, 1 table, 7 figures

arXiv:1706.01345 [pdf, other]

Virtual reality analysis of intrinsic protein geometry with applications to cis peptide planes

Authors: Yanzhen Hou, ** Dai, Nevena Ilieva, Antti J. Niemi, Xubiao Peng, Jianfeng He

Abstract: A protein is traditionally visualised as a piecewise linear discrete curve, and its geometry is conventionally characterised by the extrinsically determined Ramachandran angles. However, a protein backbone also has two independent intrinsic geometric structures, due to the peptide planes and the side chains. Here we adapt and develop modern 3D virtual reality techniques to scrutinize the atomic geometry along a protein backbone, in the vicinity of a peptide plane. For this we compare backbone geometry-based (extrinsic) and structure-based (intrinsic) coordinate systems, and as an example we inspect the trans and cis peptide planes. We reveal systematics in the way a cis peptide plane deforms the neighbouring atomic geometry, and we develop a virtual reality based visual methodology that can identify the presence of a cis peptide plane from the arrangement of atoms in its vicinity. Our approach can easily detect exceptionally placed atoms in crystallographic structures. Thus it can be employed as a powerful visual refinement tool which is applicable also in the case when resolution of the protein structure is limited and whenever refinement is needed. As concrete examples we identify a number of crystallographic protein structures in Protein Data Bank (PDB) that display exceptional atomic positions around their cis peptide planes.

Submitted 6 June, 2017; v1 submitted 5 June, 2017; originally announced June 2017.

Comments: 25 figures

arXiv:1612.01396 [pdf, other]

doi 10.1103/PhysRevE.95.032406

Towards multistage modelling of protein dynamics with monomeric Myc oncoprotein as an example

Authors: Jiaojiao Liu, ** Dai, Jianfeng He, Antti J. Niemi, Nevena Ilieva

Abstract: We propose to combine a mean field approach with all atom molecular dynamics into a multistage algorithm that can model protein folding and dynamics over very long time periods yet with atomic level precision. As an example we investigate an isolated monomeric Myc oncoprotein that has been implicated in carcinomas including those in colon, breast and lungs. Under physiological conditions a monomeric Myc is presumed to be an example of intrinsically disordered proteins, which pose a serious challenge to existing modelling techniques. We argue that a room temperature monomeric Myc is in a dynamical state: it oscillates between different conformations that we identify. For this we adopt the C-alpha backbone of Myc in a crystallographic heteromer as an initial Ansatz for the monomeric structure. We construct a multisoliton of the pertinent Landau free energy, to describe the C-alpha profile with ultra high precision. We use Glauber dynamics to resolve how the multisoliton responds to repeated increases and decreases in ambient temperature. We confirm that the initial structure is unstable in isolation. We reveal a highly degenerate ground state landscape, an attractive set towards which Glauber dynamics converges in the limit of vanishing ambient temperature. We analyse the thermal stability of this Glauber attractor using room temperature molecular dynamics. We identify and scrutinise a particularly stable subset in which the two helical segments of the original multisoliton align in parallel, next to each other. During the MD time evolution of a representative structure from this subset, we observe intermittent quasiparticle oscillations along the C-terminal alpha-helix, some of which resemble a translating Davydov's Amide-I soliton. We propose that the presence of oscillatory motion is in line with the expected intrinsically disordered character of Myc.

Submitted 5 December, 2016; originally announced December 2016.

Comments: 17 figures

Journal ref: Phys. Rev. E 95, 032406 (2017)

arXiv:1610.06945 [pdf]

Decreased aneurysmal subarachnoid hemorrhage incidence rate in elderly population than in middle aged population: a retrospective analysis of 8,144 cases in Mainland China

Authors: Yi Xiang J Wang, Lihong Zhang, Lin Zhao, Jian He, Xian-Jun Zeng, Heng Liu, Yun-jun Yang, Shang-Wei Ding, Zhong-Fei Xu, Yong-Min He, Lin Yang, Lan Sun, Ke-jie Mu, Bai-Song Wang, Xiao-Hong Xu, Zhong-You Ji, Jian-hua Liu, **-Zhou Fang, Rui Hou, Feng Fan, Guang Ming Peng, Sheng-Hong Ju

Abstract: Purpose: Rupture of an intracranial aneurysm is the most common cause of subarachnoid haemorrhage (SAH), which is a life-threatening acute cerebrovascular event that typically affects working-age people. This study aims to compare the aneurysmal SAH incidence rate in the elderly population with that in the middle-aged population in China. Materials and methods: Aneurysmal SAH cases were collected retrospectively from the archives of 21 hospitals in Mainland China. All the cases collected were from September 2016 and backward consecutively for a period of time up to 8 years. SAH was initially diagnosed by brain computed tomography, followed by CT angiography (CTA) or digital subtraction angiography (DSA), and SAH was confirmed to be due to cerebral aneurysm. For cases in which multiple bleeding occurred, the age at the first SAH was used in this study. The total incidences from all hospitals at each age were summed together for females and males, then adjusted by the total population number at each age for females and males. The total population data was from the 2010 population census of the People's Republic of China. Results: In total there were 8,144 cases, with 4,861 females and 3,283 males. Our analysis shows that for both females and males the relative aneurysmal SAH rate started to decrease after around 65 years old. For males, the relative aneurysmal SAH rate might have started to decrease after around 55 years old. Conclusion: In contrast to previous reports, our data demonstrated a lower aneurysmal subarachnoid hemorrhage incidence rate in the elderly population than in the middle-aged population. Our data therefore support the hypothesis that aneurysms do not grow progressively once they form but probably either rupture or stabilize and that very elderly patients are at a reduced risk of rupture compared with patients who are younger with the same-sized aneurysms.

Submitted 19 October, 2016; originally announced October 2016.

Comments: Total 16 pages, 3 figures

arXiv:1511.07313 [pdf, other]

doi 10.1103/PhysRevE.93.032409

Bloch spin waves and emergent structure in protein folding with HIV envelope glycoprotein as an example

Authors: ** Dai, Antti J. Niemi, Jianfeng He, Adam Sieradzan, Nevena Ilieva

Abstract: We inquire how structure emerges during the process of protein folding. For this we scrutinise collective many-atom motions during all-atom molecular dynamics simulations. We introduce, develop and employ various topological techniques, in combination with analytic tools that we deduce from the concept of integrable models and structure of discrete nonlinear Schroedinger equation. The example we consider is an alpha-helical subunit of the HIV envelope glycoprotein gp41. The helical structure is stable when the subunit is part of the biological oligomer. But in isolation the helix becomes unstable, and the monomer starts deforming. We follow the process computationally. We interpret the evolving structure both in terms of a backbone based Heisenberg spin chain and in terms of a side chain based XY spin chain. We find that in both cases the formation of protein super-secondary structure is akin to the formation of a topological Bloch domain wall along a spin chain. During the process we identify three individual Bloch walls and we show that each of them can be modelled with a very high precision in terms of a soliton solution to a discrete nonlinear Schroedinger equation.

Submitted 23 November, 2015; originally announced November 2015.

Comments: 20 pages 29 figures

Journal ref: Phys. Rev. E 93, 032409 (2016)

arXiv:1412.7972 [pdf, other]

doi 10.1063/1.4905586

Aspects of structural landscape of human islet amyloid polypeptide

Authors: Jianfeng He, ** Dai, **g Li, Xubiao Peng, Antti J. Niemi

Abstract: The human islet amyloid polypeptide (hIAPP) co-operates with insulin to maintain glycemic balance. It also constitutes the amyloid plaques that aggregate in the pancreas of type-II diabetic patients. We have performed extensive in silico investigations to analyse the structural landscape of monomeric hIAPP, which is presumed to be intrinsically disordered. For this we construct from first principles a highly predictive energy function that describes a monomeric hIAPP observed in a NMR experiment, as a local energy minimum. We subject our theoretical model of hIAPP to repeated heating and cooling simulations, back and forth between a high temperature regime where the conformation resembles a random walker and a low temperature limit where no thermal motions prevail. We find that the final low temperature conformations display a high level of degeneracy, in a manner which is fully in line with the presumed intrinsically disordered character of hIAPP. In particular, we identify an isolated family of alpha-helical conformations that might cause the transition to amyloidosis, by nucleation.

Submitted 26 December, 2014; originally announced December 2014.

arXiv:1403.5241 [pdf]

Spatiotemporal Dissociation of Brain Activity underlying Subjective Awareness, Objective Performance and Confidence

Authors: Qi Li, Zachary Hill, Biyu J. He

Abstract: Despite intense recent research, the neural correlates of conscious visual perception remain elusive. The most established paradigm for studying brain mechanisms underlying conscious perception is to keep the physical sensory inputs constant and identify brain activities that correlate with the changing content of conscious awareness. However, such a contrast based on conscious content alone would not only reveal brain activities directly contributing to conscious perception, but also include brain activities that precede or follow it. To address this issue, we devised a paradigm whereby we collected, trial-by-trial, measures of objective performance, subjective awareness, and the confidence level of subjective awareness. Using magnetoencephalography recordings in healthy human volunteers, we dissociated brain activities underlying these different cognitive phenomena. Our results provide strong evidence that widely distributed slow cortical potentials (SCPs) correlate with subjective awareness, even after the effects of objective performance and confidence were both removed. The SCP correlate of conscious perception manifests strongly in its waveform, phase, and power. In contrast, objective performance and confidence were both contributed by relatively transient brain activity. These results shed new light on the brain mechanisms of conscious, unconscious, and metacognitive processing.

Submitted 20 March, 2014; originally announced March 2014.

Comments: Published version of J. Neurosci

arXiv:1304.1985 [pdf]

Origins and evolutionary genomics of the novel 2013 avian-origin H7N9 influenza A virus in China: Early findings

Authors: Jiankui He, Luwen Ning, Yin Tong

Abstract: In March and early April 2013, a new avian-origin influenza A (H7N9) virus (A-OIV) emerged in eastern China. By the first week of April, 18 infection cases had been confirmed and 6 people had died since March. This virus has caused global concern as a potential pandemic threat. Here we use evolutionary analysis to reconstruct the origins and early development of the A-OIV viruses. We found that A-OIV was derived from a reassortment of three avian flu virus strains, and substantial mutations have been detected. Our results highlight the need for systematic surveillance of influenza in birds, and provide evidence that the mixing of new genetic elements in birds can result in the emergence of viruses with pandemic potential in humans.

Submitted 9 April, 2013; v1 submitted 7 April, 2013; originally announced April 2013.

Comments: 8 pages, 5 figures, 2 tables

arXiv:1303.2333 [pdf]

Warburg Effect due to Exposure to Different Types of Radiation

Authors: Zhitong Bing, Bin Ao, Yanan Zhang, Fengling Wang, Caiyong Ye, **peng He, **tu Sun, Jie Xiong, Nan Ding, Xiao-fei Gao, Ji Qi, Sheng Zhang, Guangming Zhou, Lei Yang

Abstract: Cancer cells maintain a high level of aerobic glycolysis (the Warburg effect), which is associated with their rapid proliferation. Many studies have reported that the suppression of glycolysis and activation of oxidative phosphorylation can repress the growth of cancer cells through regulation of key regulators. Can the Warburg effect of cancer cells be switched by some other environmental stimulus? Herein, we report an interesting phenomenon in which cells alternated between glycolysis and mitochondrial respiration depending on the type of radiation they were exposed to. We observed enhanced glycolysis and mitochondrial respiration in HeLa cells exposed to 2-Gy X-ray and 2-Gy carbon ion radiation, respectively. This discovery may provide novel insights for tumor therapy.

Submitted 10 March, 2013; originally announced March 2013.

arXiv:1204.6313 [pdf, other]

Low-dimensional clustering detects incipient dominant influenza strain clusters

Authors: Jiankui He, Michael W. Deem

Abstract: Influenza has been circulating in the human population and has caused three pandemics in the last century (1918 H1N1, 1957 H2N2, 1968 H3N2). The 2009 A(H1N1) was classified by the World Health Organization (WHO) as the fourth pandemic. Influenza has a high evolution rate, which makes vaccine design challenging. We here consider an approach for early detection of new dominant strains. By clustering the 2009 A(H1N1) sequence data, we found two main clusters. We then define a metric to detect the emergence of dominant strains. We show on historical H3N2 data that this method is able to identify a cluster around an incipient dominant strain before it becomes dominant. For example, for H3N2 as of March 30, 2009, the method detects the cluster for the new A/British Columbia/RV1222/2009 strain. This strain detection tool would appear to be useful for annual influenza vaccine selection.

Submitted 27 April, 2012; originally announced April 2012.

Comments: 50 pages, 6 figures, 1 table, supplement

Journal ref: Protein Engineering, Design & Selection 23 (2010) 935-946

arXiv:1008.2714 [pdf, ps, other]

Heterogeneous diversity of spacers within CRISPR

Authors: Jiankui He, Michael W. Deem

Abstract: Clustered regularly interspaced short palindromic repeats (CRISPR) in bacterial and archaeal DNA have recently been shown to be a new type of anti-viral immune system in these organisms. We here study the diversity of spacers in CRISPR under selective pressure. We propose a population dynamics model that explains the biological observation that the leader-proximal end of CRISPR is more diversified and the leader-distal end of CRISPR is more conserved. This result is shown to be in agreement with recent experiments. Our results show that the CRISPR spacer structure is influenced by and provides a record of the viral challenges that bacteria face.

Submitted 16 August, 2010; originally announced August 2010.

Comments: 5 pages, 5 figures, to appear in Phys. Rev. Lett

arXiv:0902.4021 [pdf, ps, other]

doi 10.1103/PhysRevE.79.031907

Spontaneous Emergence of Modularity in a Model of Evolving Individuals and in Real Networks

Authors: Jiankui He, Jun Sun, Michael W. Deem

Abstract: We investigate the selective forces that promote the emergence of modularity in nature. We demonstrate the spontaneous emergence of modularity in a population of individuals that evolve in a changing environment. We show that the level of modularity correlates with the rapidity and severity of environmental change. The modularity arises as a synergistic response to the noise in the environment in the presence of horizontal gene transfer. We suggest that the hierarchical structure observed in the natural world may be a broken symmetry state, which generically results from evolution in a changing environment. To support our results, we analyze experimental protein interaction data and show that protein interaction networks became increasingly modular as evolution proceeded over the last four billion years. We also discuss a method to determine the divergence time of a protein.

Submitted 23 February, 2009; originally announced February 2009.

Comments: 27 pages, 24 figures; to appear in Phys. Rev. E

Showing 1–32 of 32 results for author: He, J