A principled framework to assess information theoretical fitness of brain functional sub-circuits

Authors: Duy Duong-Tran, Nghi Nguyen, Shizhuo Mu, Jiong Chen, Jingxuan Bao, Frederick Xu, Sumita Garai, Jose Cadena-Pico, Alan David Kaplan, Tianlong Chen, Yize Zhao, Li Shen, Joaquín Goñi

Abstract: In systems and network neuroscience, many common practices in brain connectomic analysis are often not properly scrutinized. One such practice is mapping a predetermined set of sub-circuits, like functional networks (FNs), onto subjects' functional connectomes (FCs) without adequately assessing the information-theoretic appropriateness of the partition. Another practice that goes unchallenged is thresholding weighted FCs to remove spurious connections without justifying the chosen threshold. This paper leverages recent theoretical advances in Stochastic Block Models (SBMs) to formally define and quantify the information-theoretic fitness (e.g., prominence) of a predetermined set of FNs when mapped to individual FCs under different fMRI task conditions. Our framework allows for evaluating any combination of FC granularity, FN partition, and thresholding strategy, thereby optimizing these choices to preserve important topological features of the human brain connectomes. Our results pave the way for the proper use of predetermined FNs and thresholding methods and provide insights for future research in individualized parcellations.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.02066 [pdf, other]

Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models

Authors: Songtao Liu, Hanjun Dai, Yue Zhao, Peng Liu

Abstract: Molecule synthesis through machine learning is one of the fundamental problems in drug discovery. Current data-driven strategies employ one-step retrosynthesis models and search algorithms to predict synthetic routes in a top-bottom manner. Despite their effective performance, these strategies face limitations in the molecule synthetic route generation due to a greedy selection of the next molecule set without any lookahead. Furthermore, existing strategies cannot control the generation of synthetic routes based on possible criteria such as material costs, yields, and step count. In this work, we propose a general and principled framework via conditional residual energy-based models (EBMs), that focus on the quality of the entire synthetic route based on the specific criteria. By incorporating an additional energy-based function into our probabilistic model, our proposed algorithm can enhance the quality of the most probable synthetic routes (with higher probabilities) generated by various strategies in a plug-and-play fashion. Extensive experiments demonstrate that our framework can consistently boost performance across various strategies and outperforms previous state-of-the-art top-1 accuracy by a margin of 2.5%. Code is available at https://github.com/SongtaoLiu0823/CREBM.

Submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted by ICML 2024 (Oral)

arXiv:2404.08023 [pdf, other]

Pathology-genomic fusion via biologically informed cross-modality graph learning for survival analysis

Authors: Zeyu Zhang, Yuanshen Zhao, Jingxian Duan, Yaou Liu, Hairong Zheng, Dong Liang, Zhenyu Zhang, Zhi-Cheng Li

Abstract: The diagnosis and prognosis of cancer are typically based on multi-modal clinical data, including histology images and genomic data, due to the complex pathogenesis and high heterogeneity. Despite the advancements in digital pathology and high-throughput genome sequencing, establishing effective multi-modal fusion models for survival prediction and revealing the potential association between histopathology and transcriptomics remains challenging. In this paper, we propose Pathology-Genome Heterogeneous Graph (PGHG), which integrates whole slide images (WSI) and bulk RNA-Seq expression data with a heterogeneous graph neural network for cancer survival analysis. The PGHG consists of a biological knowledge-guided representation learning network and a pathology-genome heterogeneous graph. The representation learning network utilizes the biological prior knowledge of intra-modal and inter-modal data associations to guide the feature extraction. The node features of each modality are updated through an attention-based graph learning strategy. Unimodal features and bi-modal fused features are extracted via an attention pooling module and then used for survival prediction. We evaluate the model on low-grade gliomas, glioblastoma, and kidney renal papillary cell carcinoma datasets from the Cancer Genome Atlas (TCGA) and the First Affiliated Hospital of Zhengzhou University (FAHZU). Extensive experimental results demonstrate that the proposed method outperforms both unimodal and other multi-modal fusion models. To demonstrate the model interpretability, we also visualize the attention heatmap of pathological images and utilize the integrated gradient algorithm to identify important tissue structure, biological pathways and key genes.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2403.08203 [pdf, other]

Learnable Community-Aware Transformer for Brain Connectome Analysis with Token Clustering

Authors: Yanting Yang, Beidi Zhao, Zhuohao Ni, Yize Zhao, Xiaoxiao Li

Abstract: Neuroscientific research has revealed that the complex brain network can be organized into distinct functional communities, each characterized by a cohesive group of regions of interest (ROIs) with strong interconnections. These communities play a crucial role in comprehending the functional organization of the brain and its implications for neurological conditions, including Autism Spectrum Disorder (ASD) and biological differences, such as in gender. Traditional models have been constrained by the necessity of predefined community clusters, limiting their flexibility and adaptability in deciphering the brain's functional organization. Furthermore, these models were restricted by a fixed number of communities, hindering their ability to accurately represent the brain's dynamic nature. In this study, we present a token clustering brain transformer-based model ($\texttt{TC-BrainTF}$) for joint community clustering and classification. Our approach proposes a novel token clustering (TC) module based on the transformer architecture, which utilizes learnable prompt tokens with orthogonal loss where each ROI embedding is projected onto the prompt embedding space, effectively clustering ROIs into communities and reducing the dimensions of the node representation via merging with communities. Our results demonstrate that our learnable community-aware model $\texttt{TC-BrainTF}$ offers improved accuracy in identifying ASD and classifying genders through rigorous testing on ABIDE and HCP datasets. Additionally, the qualitative analysis on $\texttt{TC-BrainTF}$ has demonstrated the effectiveness of the designed TC module and its relevance to neuroscience interpretations.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.04552 [pdf]

Analysis of a Leslie-Gower model with Alle effects, cooperative hunting, and constant placement rates

Authors: Yonghui Zhao

Abstract: This paper investigates the dynamical properties of the Leslie-Gower model with Alle effects, cooperative hunting, and constant placement rates. The conditions for the existence of the triple equilibrium point of the model are first analyzed. Subsequently, the canonical type theory and the qualitative theory of planar systems are applied to obtain that the triple equilibrium point can be a node with a residual dimension of 2 and an equilibrium point with a residual dimension of 3 under different parameter conditions. Finally, it is proved that the system bifurcates with a residual dimension of 2 in the vicinity of the node with cooperative hunting and placement rate as branching parameters.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2402.18784 [pdf, other]

Brain-inspired and Self-based Artificial Intelligence

Authors: Yi Zeng, Feifei Zhao, Yuxuan Zhao, Dongcheng Zhao, Enmeng Lu, Qian Zhang, Yuwei Wang, Hui Feng, Zhuoya Zhao, Jihang Wang, Qingqun Kong, Yinqian Sun, Yang Li, Guobin Shen, Bing Han, Yiting Dong, Wenxuan Pan, Xiang He, Aorigele Bao, ** Wang

Abstract: The question "Can machines think?" and the Turing Test to assess whether machines could achieve human-level intelligence are among the roots of AI. With the philosophical argument "I think, therefore I am", this paper challenges the idea of a "thinking machine" supported by current AIs since there is no sense of self in them. Current artificial intelligence is only seemingly intelligent information processing: it does not truly understand, is not subjectively aware of itself, and does not perceive the world with the self as human intelligence does. In this paper, we introduce a Brain-inspired and Self-based Artificial Intelligence (BriSe AI) paradigm. This BriSe AI paradigm is dedicated to coordinating various cognitive functions and learning strategies in a self-organized manner to build human-level AI models and robotic applications. Specifically, BriSe AI emphasizes the crucial role of the Self in shaping the future AI, rooted in a practical hierarchical Self framework, including Perception and Learning, Bodily Self, Autonomous Self, Social Self, and Conceptual Self. The hierarchical framework of the Self highlights self-based environment perception, self-bodily modeling, autonomous interaction with the environment, social interaction and collaboration with others, and even more abstract understanding of the Self. Furthermore, the positive mutual promotion and support among multiple levels of Self, as well as between Self and learning, enhance the BriSe AI's conscious understanding of information and flexible adaptation to complex environments, serving as a driving force propelling BriSe AI towards real Artificial General Intelligence.

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.16359 [pdf, other]

Feedback Efficient Online Fine-Tuning of Diffusion Models

Authors: Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Sergey Levine, Tommaso Biancalani

Abstract: Diffusion models excel at modeling complex data distributions, including those of images, proteins, and small molecules. However, in many cases, our goal is to model parts of the distribution that maximize certain properties: for example, we may want to generate images with high aesthetic quality, or molecules with high bioactivity. It is natural to frame this as a reinforcement learning (RL) problem, in which the objective is to fine-tune a diffusion model to maximize a reward function that corresponds to some property. Even with access to online queries of the ground-truth reward function, efficiently discovering high-reward samples can be challenging: they might have a low probability in the initial distribution, and there might be many infeasible samples that do not even have a well-defined reward (e.g., unnatural images or physically impossible molecules). In this work, we propose a novel reinforcement learning procedure that efficiently explores on the manifold of feasible samples. We present a theoretical analysis providing a regret guarantee, as well as empirical validation across three domains: images, biological sequences, and molecules.

Submitted 27 February, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: Under review (codes will be released soon)

arXiv:2402.06079 [pdf, other]

DiscDiff: Latent Diffusion Model for DNA Sequence Generation

Authors: Zehui Li, Yuhao Ni, William A V Beardall, Guoxuan Xia, Akashaditya Das, Guy-Bart Stan, Yiren Zhao

Abstract: This paper introduces a novel framework for DNA sequence generation, comprising two key components: DiscDiff, a Latent Diffusion Model (LDM) tailored for generating discrete DNA sequences, and Absorb-Escape, a post-training algorithm designed to refine these sequences. Absorb-Escape enhances the realism of the generated sequences by correcting 'round errors' inherent in the conversion process between latent and input spaces. Our approach not only sets new standards in DNA sequence generation but also demonstrates superior performance over existing diffusion models, in generating both short and long DNA sequences. Additionally, we introduce EPD-GenDNA, the first comprehensive, multi-species dataset for DNA generation, encompassing 160,000 unique sequences from 15 species. We hope this study will advance the generative modelling of DNA, with potential implications for gene therapy and protein production.

Submitted 17 April, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: Different from the prior work "Latent Diffusion Model for DNA Sequence Generation" (arXiv:2310.06150), we updated the evaluation framework and compared the DiscDiff with other methods comprehensively. In addition, a post-training framework is proposed to increase the quality of generated sequences

arXiv:2401.07657 [pdf, other]

Empirical Evidence for the Fragment level Understanding on Drug Molecular Structure of LLMs

Authors: Xiuyuan Hu, Guoqing Liu, Yang Zhao, Hao Zhang

Abstract: AI for drug discovery has been a research hotspot in recent years, and SMILES-based language models have been increasingly applied in drug molecular design. However, no work has explored whether and how language models understand the chemical spatial structure from 1D sequences. In this work, we pre-train a transformer model on chemical language and fine-tune it toward drug design objectives, and investigate the correspondence between high-frequency SMILES substrings and molecular fragments. The results indicate that language models can understand chemical structures from the perspective of molecular fragments, and the structural knowledge learned through fine-tuning is reflected in the high-frequency SMILES substrings generated by the model.

Submitted 15 January, 2024; originally announced January 2024.

Comments: Accepted by AAAI 2024 workshop: Large Language Models for Biological Discoveries (LLMs4Bio)

arXiv:2401.06155 [pdf, other]

De novo Drug Design using Reinforcement Learning with Multiple GPT Agents

Authors: Xiuyuan Hu, Guoqing Liu, Yang Zhao, Hao Zhang

Abstract: De novo drug design is a pivotal issue in pharmacology and a new area of focus in AI for science research. A central challenge in this field is to generate molecules with specific properties while also producing a wide range of diverse candidates. Although advanced technologies such as transformer models and reinforcement learning have been applied in drug design, their potential has not been fully realized. Therefore, we propose MolRL-MGPT, a reinforcement learning algorithm with multiple GPT agents for drug molecular generation. To promote molecular diversity, we encourage the agents to collaborate in searching for desirable molecules in diverse directions. Our algorithm has shown promising results on the GuacaMol benchmark and exhibits efficacy in designing inhibitors against SARS-CoV-2 protein targets. The codes are available at: https://github.com/HXYfighter/MolRL-MGPT.

Submitted 21 December, 2023; originally announced January 2024.

Comments: Accepted by NeurIPS 2023

arXiv:2401.03004 [pdf]

SAPNet: a deep learning model for identification of single-molecule peptide post-translational modifications with surface enhanced Raman spectroscopy

Authors: Mulusew W. Yaltaye, Yingqi Zhao, Eva Bozo, Pei-Lin Xin, Vahid Farrah, Francesco De Angelis, Jian-An Huang

Abstract: Nanopore resistive pulse sensors are emerging technologies for single-molecule protein sequencing, but they can hardly detect small post-translational modifications (PTMs) such as hydroxylation at the single-molecule level. While a combination of surface enhanced Raman spectroscopy (SERS) with plasmonic nanopores can detect the small PTMs, the blinking Raman peaks in the single-molecule SERS spectra pose a big challenge in data analysis and PTM identification. Herein, we developed and validated a one-dimensional convolutional neural network (1D-CNN) for amino acids and peptides identification from their PTMs including hydroxylation and phosphorylation by their single-molecule SERS spectra, named Single Amino acid and Peptide Network (SAPNet). Our work combines cutting-edge plasmonic nanopore technology for SERS signal acquisition and deep learning for fully automated extraction of information from the SERS signals. The SAPNet model achieved an overall accuracy of 99.66% for the identification of amino acids from their modification, and 98.38% for the identification of peptides from their PTM translation. We also evaluated the model with out-of-sample examples with good performance. Our work can be beneficial for early detection of diseases such as cancers and Alzheimer's disease.

Submitted 5 January, 2024; originally announced January 2024.

Comments: 20 pages, 5 figures, 2 tables

arXiv:2312.14249 [pdf, other]

GenoCraft: A Comprehensive, User-Friendly Web-Based Platform for High-Throughput Omics Data Analysis and Visualization

Authors: Yingzhou Lu, Minjie Shen, Yue Zhao, Chenhao Li, Fan Meng, Xiao Wang, David Herrington, Yue Wang, Tim Fu, Capucine Van Rechem

Abstract: The surge in high-throughput omics data has reshaped the landscape of biological research, underlining the need for powerful, user-friendly data analysis and interpretation tools. This paper presents GenoCraft, a web-based comprehensive software solution designed to handle the entire pipeline of omics data processing. GenoCraft offers a unified platform featuring advanced bioinformatics tools, covering all aspects of omics data analysis. It encompasses a range of functionalities, such as normalization, quality control, differential analysis, network analysis, pathway analysis, and diverse visualization techniques. This software makes state-of-the-art omics data analysis more accessible to a wider range of users. With GenoCraft, researchers and data scientists have access to an array of cutting-edge bioinformatics tools under a user-friendly interface, making it a valuable resource for managing and analyzing large-scale omics data. The API with an interactive web interface is publicly available at https://genocraft.stanford.edu/. We also release all the codes in https://github.com/futianfan/GenoCraft.

Submitted 21 December, 2023; originally announced December 2023.

arXiv:2312.02203 [pdf, other]

Learning High-Order Relationships of Brain Regions

Authors: Weikang Qiu, Huangrui Chu, Selena Wang, Haolan Zuo, Xiaoxiao Li, Yize Zhao, Rex Ying

Abstract: Discovering reliable and informative relationships among brain regions from functional magnetic resonance imaging (fMRI) signals is essential in phenotypic predictions. Most of the current methods fail to accurately characterize those interactions because they only focus on pairwise connections and overlook the high-order relationships of brain regions. We propose that these high-order relationships should be maximally informative and minimally redundant (MIMR). However, identifying such high-order relationships is challenging and under-explored due to the exponential search space and the absence of a tractable objective. In response to this gap, we propose a novel method named HYBRID which aims to extract MIMR high-order relationships from fMRI data. HYBRID employs a CONSTRUCTOR to identify hyperedge structures, and a WEIGHTER to compute a weight for each hyperedge, which avoids searching in exponential space. HYBRID achieves the MIMR objective through an innovative information bottleneck framework named multi-head drop-bottleneck with theoretical guarantees. Our comprehensive experiments demonstrate the effectiveness of our model. Our model outperforms the state-of-the-art predictive model by an average of 11.2%, regarding the quality of hyperedges measured by CPM, a standard protocol for studying brain connections.

Submitted 8 June, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

Comments: Accepted at ICML 2024, Camera Ready Version

arXiv:2311.00652 [pdf, other]

The physical origin of aneurysm growth, dissection, and rupture

Authors: Tom Y. Zhao, Jin-Tae Kim, Min Cho, Akhil Narang, John A. Rogers, Neelesh A. Patankar

Abstract: Rupture of aortic aneurysms is by far the most fatal heart disease, with a mortality rate exceeding 80%. There are no reliable clinical protocols to predict growth, dissection, and rupture because the fundamental physics driving aneurysm progression is unknown. Here, via in-vitro experiments, we show that a blood-wall, fluttering instability manifests in synthetic arteries under pulsatile forcing. We establish a phase space to prove that the transition from stable flow to unstable aortic flutter is accurately predicted by a flutter instability parameter derived from first principles. Time resolved strain maps of the evolving system reveal the dynamical characteristics of aortic flutter that drive aneurysm progression. We show that low level instability can trigger permanent aortic growth, even in the absence of material remodeling. Sufficiently large flutter beyond a secondary threshold localizes strain in the walls to the length scale clinically observed in aortic dissection. Lastly, significant physical flutter beyond a tertiary threshold can ultimately induce aneurysm rupture via failure modes reported from necropsy. Resolving the fundamental physics of aneurysm progression directly leads to clinical protocols that forecast growth as well as intercept dissection and rupture by pinpointing their physical origin.

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.02546 [pdf, other]

Joint Design of Protein Sequence and Structure based on Motifs

Authors: Zhenqiao Song, Yunlong Zhao, Yufei Song, Wenxian Shi, Yang Yang, Lei Li

Abstract: Designing novel proteins with desired functions is crucial in biology and chemistry. However, most existing work focuses on protein sequence design, leaving protein sequence and structure co-design underexplored. In this paper, we propose GeoPro, a method to design protein backbone structure and sequence jointly. Our motivation is that protein sequence and its backbone structure constrain each other, and thus joint design of both can not only avoid nonfolding and misfolding but also produce more diverse candidates with desired functions. To this end, GeoPro is powered by an equivariant encoder for three-dimensional (3D) backbone structure and a protein sequence decoder guided by 3D geometry. Experimental results on two biologically significant metalloprotein datasets, including $β$-lactamases and myoglobins, show that our proposed GeoPro outperforms several strong baselines on most metrics. Remarkably, our method discovers novel $β$-lactamases and myoglobins which are not present in protein data bank (PDB) and UniProt. These proteins exhibit stable folding and active site environments reminiscent of those of natural proteins, demonstrating their excellent potential to be biologically functional.

Submitted 3 October, 2023; originally announced October 2023.

arXiv:2309.05863 [pdf, other]

The bionic neural network for external simulation of human locomotor system

Authors: Yue Shi, Shuhao Ma, Yihui Zhao

Abstract: Muscle forces and joint kinematics estimated with musculoskeletal (MSK) modeling techniques offer useful metrics describing movement quality. Model-based computational MSK models can interpret the dynamic interaction between the neural drive to muscles, muscle dynamics, body and joint kinematics, and kinetics. Still, such a set of solutions suffers from high computational time and muscle recruitment problems, especially in complex modeling. In recent years, data-driven methods have emerged as a promising alternative due to the benefits of flexibility and adaptability. However, a large amount of labeled training data is not easy to acquire. This paper proposes a physics-informed deep learning method based on MSK modeling to predict joint motion and muscle forces. The MSK model is embedded into the neural network as an ordinary differential equation (ODE) loss function with physiological parameters of muscle activation dynamics and muscle contraction dynamics to be identified. These parameters are automatically estimated during the training process, which guides the prediction of muscle forces combined with the MSK forward dynamics model. Experimental validations on two groups of data, including one benchmark dataset and one self-collected dataset from six healthy subjects, are performed. The results demonstrate that the proposed deep learning method can effectively identify subject-specific MSK physiological parameters and that the trained physics-informed forward-dynamics surrogate yields accurate motion and muscle force predictions.

Submitted 11 September, 2023; originally announced September 2023.

Comments: 10

arXiv:2308.11978 [pdf, other]

Will More Expressive Graph Neural Networks do Better on Generative Tasks?

Authors: Xiandong Zou, Xiangyu Zhao, Pietro Liò, Yiren Zhao

Abstract: Graph generation poses a significant challenge as it involves predicting a complete graph with multiple nodes and edges based on simply a given label. This task also carries fundamental importance to numerous real-world applications, including de-novo drug and molecular design. In recent years, several successful methods have emerged in the field of graph generation. However, these approaches suffer from two significant shortcomings: (1) the underlying Graph Neural Network (GNN) architectures used in these methods are often underexplored; and (2) these methods are often evaluated on only a limited number of metrics. To fill this gap, we investigate the expressiveness of GNNs under the context of the molecular graph generation task, by replacing the underlying GNNs of graph generative models with more expressive GNNs. Specifically, we analyse the performance of six GNNs in two different generative frameworks -- autoregressive generation models, such as GCPN and GraphAF, and one-shot generation models, such as GraphEBM -- on six different molecular generative objectives on the ZINC-250k dataset. Through our extensive experiments, we demonstrate that advanced GNNs can indeed improve the performance of GCPN, GraphAF, and GraphEBM on molecular generation tasks, but GNN expressiveness is not a necessary condition for a good GNN-based generative model. Moreover, we show that GCPN and GraphAF with advanced GNNs can achieve state-of-the-art results across 17 other non-GNN-based graph generative approaches, such as variational autoencoders and Bayesian optimisation models, on the proposed molecular generative objectives (DRD2, Median1, Median2), which are important metrics for de-novo molecular design.

Submitted 20 February, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

Comments: 2nd Learning on Graphs Conference (LoG 2023). 26 pages, 5 figures, 11 tables

arXiv:2308.11846 [pdf, other]

A Data-Driven Approach to Morphogenesis under Structural Instability

Authors: Yingjie Zhao, Zhi** Xu

Abstract: Morphological development into evolutionary patterns under structural instability is ubiquitous in living systems and often of vital importance for engineering structures. Here we propose a data-driven approach to understand and predict their spatiotemporal complexities. A machine-learning framework is proposed based on the physical modeling of morphogenesis triggered by internal or external forcing. Digital libraries of structural patterns are constructed from the simulation data, which are then used to recognize the abnormalities, predict their development, and assist in risk assessment and prognosis. The capabilities to identify the key bifurcation characteristics and predict the history-dependent development from the global and local features are demonstrated by examples of brain growth and aerospace structural design, which offer guidelines for disease diagnosis/prognosis and instability-tolerant design.

Submitted 22 August, 2023; originally announced August 2023.

arXiv:2308.06219 [pdf]

Acoustofluidic Engineering Functional Vessel-on-a-Chip

Authors: Yue Wu, Yuwen Zhao, Khayrul Islam, Yuyuan Zhou, Saeed Omidi, Yevgeny Berdichevsky, Yaling Liu

Abstract: Construction of in vitro vascular models is of great significance to various biomedical research, such as pharmacokinetics and hemodynamics, thus is an important direction in tissue engineering. In this work, a standing surface acoustic wave field was constructed to spatially arrange suspended endothelial cells into a designated patterning. The cell patterning was maintained after the acoustic field was withdrawn by the solidified hydrogel. Then, interstitial flow was provided to activate vessel tube formation. Thus, a functional vessel-on-a-chip was engineered with specific vessel geometry. Vascular function, including perfusability and vascular barrier function, was characterized by beads loading and dextran diffusion, respectively. A computational atomistic simulation model was proposed to illustrate how solutes cross vascular lipid bilayer. The reported acoustofluidic methodology is capable of facile and reproducible fabrication of functional vessel network with specific geometry. It is promising to facilitate the development of both fundamental research and regenerative therapy.

Submitted 17 August, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

arXiv:2307.05628 [pdf, other]

DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks

Authors: Daoan Zhang, Weitong Zhang, Yu Zhao, Jianguo Zhang, Bing He, Chenchen Qin, Jianhua Yao

Abstract: Pre-trained large language models demonstrate potential in extracting information from DNA sequences, yet adapting to a variety of tasks and data modalities remains a challenge. To address this, we propose DNAGPT, a generalized DNA pre-training model trained on over 200 billion base pairs from all mammals. By enhancing the classic GPT model with a binary classification task (DNA sequence order), a numerical regression task (guanine-cytosine content prediction), and a comprehensive token language, DNAGPT can handle versatile DNA analysis tasks while processing both sequence and numerical data. Our evaluation of genomic signal and region recognition, mRNA abundance regression, and artificial genomes generation tasks demonstrates DNAGPT's superior performance compared to existing models designed for specific downstream tasks, benefiting from pre-training using the newly designed model structure.

Submitted 30 August, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

arXiv:2306.09391 [pdf, other]

Multi-omics Prediction from High-content Cellular Imaging with Deep Learning

Authors: Rahil Mehrizi, Arash Mehrjou, Maryana Alegro, Yi Zhao, Benedetta Carbone, Carl Fishwick, Johanna Vappiani, **g Bi, Siobhan Sanford, Hakan Keles, Marcus Bantscheff, Cuong Nguyen, Patrick Schwab

Abstract: High-content cellular imaging, transcriptomics, and proteomics data provide rich and complementary views on the molecular layers of biology that influence cellular states and function. However, the biological determinants through which changes in multi-omics measurements influence cellular morphology have not yet been systematically explored, and the degree to which cell imaging could potentially enable the prediction of multi-omics directly from cell imaging data is therefore currently unclear. Here, we address the question of whether it is possible to predict bulk multi-omics measurements directly from cell images using Image2Omics - a deep learning approach that predicts multi-omics in a cell population directly from high-content images of cells stained with multiplexed fluorescent dyes. We perform an experimental evaluation in gene-edited macrophages derived from human induced pluripotent stem cells (hiPSC) under multiple stimulation conditions and demonstrate that Image2Omics achieves significantly better performance in predicting transcriptomics and proteomics measurements directly from cell images than predictions based on the mean observed training set abundance. We observed significant predictability of abundances for 4927 (18.72%; 95% CI: 6.52%, 35.52%) and 3521 (13.38%; 95% CI: 4.10%, 32.21%) transcripts out of 26137 in M1 and M2-stimulated macrophages respectively and for 422 (8.46%; 95% CI: 0.58%, 25.83%) and 697 (13.98%; 95% CI: 2.41%, 32.83%) proteins out of 4986 in M1 and M2-stimulated macrophages respectively. Our results show that some transcript and protein abundances are predictable from cell imaging and that cell imaging may potentially, in some settings and depending on the mechanisms of interest and desired performance threshold, even be a scalable and resource-efficient substitute for multi-omics measurements.

Submitted 21 May, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.05143 [pdf, other]

Genomic Interpreter: A Hierarchical Genomic Deep Neural Network with 1D Shifted Window Transformer

Authors: Zehui Li, Akashaditya Das, William A V Beardall, Yiren Zhao, Guy-Bart Stan

Abstract: Given the increasing volume and quality of genomics data, extracting new insights requires interpretable machine-learning models. This work presents Genomic Interpreter: a novel architecture for genomic assay prediction. This model outperforms the state-of-the-art models for genomic assay prediction tasks. Our model can identify hierarchical dependencies in genomic sites. This is achieved through the integration of 1D-Swin, a novel Transformer-based block designed by us for modelling long-range hierarchical data. Evaluated on a dataset containing 38,171 DNA segments of 17K base pairs, Genomic Interpreter demonstrates superior performance in chromatin accessibility and gene expression prediction and unmasks the underlying `syntax' of gene regulation.

Submitted 28 June, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: 40th International Conference on Machine Learning (ICML 2023) Workshop on Computational Biology (WCB)

arXiv:2306.01802 [pdf, other]

Linear Time GPs for Inferring Latent Trajectories from Neural Spike Trains

Authors: Matthew Dowling, Yuan Zhao, Il Memming Park

Abstract: Latent Gaussian process (GP) models are widely used in neuroscience to uncover hidden state evolutions from sequential observations, mainly in neural activity recordings. While latent GP models provide a principled and powerful solution in theory, the intractable posterior in non-conjugate settings necessitates approximate inference schemes, which may lack scalability. In this work, we propose cvHM, a general inference framework for latent GP models leveraging Hida-Matérn kernels and conjugate computation variational inference (CVI). With cvHM, we are able to perform variational inference of latent neural trajectories with linear time complexity for arbitrary likelihoods. The reparameterization of stationary kernels using Hida-Matérn GPs helps us connect the latent variable models that encode prior assumptions through dynamical systems to those that encode trajectory assumptions through GPs. In contrast to previous work, we use bidirectional information filtering, leading to a more concise implementation. Furthermore, we employ the Whittle approximate likelihood to achieve highly efficient hyperparameter learning.

Submitted 1 June, 2023; originally announced June 2023.

Comments: Published at ICML 2023

arXiv:2305.11278 [pdf, other]

Real-Time Variational Method for Learning Neural Trajectory and its Dynamics

Authors: Matthew Dowling, Yuan Zhao, Il Memming Park

Abstract: Latent variable models have become instrumental in computational neuroscience for reasoning about neural computation. This has fostered the development of powerful offline algorithms for extracting latent neural trajectories from neural recordings. However, despite the potential of real time alternatives to give immediate feedback to experimentalists, and enhance experimental design, they have received markedly less attention. In this work, we introduce the exponential family variational Kalman filter (eVKF), an online recursive Bayesian method aimed at inferring latent trajectories while simultaneously learning the dynamical system generating them. eVKF works for arbitrary likelihoods and utilizes the constant base measure exponential family to model the latent state stochasticity. We derive a closed-form variational analogue to the predict step of the Kalman filter which leads to a provably tighter bound on the ELBO compared to another online variational method. We validate our method on synthetic and real-world data, and, notably, show that it achieves competitive performance

Submitted 18 May, 2023; originally announced May 2023.

Comments: Published at ICLR 2023

arXiv:2304.01345 [pdf, other]

Establishing group-level brain structural connectivity incorporating anatomical knowledge under latent space modeling

Authors: Selena Wang, Yiting Wang, Frederick H. Xu, Li Shen, Yize Zhao

Abstract: Brain structural connectivity, capturing the white matter fiber tracts among brain regions inferred by diffusion MRI (dMRI), provides a unique characterization of brain anatomical organization. One fundamental question to address with structural connectivity is how to properly summarize and perform statistical inference for a group-level connectivity architecture, for instance, under different sex groups, or disease cohorts. Existing analyses commonly summarize group-level brain connectivity by a simple entry-wise sample mean or median across individual brain connectivity matrices. However, such a heuristic approach fully ignores the associations among structural connections and the topological properties of brain networks. In this project, we propose a latent space-based generative network model to estimate group-level brain connectivity. We name our method the attributes-informed brain connectivity (ABC) model, which compared with existing group-level connectivity estimations, (1) offers an interpretable latent space representation of the group-level connectivity, (2) incorporates the anatomical knowledge of nodes and tests its co-varying relationship with connectivity and (3) quantifies the uncertainty and evaluates the likelihood of the estimated group-level effects against chance. We devise a novel Bayesian MCMC algorithm to estimate the model. By applying the ABC model to study brain structural connectivity stratified by sex among Alzheimer's Disease (AD) subjects and healthy controls incorporating the anatomical attributes (volume, thickness and area) on nodes, our method shows superior predictive power on out-of-sample structural connectivity and identifies meaningful sex-specific network neuromarkers for AD.

Submitted 21 February, 2023; originally announced April 2023.

arXiv:2303.12259 [pdf, other]

Brain-inspired bodily self-perception model for robot rubber hand illusion

Authors: Yuxuan Zhao, Enmeng Lu, Yi Zeng

Abstract: At the core of bodily self-consciousness is the perception of the ownership of one's body. Recent efforts to gain a deeper understanding of the mechanisms behind the brain's encoding of the self-body have led to various attempts to develop a unified theoretical framework to explain related behavioral and neurophysiological phenomena. A central question to be explained is how body illusions such as the rubber hand illusion actually occur. Despite the conceptual descriptions of the mechanisms of bodily self-consciousness and the possible relevant brain areas, the existing theoretical models still lack an explanation of the computational mechanisms by which the brain encodes the perception of one's body and how our subjectively perceived body illusions can be generated by neural networks. Here we integrate the biological findings of bodily self-consciousness to propose a Brain-inspired bodily self-perception model, by which perceptions of bodily self can be autonomously constructed without any supervision signals. We successfully validated our computational model with six rubber hand illusion experiments and a disability experiment on platforms including a iCub humanoid robot and simulated environments. The experimental results show that our model can not only well replicate the behavioral and neural data of monkeys in biological experiments, but also reasonably explain the causes and results of the rubber hand illusion from the neuronal level due to advantages in biological interpretability, thus contributing to the revealing of the computational and neural mechanisms underlying the occurrence of the rubber hand illusion.

Submitted 26 April, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

Comments: 34 pages, 11 figures and 1 table

arXiv:2302.06120 [pdf, other]

Knowledge from Large-Scale Protein Contact Prediction Models Can Be Transferred to the Data-Scarce RNA Contact Prediction Task

Authors: Yiren Jian, Chongyang Gao, Chen Zeng, Yunjie Zhao, Soroush Vosoughi

Abstract: RNA, whose functionality is largely determined by its structure, plays an important role in many biological activities. The prediction of pairwise structural proximity between each nucleotide of an RNA sequence can characterize the structural information of the RNA. Historically, this problem has been tackled by machine learning models using expert-engineered features and trained on scarce labeled datasets. Here, we find that the knowledge learned by a protein-coevolution Transformer-based deep neural network can be transferred to the RNA contact prediction task. As protein datasets are orders of magnitude larger than those for RNA contact prediction, our findings and the subsequent framework greatly reduce the data scarcity bottleneck. Experiments confirm that RNA contact prediction through transfer learning using a publicly available protein model is greatly improved. Our findings indicate that the learned structural patterns of proteins can be transferred to RNAs, opening up potential new avenues for research.

Submitted 18 January, 2024; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: The code is available at https://github.com/yiren-jian/CoT-RNA-Transfer

arXiv:2302.00887 [pdf]

doi 10.1038/s41586-021-03936-y

Kainate receptor modulation by NETO2

Authors: Lingli He, Jiahui Sun, Yiwei Gao, Bin Li, Yuhang Wang, Yanli Dong, Weidong An, Hang Li, Bei Yang, Yuhan Ge, Xuejun Cai Zhang, Yun Stone Shi, Yan Zhao

Abstract: Glutamate-gated kainate receptors (KARs) are ubiquitous in the central nervous system of vertebrates, mediate synaptic transmission on post-synapse, and modulate transmitter release on pre-synapse. In the brain, the trafficking, gating kinetics, and pharmacology of KARs are tightly regulated by Neuropilin and tolloid-like proteins (Netos). Here we report cryo-EM structures of homo-tetrameric GluK2 in complex with Neto2 at inhibited and desensitized states, illustrating variable stoichiometry of GluK2-Neto2 complexes, with one or two Neto2 subunits associate with the GluK2. We find that Neto2 accesses only two broad faces of KARs, intermolecularly crosslinking the lower-lobe of ATDA/C, upper-lobe of LBDB/D, and lower-lobe of LBDA/C, illustrating how Neto2 regulates receptor-gating kinetics. The transmembrane helix of Neto2 is positioned proximal to the selectivity filter and competes with the amphiphilic H1-helix after M4 for interacting with an ICD formed by the M1-M2 linkers of the receptor, revealing how rectification is regulated by Neto2.

Submitted 2 February, 2023; originally announced February 2023.

Journal ref: Nature, 599(7884), 325-329 (2021)

arXiv:2301.08391 [pdf]

Brain Model State Space Reconstruction Using an LSTM Neural Network

Authors: Yueyang Liu, Artemio Soto-Breceda, Yun Zhao, Phillipa Karoly, Mark J. Cook, David B. Grayden, Daniel Schmidt, Levin Kuhlmann

Abstract: Objective: Kalman filtering has previously been applied to track neural model states and parameters, particularly at the scale relevant to EEG. However, this approach lacks a reliable method to determine the initial filter conditions and assumes that the distribution of states remains Gaussian. This study presents an alternative, data-driven method to track the states and parameters of neural mass models (NMMs) from EEG recordings using deep learning techniques, specifically an LSTM neural network. Approach: An LSTM filter was trained on simulated EEG data generated by a neural mass model using a wide range of parameters. With an appropriately customised loss function, the LSTM filter can learn the behaviour of NMMs. As a result, it can output the state vector and parameters of NMMs given observation data as the input. Main Results: Test results using simulated data yielded correlations with R squared of around 0.99 and verified that the method is robust to noise and can be more accurate than a nonlinear Kalman filter when the initial conditions of the Kalman filter are not accurate. As an example of real-world application, the LSTM filter was also applied to real EEG data that included epileptic seizures, and revealed changes in connectivity strength parameters at the beginnings of seizures. Significance: Tracking the state vector and parameters of mathematical brain models is of great importance in the area of brain modelling, monitoring, imaging and control. This approach has no need to specify the initial state vector and parameters, which is very difficult to do in practice because many of the variables being estimated cannot be measured directly in physiological experiments. This method may be applied using any neural mass model and, therefore, provides a general, novel, efficient approach to estimate brain model variables that are often difficult to measure.

Submitted 19 January, 2023; originally announced January 2023.

arXiv:2210.13323 [pdf, other]

A Comparative Study of Compartmental Models for COVID-19 Transmission in Ontario, Canada

Authors: Yuxuan Zhao, Samuel W. K. Wong

Abstract: The number of confirmed COVID-19 cases reached over 1.3 million in Ontario, Canada by June 4, 2022. The continued spread of the virus underlying COVID-19 has been spurred by the emergence of variants since the initial outbreak in December, 2019. Much attention has thus been devoted to tracking and modelling the transmission of COVID-19. Compartmental models are commonly used to mimic epidemic transmission mechanisms and are easy to understand. Their performance in real-world settings, however, needs to be more thoroughly assessed. In this comparative study, we examine five compartmental models -- four existing ones and an extended model that we propose -- and analyze their ability to describe COVID-19 transmission in Ontario from January 2022 to June 2022.

Submitted 24 October, 2022; originally announced October 2022.

Comments: 26 pages, 8 figures

arXiv:2205.08720 [pdf, ps, other]

Pattern formation of parasite-host model induced by fear effect

Authors: Yong Ye, Yi Zhao, Jiaying Zhou

Abstract: In this paper, based on the epidemiological microparasite model, a parasite-host model is established by considering the fear effect of susceptible individuals on infectors. We explored the pattern formation with the help of numerical simulation, and analyzed the effects of fear effect, infected host mortality, population diffusion rate and reducing reproduction ability of infected hosts on population activities in different degrees. Theoretically, we give the general conditions for the stability of the model under non-diffusion and considering the Turing instability caused by diffusion. Our results indicate how fear affects the distribution of the uninfected and infected hosts in the habitat and quantify the influence of the fear factor on the spatiotemporal pattern of the population. In addition, we analyze the influence of natural death rate, reproduction ability of infected hosts, and diffusion level of uninfected (infected) hosts on the spatiotemporal pattern, respectively. The results present that the growth of pattern induced by intensified fear effect follows the certain rule: cold spots $\rightarrow$ cold spots-stripes $\rightarrow$ cold stripes $\rightarrow$ hot stripes $\rightarrow$ hot spots-stripes $\rightarrow$ hot spots. Interestingly, the natural mortality and fear effect take the opposite effect on the growth order of the pattern. From the perspective of biological significance, we find that the degree of fear effect can reshape the distribution of population to meet the previous rule.

Submitted 18 May, 2022; originally announced May 2022.

Comments: 28 pages, 11 figures

MSC Class: 92-XX

arXiv:2204.06159 [pdf]

Systematic conformation-to-phenotype mapping via limited deep-sequencing of proteins

Authors: Eugene Serebryany, Victor Y. Zhao, Kibum Park, Amir Bitran, Sunia A. Trauger, Bogdan Budnik, Eugene I. Shakhnovich

Abstract: Non-native conformations drive protein misfolding diseases, complicate bioengineering efforts, and fuel molecular evolution. No current experimental technique is well-suited for elucidating them and their phenotypic effects. Especially intractable are the transient conformations populated by intrinsically disordered proteins. We describe an approach to systematically discover, stabilize, and purify native and non-native conformations, generated in vitro or in vivo, and directly link conformations to molecular, organismal, or evolutionary phenotypes. This approach involves high-throughput disulfide scanning (HTDS) of the entire protein. To reveal which disulfides trap which chromatographically resolvable conformers, we devised a deep-sequencing method for double-Cys variant libraries of proteins that precisely and simultaneously locates both Cys residues within each polypeptide. HTDS of the abundant E. coli periplasmic chaperone HdeA revealed distinct classes of disordered hydrophobic conformers with variable cytotoxicity depending on where the backbone was cross-linked. HTDS can bridge conformational and phenotypic landscapes for many proteins that function in disulfide-permissive environments.

Submitted 29 January, 2023; v1 submitted 12 April, 2022; originally announced April 2022.

arXiv:2104.12955 [pdf]

Local vaccination and systemic tumor suppression via irradiation and manganese adjuvant in mice

Authors: Chunyang Lu, **g Qian, Jianfeng Lv, **tao Han, Xiaoyi Sun, Junyi Chen, Siwei Ding, Zhusong Mei, Yulan Liang, Yuqi Ma, Ye Zhao, Chen Lin, Yanying Zhao, Yixing Geng, Wenjun Ma, Yugang Wang, Xueqing Yan, Gen Yang

Abstract: In this study, 4T-1 luc cells were irradiated with protons at an ultra-high (FLASH) dose rate or with gamma rays at a conventional dose rate, and were then used, with or without a Mn immuno-enhancing adjuvant, to vaccinate mice subcutaneously three times. One week later, we injected untreated 4T-1 luc cells on the other side of the vaccinated mice and found that these later-injected cells almost never grew tumors (1/17), while controls without previous vaccination all grew tumors (18/18). The result is very interesting, and the findings may help to explore in situ tumor vaccination as well as new combined radiotherapy strategies to effectively ablate primary and disseminated tumors. To our limited knowledge, this is the first paper reporting high-efficiency induction of systemic vaccination that suppresses metastasized/disseminated tumor progression.

Submitted 26 April, 2021; originally announced April 2021.

Comments: 16 pages, 3 figures and 1 table

arXiv:2103.15142 [pdf]

COSINE: A Web Server for Clonal and Subclonal Structure Inference and Evolution in Cancer Genomics

Authors: Xiguo Yuan, Yuan Zhao, Yang Guo, Linmei Ge, Wei Liu, Shiyu Wen, Qi Li, Zhangbo Wan, Peina Zheng, Tao Guo, Zhida Li, Martin Peifer, Yupeng Cun

Abstract: Cancers evolve from mutation of a single cell with sequential clonal and subclonal expansion of somatic mutation acquisition. Inferring clonal and subclonal structures from bulk or single cell tumor genomic sequencing data has a huge impact on cancer evolution studies. Clonal state and mutational order can provide detailed insight into tumor origin and its future development. In the past decade, a variety of methods have been developed for subclonal reconstruction using bulk tumor sequencing data. As these methods have been developed in different programming languages and using different input data formats, their use and comparison can be problematic. Therefore, we established a web server for clonal and subclonal structure inference and evolution of cancer genomic data (COSINE), which included 12 popular subclonal reconstruction methods. We decomposed each method via a detailed workflow of single processing steps with a user-friendly interface. To the best of our knowledge, this is the first web server providing online subclonal inference, including the most popular subclonal reconstruction methods. COSINE is freely accessible at www.clab-cosine.net or http://bio.rj.run:48996/cun-web.

Submitted 28 March, 2021; originally announced March 2021.

arXiv:2102.09548 [pdf, other]

Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development

Authors: Kexin Huang, Tianfan Fu, Wenhao Gao, Yue Zhao, Yusuf Roohani, Jure Leskovec, Connor W. Coley, Cao Xiao, Jimeng Sun, Marinka Zitnik

Abstract: Therapeutics machine learning is an emerging field with incredible opportunities for innovation and impact. However, advancement in this field requires formulation of meaningful learning tasks and careful curation of datasets. Here, we introduce Therapeutics Data Commons (TDC), the first unifying platform to systematically access and evaluate machine learning across the entire range of therapeutics. To date, TDC includes 66 AI-ready datasets spread across 22 learning tasks and spanning the discovery and development of safe and effective medicines. TDC also provides an ecosystem of tools and community resources, including 33 data functions and types of meaningful data splits, 23 strategies for systematic model evaluation, 17 molecule generation oracles, and 29 public leaderboards. All resources are integrated and accessible via an open Python library. We carry out extensive experiments on selected datasets, demonstrating that even the strongest algorithms fall short of solving key therapeutics challenges, including real dataset distributional shifts, multi-scale modeling of heterogeneous data, and robust generalization to novel data points. We envision that TDC can facilitate algorithmic and scientific advances and considerably accelerate machine-learning model development, validation and transition into biomedical and clinical implementation. TDC is an open-science initiative available at https://tdcommons.ai.

Submitted 28 August, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

Comments: Published at NeurIPS 2021 Datasets and Benchmarks

arXiv:2101.05866 [pdf, ps, other]

Comparisons of Graph Neural Networks on Cancer Classification Leveraging a Joint of Phenotypic and Genetic Features

Authors: David Oniani, Chen Wang, Yiqing Zhao, Andrew Wen, Hongfang Liu, Feichen Shen

Abstract: Cancer is responsible for millions of deaths worldwide every year. Although significant progress has been achieved in cancer medicine, many issues remain to be addressed for improving cancer therapy. Appropriate cancer patient stratification is the prerequisite for selecting appropriate treatment plan, as cancer patients are of known heterogeneous genetic make-ups and phenotypic differences. In this study, built upon deep phenotypic characterizations extractable from Mayo Clinic electronic health records (EHRs) and genetic test reports for a collection of cancer patients, we evaluated various graph neural networks (GNNs) leveraging a joint of phenotypic and genetic features for cancer type classification. Models were applied and fine-tuned on the Mayo Clinic cancer disease dataset. The assessment was done through the reported accuracy, precision, recall, and F1 values as well as through F1 scores based on the disease class. Per our evaluation results, GNNs on average outperformed the baseline models with mean statistics always being higher than those of the baseline models (0.849 vs 0.772 for accuracy, 0.858 vs 0.794 for precision, 0.843 vs 0.759 for recall, and 0.843 vs 0.855 for F1 score). Among GNNs, ChebNet, GraphSAGE, and TAGCN showed the best performance, while GAT showed the worst. We applied and compared eight GNN models including AGNN, ChebNet, GAT, GCN, GIN, GraphSAGE, SGC, and TAGCN on the Mayo Clinic cancer disease dataset and assessed their performance as well as compared them with each other and with more conventional machine learning models such as decision tree, gradient boosting, multi-layer perceptron, naive bayes, and random forest which we used as the baselines.

Submitted 14 January, 2021; originally announced January 2021.

arXiv:2011.05595 [pdf]

Desires and Motivation: The Computational Rule, the Underlying Neural Circuitry, and the Relevant Clinical Disorders

Authors: Yu Liu, Yinghong Zhao, Mo Chen

Abstract: An organism is a dissipative system. The process from multiple desires to an exclusive motivation is of great importance among all sensory-action loops. In this paper we argue that a proper Desire-Motivation model should be a continuous dynamic mapping from the dynamic desire vector to the sparse motivation vector. Meanwhile, it should at least have specific stability and adjustability of motivation intensity. Besides, neuroscience evidence suggests that the Desire-Motivation model should have dynamic information acquisition and should be a recurrent neural network. A five-equation model is built based on the above arguments, namely the Recurrent Gating Desire-Motivation (RGDM) model. Additionally, a heuristic speculation based on the RGDM model about corresponding brain regions is carried out. It suggests that the tonic and phasic firing of ventral tegmental area dopamine neurons should execute the respective and collective feedback functions of recurrent processing. The analysis of the RGDM model shows the expectations about individual personality from three dimensions, namely stability, intensity, and motivation decision speed. These three dimensions can be combined to create eight different personalities, which correspond to Jung's personality structure theorem. Furthermore, the RGDM model can be used to predict three different brand-new types of depressive disorder with different phenotypes. Moreover, it can also explain several other psychiatric disorders from new perspectives.

Submitted 11 November, 2020; originally announced November 2020.

arXiv:2008.06034 [pdf, other]

doi 10.1002/jcb.29909

Tetracycline as an inhibitor to the coronavirus SARS-CoV-2

Authors: Tom Y. Zhao, Neelesh A. Patankar

Abstract: The coronavirus SARS-CoV-2 remains an extant threat against public health on a global scale. Cell infection begins when the spike protein of SARS-CoV-2 binds with the cell receptor, angiotensin-converting enzyme 2 (ACE2). Here, we address the role of Tetracycline as an inhibitor for the receptor-binding domain (RBD) of the spike protein. Targeted molecular investigation shows that Tetracycline binds more favorably to the RBD (-9.40 kcal/mol) compared to Chloroquine (-6.31 kcal/mol) or Doxycycline (-8.08 kcal/mol) and inhibits attachment to ACE2 to a greater degree (binding efficiency of 2.98 $\frac{\text{kcal}}{\text{mol}\cdot \text{nm}^2}$ for Tetracycline-RBD, 5.59 $\frac{\text{kcal}}{\text{mol}\cdot \text{nm}^2}$ for Chloroquine-RBD, 5.16 $\frac{\text{kcal}}{\text{mol}\cdot \text{nm}^2}$ for Doxycycline-RBD). Stronger Tetracycline inhibition is verified with nonequilibrium PMF calculations, for which the Tetracycline-RBD complex exhibits the lowest free energy profile along the dissociation pathway from ACE2. Tetracycline appears to target viral residues that are usually involved in significant hydrogen bonding with ACE2; this inhibition of cellular infection complements the anti-inflammatory and cytokine suppressing capability of Tetracycline, and may further reduce the duration of ICU stays and mechanical ventilation induced by the coronavirus SARS-CoV-2.

Submitted 13 August, 2020; originally announced August 2020.

Journal ref: J Cell Biochem (2021) 1-8

arXiv:2007.02198 [pdf, other]

Scalable Bayesian Functional Connectivity Inference for Multi-Electrode Array Recordings

Authors: Yun Zhao, Richard Jiang, Zhenni Xu, Elmer Guzman, Paul K. Hansma, Linda Petzold

Abstract: Multi-electrode arrays (MEAs) can record extracellular action potentials (also known as 'spikes') from hundreds or thousands of neurons simultaneously. Inference of a functional network from a spike train is a fundamental and formidable computational task in neuroscience. With the advancement of MEA technology, it has become increasingly crucial to develop statistical tools for analyzing multiple neuronal activity as a network. In this paper, we propose a scalable Bayesian framework for inference of functional networks from MEA data. Our framework makes use of the hierarchical structure of networks of neurons. We split the large scale recordings into smaller local networks for network inference, which not only eases the computational burden from Bayesian sampling but also provides useful insights on regional connections in organoids and brains. We speed up the expensive Bayesian sampling process by using parallel computing. Experiments on both synthetic datasets and large-scale real-world MEA recordings show the effectiveness and efficiency of the scalable Bayesian framework. Inference of networks from controlled experiments exposing neural cultures to cadmium presents distinguishable results and further confirms the utility of our framework.

Submitted 4 July, 2020; originally announced July 2020.

Comments: in BIOKDD 2020

arXiv:2006.11843 [pdf]

Unsupervised Learning of Deep-Learned Features from Breast Cancer Images

Authors: Sanghoon Lee, Colton Farley, Simon Shim, Yanjun Zhao, Wook** Choi, Wook-Sung Yoo

Abstract: Detecting cancer manually in whole slide images requires significant time and effort on the laborious process. Recent advances in whole slide image analysis have stimulated the growth and development of machine learning-based approaches that improve the efficiency and effectiveness in the diagnosis of cancer diseases. In this paper, we propose an unsupervised learning approach for detecting cancer in breast invasive carcinoma (BRCA) whole slide images. The proposed method is fully automated and does not require human involvement during the unsupervised learning procedure. We demonstrate the effectiveness of the proposed approach for cancer detection in BRCA and show how the machine can choose the most appropriate clusters during the unsupervised learning procedure. Moreover, we present a prototype application that enables users to select relevant groups mapping all regions related to the groups in whole slide images.

Submitted 21 June, 2020; originally announced June 2020.

Comments: 7 pages for IEEE BIBE

arXiv:1909.10137 [pdf]

Validation of image-guided cochlear implant programming techniques

Authors: Yiyuan Zhao, Jianing Wang, Rui Li, Robert F. Labadie, Benoit M. Dawant, Jack H. Noble

Abstract: Cochlear implants (CIs) are a standard treatment for patients who experience severe to profound hearing loss. Recent studies have shown that hearing outcome is correlated with intra-cochlear anatomy and electrode placement. Our group has developed image-guided CI programming (IGCIP) techniques that use image analysis methods to both segment the inner ear structures in pre- or post-implantation CT images and localize the CI electrodes in post-implantation CT images. This makes it possible to assist audiologists with CI programming by suggesting which among the contacts should be deactivated to reduce electrode interaction that is known to affect outcomes. Clinical studies have shown that IGCIP can improve hearing outcomes for CI recipients. However, the sensitivity of IGCIP with respect to the accuracy of the two major steps, electrode localization and intra-cochlear anatomy segmentation, is unknown. In this article, we create a ground truth dataset with conventional CT and micro-CT images of 35 temporal bone specimens to both rigorously characterize the accuracy of these two steps and assess how inaccuracies in these steps affect the overall results. Our study results show that when clinical pre- and post-implantation CTs are available, IGCIP produces results that are comparable to those obtained with the corresponding ground truth in 86.7% of the subjects tested. When only post-implantation CTs are available, this number is 83.3%. These results suggest that our current method is robust to errors in segmentation and localization but also that it can be improved upon. Keywords: cochlear implant, ground truth, segmentation, validation

Submitted 13 July, 2020; v1 submitted 22 September, 2019; originally announced September 2019.

Comments: 37 pages, 12 figures, 7 tables

arXiv:1908.04413 [pdf, other]

The Channel Attention based Context Encoder Network for Inner Limiting Membrane Detection

Authors: Hao Qiu, Zaiwang Gu, Lei Mou, Xiaoqian Mao, Liyang Fang, Yitian Zhao, Jiang Liu, Jun Cheng

Abstract: The optic disc segmentation is an important step for retinal image-based disease diagnosis such as glaucoma. The inner limiting membrane (ILM) is the first boundary in the OCT, which can help to extract the retinal pigment epithelium (RPE) through gradient edge information to locate the boundary of the optic disc. Thus, the ILM layer segmentation is of great importance for optic disc localization. In this paper, we build a new optic disc centered dataset from 20 volunteers and manually annotate the ILM boundary in each OCT scan as ground-truth. We also propose a channel attention based context encoder network modified from the CE-Net to segment the optic disc. It mainly contains three phases: the encoder module, the channel attention based context encoder module, and the decoder module. Finally, we demonstrate that our proposed method achieves state-of-the-art disc segmentation performance on our dataset mentioned above.

Submitted 9 August, 2019; originally announced August 2019.

Comments: This paper has been accepted by the miccai workshop (OMIA-6)

arXiv:1908.04206 [pdf]

doi 10.1002/anie.202000489

SERS discrimination of single amino acid residue in single peptide by plasmonic nanocavities

Authors: Jian-An Huang, Mansoureh Z. Mousavi, Giorgia Giovannini, Yingqi Zhao, Aliaksandr Hubarevich, Denis Garoli, Francesco De Angelis

Abstract: Surface-enhanced Raman spectroscopy (SERS) is a sensitive label-free optical method that can provide fingerprint Raman spectra of biomolecules such as DNA, amino acids and proteins. While SERS of a single DNA molecule has been recently demonstrated, Raman analysis of a single protein sequence was not possible because the SERS spectra of proteins are usually dominated by signals of aromatic amino acid residues. Here, we used an electroplasmonic approach to trap a single gold nanoparticle in a nanohole for generating a plasmonic nanocavity between the trapped nanoparticle and the nanopore wall. The giant field generated in the nanocavity was so sensitive and localized that it enables SERS discrimination of 10 distinct amino acids at single-molecule level. The obtained spectra are used to analyze the spectra of 2 biomarkers (Vasopressin and Oxytocin) made of a short sequence of 9 amino-acids. Significantly, we demonstrated identification of single non-aromatic amino acid residues in a single short peptide chain as well as discrimination between two peptides with sequences distinguishable in 2 specific amino-acids. Our results demonstrate the high sensitivity of our method to identify a single amino acid residue in a protein chain and a potential for further applications in proteomics and single-protein sequencing.

Submitted 13 December, 2019; v1 submitted 9 August, 2019; originally announced August 2019.

Comments: Totally 22 pages, 12 figures and 3 tables including supporting information. arXiv admin note: text overlap with arXiv:1905.01856

Journal ref: Angewandte Chemie International Edition, 59, 11423-11431 (2020)

arXiv:1906.02241 [pdf, other]

A Deep Learning Framework for Classification of in vitro Multi-Electrode Array Recordings

Authors: Yun Zhao, Elmer Guzman, Morgane Audouard, Zhuowei Cheng, Paul K. Hansma, Kenneth S. Kosik, Linda Petzold

Abstract: Multi-Electrode Arrays (MEAs) have been widely used to record neuronal activities, which could be used in the diagnosis of gene defects and drug effects. In this paper, we address the problem of classifying in vitro MEA recordings of mouse and human neuronal cultures from different genotypes, where there is no easy way to directly utilize raw sequences as inputs to train an end-to-end classification model. While carefully extracting some features by hand could partially solve the problem, this approach suffers from obvious drawbacks such as difficulty of generalizing. We propose a deep learning framework to address this challenge. Our approach correctly classifies neuronal culture data prepared from two different genotypes -- a mouse Knockout of the delta-catenin gene and human induced Pluripotent Stem Cell-derived neurons from Williams syndrome. By splitting the long recordings into short slices for training, and applying Consensus Prediction during testing, our deep learning approach improves the prediction accuracy by 16.69% compared with feature based Logistic Regression for mouse MEA recordings. We further achieve an accuracy of 95.91% using Consensus Prediction in one subset of mouse MEA recording data, which were all recorded at six days in vitro. As high-density MEA recordings become more widely available, this approach could be generalized for classification of neurons carrying different mutations and classification of drug responses.

Submitted 5 June, 2019; originally announced June 2019.

Comments: 14 pages, in ICDM 2019

arXiv:1811.09512 [pdf]

doi 10.1103/PhysRevE.101.042405

Critical slowing down and attractive manifold: a mechanism for dynamic robustness in yeast cell-cycle process

Authors: Yao Zhao, Dedi Wang, Zhiwen Zhang, Ying Lu, Xiao**g Yang, Qi Ouyang, Chao Tang, Fangting Li

Abstract: The biological processes that execute complex multiple functions, such as cell cycle, must ensure the order of sequential events and keep the dynamic robustness against various fluctuations. Here, we examine the dynamic mechanism and the fundamental structure to achieve these properties in the cell-cycle process of budding yeast Saccharomyces cerevisiae. We show that the budding yeast cell-cycle process behaves like an excitable system containing three well-coupled saddle-node bifurcations to execute DNA replication and mitosis events. The yeast cell-cycle regulatory network can be separated into G1/S phase module, early M module and late M phase module, where the positive feedbacks in each module and the interactions among the modules play an important role. If the cell-cycle process operates near the critical points of the saddle-node bifurcations, there is a critical slowing down or ghost effect. This can provide the cell-cycle process with a sufficient duration for each event and an attractive manifold for the state checking of the completion of DNA replication and mitosis; moreover, the fluctuation in the early module/event is forbidden to transmit to the latter module/event. Our results suggest both a fundamental structure of cell-cycle regulatory network and a hint for the evolution of eukaryotic cell-cycle processes, from the dynamic checking mechanism to the molecule checkpoint pathway.

Submitted 23 November, 2018; originally announced November 2018.

Comments: 27 pages, 12 figures

Journal ref: Phys. Rev. E 101, 042405 (2020)

arXiv:1806.01315 [pdf, other]

doi 10.1039/c8sm01832d

Cell Motility Dependence on Adhesive Wetting

Authors: Yuansheng Cao, Richa Karmakar, Elisabeth Ghabache, Edgar Gutierrez, Yanxiang Zhao, Alex Groisman, Herbert Levine, Brian A. Camley, Wouter-Jan Rappel

Abstract: Adhesive cell-substrate interactions are crucial for cell motility and are responsible for the necessary traction that propels cells. These interactions can also change the shape of the cell, analogous to liquid droplet wetting on adhesive substrates. To address how these shape changes affect cell migration and cell speed we model motility using deformable, 2D cross-sections of cells in which adhesion and frictional forces between cell and substrate can be varied separately. Our simulations show that increasing the adhesion results in increased spreading of cells and larger cell speeds. We propose an analytical model which shows that the cell speed is inversely proportional to an effective height of the cell and that increasing this height results in increased internal shear stress. The numerical and analytical results are confirmed in experiments on motile eukaryotic cells.

Submitted 4 February, 2019; v1 submitted 4 June, 2018; originally announced June 2018.

arXiv:1805.08947 [pdf, ps, other]

doi 10.1063/1.5042575

Effect of time varying transmission rates on coupled dynamics of epidemic and awareness over multiplex network

Authors: Vikram Sagar, Yi Zhao, Abhijit Sen

Abstract: In the present work, a non-linear stochastic model is presented to study the effect of time variation of transmission rates on the co-evolution of epidemics and its corresponding awareness over a two layered multiplex network. In this model, the infection transmission rate of a given node in the epidemic layer depends upon its awareness probability in the awareness layer. Similarly, the infection information transmission rate of a node in the awareness layer depends upon its infection probability in the epidemic layer. The spread of disease resulting from physical contacts is described in terms of SIS (Susceptible Infected Susceptible) process over the epidemic layer and the spread of information about the disease outbreak is described in terms of UAU (Unaware Aware Unaware) process over the virtual interaction mediated awareness layer. The time variation of the transmission rates and the resulting co-evolution of these mutually competing processes is studied in terms of a network-topology-dependent parameter (α). Using a second order linear theory it has been shown that in the continuous time limit, the co-evolution of these processes can be described in terms of damped and driven harmonic oscillator equations. From the results of the Monte-Carlo simulation, it is shown that for a suitable choice of the parameter (α), the two processes can exhibit either sustained oscillatory or damped dynamics. The damped dynamics corresponds to the endemic state. Further, for the case of endemic state it is shown that the inclusion of awareness layer significantly lowers the disease transmission rate and reduces the size of epidemic. The endemic state infection probability of a given node corresponding to the damped dynamics is found to have dependence upon both the transmission rates as well as on both absolute intra-layer and relative inter-layer degree of the individual nodes.

Submitted 31 May, 2018; v1 submitted 22 May, 2018; originally announced May 2018.

Comments: 33 pages, 8 figures

arXiv:1804.01203 [pdf, other]

doi 10.1093/bioinformatics/bty267

An integration of fast alignment and maximum-likelihood methods for electron subtomogram averaging and classification

Authors: Yixiu Zhao, Xiangrui Zeng, Qiang Guo, Min Xu

Abstract: Motivation: Cellular Electron CryoTomography (CECT) is an emerging 3D imaging technique that visualizes subcellular organization of single cells at submolecular resolution and in near-native state. CECT captures large numbers of macromolecular complexes of highly diverse structures and abundances. However, the structural complexity and imaging limits complicate the systematic de novo structural recovery and recognition of these macromolecular complexes. Efficient and accurate reference-free subtomogram averaging and classification represent the most critical tasks for such analysis. Existing subtomogram alignment based methods are prone to the missing wedge effects and low signal-to-noise ratio (SNR). Moreover, existing maximum-likelihood based methods rely on integration operations, which are in principle computationally infeasible for accurate calculation. Results: Built on existing works, we propose an integrated method, Fast Alignment Maximum Likelihood method (FAML), which uses fast subtomogram alignment to sample sub-optimal rigid transformations. The transformations are then used to approximate integrals for maximum-likelihood update of subtomogram averages through expectation-maximization algorithm. Our tests on simulated and experimental subtomograms showed that, compared to our previously developed fast alignment method (FA), FAML is significantly more robust to noise and missing wedge effects with moderate increases of computation cost. Besides, FAML performs well with significantly fewer input subtomograms when the FA method fails. Therefore, FAML can serve as a key component for improved construction of initial structural models from macromolecules captured by CECT.

Submitted 3 April, 2018; originally announced April 2018.

Comments: 17 pages

Journal ref: Intelligent Systems for Molecular Biology (ISMB) 2018, Bioinformatics

arXiv:1801.02375 [pdf, other]

Proteins at air-water and oil-water interfaces in an all-atom model

Authors: Yani Zhao, Marek Cieplak

Abstract: We study the behavior of five proteins at the air-water and oil-water interfaces by all-atom molecular dynamics. The proteins are found to get distorted when pinned to the interface. This behavior is consistent with the phenomenological way of introducing the interfaces in a coarse-grained model through a force that depends on the hydropathy indices of the residues. Proteins couple to the oil-water interface stronger than to the air-water one. They diffuse slower at the oil-water interface but do not depin from it, whereas depinning events are observed at the other interface. The reduction of the disulfide bonds slows the diffusion down.

Submitted 8 January, 2018; originally announced January 2018.

Comments: 13 pages, 10 figures

Journal ref: Physical Chemistry Chemical Physics, 19(36), 25197-25206 (2017)

arXiv:1711.10674 [pdf]

Direct Information Reweighted by Contact Templates: Improved RNA Contact Prediction by Combining Structural Features

Authors: Yiren Jian, Chen Zeng, Yunjie Zhao

Abstract: It is acknowledged that co-evolutionary nucleotide-nucleotide interactions are essential for RNA structures and functions. Currently, direct coupling analysis (DCA) infers nucleotide contacts in a sequence from its homologous sequence alignment across different species. DCA and similar approaches that use sequence information alone usually yield a low accuracy, especially when the available homologous sequences are limited. Here we present a new method that incorporates a Restricted Boltzmann Machine (RBM) to augment the information on sequence co-variations with structural patterns in contact inference. We thus name our method DIRECT that stands for Direct Information REweighted by Contact Templates. Benchmark tests demonstrate that DIRECT produces a substantial enhancement of 13% in accuracy on average for contact prediction in comparison to the traditional DCA. These results suggest that DIRECT could be used for improving predictions of RNA tertiary structures and functions. The source codes and dataset of DIRECT are available at http://zhao.phy.ccnu.edu.cn:8122/DIRECT/index.html.

Submitted 28 November, 2017; originally announced November 2017.

Showing 1–50 of 69 results for author: Zhao, Y