Enhanced Gene Selection in Single-Cell Genomics: Pre-Filtering Synergy and Reinforced Optimization

Authors: Weiliang Zhang, Zhen Meng, Dongjie Wang, Min Wu, Kunpeng Liu, Yuanchun Zhou, Meng Xiao

Abstract: Recent advancements in single-cell genomics necessitate precision in gene panel selection to interpret complex biological data effectively. Those methods aim to streamline the analysis of scRNA-seq data by focusing on the most informative genes that contribute significantly to the specific analysis task. Traditional selection methods, which often rely on expert domain knowledge, embedded machine learning models, or heuristic-based iterative optimization, are prone to biases and inefficiencies that may obscure critical genomic signals. Recognizing the limitations of traditional methods, we aim to transcend these constraints with a refined strategy. In this study, we introduce an iterative gene panel selection strategy that is applicable to clustering tasks in single-cell genomics. Our method uniquely integrates results from other gene selection algorithms, providing valuable preliminary boundaries or prior knowledge as initial guides in the search space to enhance the efficiency of our framework. Furthermore, we incorporate the stochastic nature of the exploration process in reinforcement learning (RL) and its capability for continuous optimization through reward-based feedback. This combination mitigates the biases inherent in the initial boundaries and harnesses RL's adaptability to refine and target gene panel selection dynamically. To illustrate the effectiveness of our method, we conducted detailed comparative experiments, case studies, and visualization analysis.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 25 pages

arXiv:2405.09699 [pdf]

Mapping Differential Protein-Protein Interaction Networks using Affinity Purification Mass Spectrometry

Authors: Prashant Kaushal, Manisha R. Ummadi, Gwendolyn M. Jang, Yennifer Delgado, Sara K. Makanani, Sophie F. Blanc, Decan M. Winters, Jiewei Xu, Benjamin Polacco, Yuan Zhou, Erica Stevenson, Manon Eckhardt, Lorena Zuliani-Alvarez, Robyn Kaake, Danielle L. Swaney, Nevan Krogan, Mehdi Bouhaddou

Abstract: Proteins congregate into complexes to perform fundamental cellular functions. Phenotypic outcomes, in health and disease, are often mechanistically driven by the remodeling of protein complexes by protein-coding mutations or cellular signaling changes in response to molecular cues. Here, we present an affinity purification mass spectrometry (APMS) proteomics protocol to quantify and visualize global changes in protein-protein interaction (PPI) networks between pairwise conditions. We describe steps for expressing affinity-tagged bait proteins in mammalian cells, identifying purified protein complexes, quantifying differential PPIs, and visualizing differential PPI networks. Specifically, this protocol details steps for designing affinity-tagged bait gene constructs, transfection, affinity purification, mass spectrometry sample preparation, data acquisition, database search, data quality control, PPI confidence scoring, cross-run normalization, statistical data analysis, and differential PPI visualization. Our protocol discusses caveats and limitations with applicability across cell types and biological areas.

Submitted 15 May, 2024; originally announced May 2024.

Comments: 29 pages, 3 figures

arXiv:2405.06655 [pdf]

RNA Secondary Structure Prediction Using Transformer-Based Deep Learning Models

Authors: Yanlin Zhou, Tong Zhan, Yichao Wu, Bo Song, Chenxi Shi

Abstract: The Human Genome Project has led to an exponential increase in data related to the sequence, structure, and function of biomolecules. Bioinformatics is an interdisciplinary research field that primarily uses computational methods to analyze large amounts of biological macromolecule data. Its goal is to discover hidden biological patterns and related information. Furthermore, analysing additional relevant information can enhance the study of biological operating mechanisms. This paper discusses the fundamental concepts of RNA, RNA secondary structure, and its prediction. Subsequently, the application of machine learning technologies in predicting the structure of biological macromolecules is explored. This chapter describes the relevant knowledge of algorithms and computational complexity and presents an RNA tertiary structure prediction algorithm based on ResNet. To address the issue of the current scoring function's unsuitability for long RNA, a scoring model based on ResNet is proposed, and a structure prediction algorithm is designed. The chapter concludes by presenting some open and interesting challenges in the field of RNA tertiary structure prediction.

Submitted 14 April, 2024; originally announced May 2024.

arXiv:2404.18162 [pdf, other]

fMRI Exploration of Visual Quality Assessment

Authors: Yiming Zhang, Ying Hu, Xiongkuo Min, Yan Zhou, Guangtao Zhai

Abstract: Despite significant strides in visual quality assessment, the neural mechanisms underlying visual quality perception remain insufficiently explored. This study employed fMRI to examine brain activity during image quality assessment and identify differences in human processing of images with varying quality. Fourteen healthy participants underwent tasks assessing both image quality and content classification while undergoing functional MRI scans. The collected behavioral data was statistically analyzed, and univariate and functional connectivity analyses were conducted on the imaging data. The findings revealed that quality assessment is a more complex task than content classification, involving enhanced activation in high-level cognitive brain regions for fine-grained visual analysis. Moreover, the research showed the brain's adaptability to different visual inputs, adopting different strategies depending on the input's quality. In response to high-quality images, the brain primarily uses specialized visual areas for precise analysis, whereas with low-quality images, it recruits additional resources including higher-order visual cortices and related cognitive and attentional networks to decode and recognize complex, ambiguous signals effectively. This study pioneers the intersection of neuroscience and image quality research, providing empirical evidence through fMRI linking image quality to neural processing. It contributes novel insights into the human visual system's response to diverse image qualities, thereby paving the way for advancements in objective image quality assessment algorithms.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.13097 [pdf, other]

DISC: Latent Diffusion Models with Self-Distillation from Separated Conditions for Prostate Cancer Grading

Authors: Man M. Ho, Elham Ghelichkhan, Yosep Chong, Yufei Zhou, Beatrice Knudsen, Tolga Tasdizen

Abstract: Latent Diffusion Models (LDMs) can generate high-fidelity images from noise, offering a promising approach for augmenting histopathology images for training cancer grading models. While previous works successfully generated high-fidelity histopathology images using LDMs, the generation of image tiles to improve prostate cancer grading has not yet been explored. Additionally, LDMs face challenges in accurately generating admixtures of multiple cancer grades in a tile when conditioned by a tile mask. In this study, we train specific LDMs to generate synthetic tiles that contain multiple Gleason Grades (GGs) by leveraging pixel-wise annotations in input tiles. We introduce a novel framework named Self-Distillation from Separated Conditions (DISC) that generates GG patterns guided by GG masks. Finally, we deploy a training framework for pixel-level and slide-level prostate cancer grading, where synthetic tiles are effectively utilized to improve the cancer grading performance of existing models. As a result, this work surpasses previous works in two domains: 1) our LDMs enhanced with DISC produce more accurate tiles in terms of GG patterns, and 2) our training scheme, incorporating synthetic data, significantly improves the generalization of the baseline model for prostate cancer grading, particularly in challenging cases of rare GG5, demonstrating the potential of generative models to enhance cancer grading when data is limited.

Submitted 19 April, 2024; originally announced April 2024.

Comments: Abstract accepted for ISBI 2024. Extended version to be presented at SynData4CV @ CVPR 2024. See more at https://minhmanho.github.io/disc/

arXiv:2404.06167 [pdf, other]

scCDCG: Efficient Deep Structural Clustering for single-cell RNA-seq via Deep Cut-informed Graph Embedding

Authors: ** Xu, Zhiyuan Ning, Meng Xiao, Guihai Feng, Xin Li, Yuanchun Zhou, Pengfei Wang

Abstract: Single-cell RNA sequencing (scRNA-seq) is essential for unraveling cellular heterogeneity and diversity, offering invaluable insights for bioinformatics advancements. Despite its potential, traditional clustering methods in scRNA-seq data analysis often neglect the structural information embedded in gene expression profiles, crucial for understanding cellular correlations and dependencies. Existing strategies, including graph neural networks, face challenges in handling the inefficiency due to scRNA-seq data's intrinsic high-dimension and high-sparsity. Addressing these limitations, we introduce scCDCG (single-cell RNA-seq Clustering via Deep Cut-informed Graph), a novel framework designed for efficient and accurate clustering of scRNA-seq data that simultaneously utilizes intercellular high-order structural information. scCDCG comprises three main components: (i) A graph embedding module utilizing deep cut-informed techniques, which effectively captures intercellular high-order structural information, overcoming the over-smoothing and inefficiency issues prevalent in prior graph neural network methods. (ii) A self-supervised learning module guided by optimal transport, tailored to accommodate the unique complexities of scRNA-seq data, specifically its high-dimension and high-sparsity. (iii) An autoencoder-based feature learning module that simplifies model complexity through effective dimension reduction and feature extraction. Our extensive experiments on 6 datasets demonstrate scCDCG's superior performance and efficiency compared to 7 established models, underscoring scCDCG's potential as a transformative tool in scRNA-seq data analysis. Our code is available at: https://github.com/XPgogogo/scCDCG.

Submitted 9 April, 2024; originally announced April 2024.

Comments: Accepted as a long paper for the research track at DASFAA 2024

arXiv:2402.19095 [pdf]

A Protein Structure Prediction Approach Leveraging Transformer and CNN Integration

Authors: Yanlin Zhou, Kai Tan, Xinyu Shen, Zheng He, Haotian Zheng

Abstract: Proteins are essential for life, and their structure determines their function. The protein secondary structure is formed by the folding of the protein primary structure, and the protein tertiary structure is formed by the bending and folding of the secondary structure. Therefore, the study of protein secondary structure is very helpful to the overall understanding of protein structure. Although the accuracy of protein secondary structure prediction has continuously improved with the development of machine learning and deep learning, progress in the field of protein structure prediction, unfortunately, remains insufficient to meet the large demand for protein information. Therefore, based on the advantages of deep learning-based methods in feature extraction and learning ability, this paper adopts a two-dimensional fusion deep neural network model, DstruCCN, which uses Convolutional Neural Networks (CNN) and a supervised Transformer protein language model for single-sequence protein structure prediction. The training features of the two are combined to predict the protein Transformer binding site matrix, and then the three-dimensional structure is reconstructed using energy minimization.

Submitted 8 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.12405 [pdf, other]

scInterpreter: Training Large Language Models to Interpret scRNA-seq Data for Cell Type Annotation

Authors: Cong Li, Meng Xiao, Pengfei Wang, Guihai Feng, Xin Li, Yuanchun Zhou

Abstract: Despite the inherent limitations of existing Large Language Models in directly reading and interpreting single-cell omics data, they demonstrate significant potential and flexibility as the Foundation Model. This research focuses on how to train and adapt the Large Language Model with the capability to interpret and distinguish cell types in single-cell RNA sequencing data. Our preliminary research results indicate that these foundational models excel in accurately categorizing known cell types, demonstrating the potential of the Large Language Models as effective tools for uncovering new biological insights.

Submitted 18 February, 2024; originally announced February 2024.

Comments: 4 pages, submitted to FCS

arXiv:2312.12094 [pdf, other]

CrossBind: Collaborative Cross-Modal Identification of Protein Nucleic-Acid-Binding Residues

Authors: Linglin **g, Sheng Xu, Yifan Wang, Yuzhe Zhou, Tao Shen, Zhigang Ji, Hui Fang, Zhen Li, Siqi Sun

Abstract: Accurate identification of protein nucleic-acid-binding residues poses a significant challenge with important implications for various biological processes and drug design. Many typical computational methods for protein analysis rely on a single model that could ignore either the semantic context of the protein or the global 3D geometric information. Consequently, these approaches may result in incomplete or inaccurate protein analysis. To address the above issue, in this paper, we present CrossBind, a novel collaborative cross-modal approach for identifying binding residues by exploiting both protein geometric structure and its sequence prior knowledge extracted from a large-scale protein language model. Specifically, our multi-modal approach leverages a contrastive learning technique and atom-wise attention to capture the positional relationships between atoms and residues, thereby incorporating fine-grained local geometric knowledge, for better binding residue prediction. Extensive experimental results demonstrate that our approach outperforms the next best state-of-the-art methods, GraphSite and GraphBind, on DNA and RNA datasets by 10.8/17.3% in terms of the harmonic mean of precision and recall (F1-Score) and 11.9/24.8% in Matthews correlation coefficient (MCC), respectively. We release the code at https://github.com/BEAM-Labs/CrossBind.

Submitted 20 December, 2023; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: Accepted to AAAI-24

arXiv:2311.09308 [pdf, other]

Divergences between Language Models and Human Brains

Authors: Yuchen Zhou, Emmy Liu, Graham Neubig, Michael J. Tarr, Leila Wehbe

Abstract: Do machines and humans process language in similar ways? Recent research has hinted in the affirmative, finding that brain signals can be effectively predicted using the internal representations of language models (LMs). Although such results are thought to reflect shared computational principles between LMs and human brains, there are also clear differences in how LMs and humans represent and use language. In this work, we systematically explore the divergences between human and machine language processing by examining the differences between LM representations and human brain responses to language as measured by Magnetoencephalography (MEG) across two datasets in which subjects read and listened to narrative stories. Using a data-driven approach, we identify two domains that are not captured well by LMs: social/emotional intelligence and physical commonsense. We then validate these domains with human behavioral experiments and show that fine-tuning LMs on these domains can improve their alignment with human brain responses.

Submitted 4 February, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.02204 [pdf, other]

Active risk aversion in SIS epidemics on networks

Authors: Anastasia Bizyaeva, Marcela Ordorica Arango, Yunxiu Zhou, Simon Levin, Naomi Ehrich Leonard

Abstract: We present and analyze an actively controlled Susceptible-Infected-Susceptible (actSIS) model of interconnected populations to study how risk aversion strategies, such as social distancing, affect network epidemics. A population using a risk aversion strategy reduces its contact rate with other populations when it perceives an increase in infection risk. The network actSIS model relies on two distinct networks. One is a physical contact network that defines which populations come into contact with which other populations and thus how infection spreads. The other is a communication network, such as an online social network, that defines which populations observe the infection level of which other populations and thus how information spreads. We prove that the model, with these two networks and populations using risk aversion strategies, exhibits a transcritical bifurcation in which an endemic equilibrium emerges. For regular graphs, we prove that the endemic infection level is uniform across populations and reduced by the risk aversion strategy, relative to the network SIS endemic level. We show that when communication is sufficiently sparse, this initially stable equilibrium loses stability in a secondary bifurcation. Simulations show that a new stable solution emerges with nonuniform infection levels.

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2310.10697 [pdf]

Synthetic IMU Datasets and Protocols Can Simplify Fall Detection Experiments and Optimize Sensor Configuration

Authors: Jie Tang, Bin He, Junkai Xu, Tian Tan, Zhipeng Wang, Yanmin Zhou, Shuo Jiang

Abstract: Falls represent a significant cause of injury among the elderly population. Extensive research has been devoted to the utilization of wearable IMU sensors in conjunction with machine learning techniques for fall detection. To address the challenge of acquiring costly training data, this paper presents a novel method that generates a substantial volume of synthetic IMU data with minimal real fall experiments. First, unmarked 3D motion capture technology is employed to reconstruct human movements. Subsequently, utilizing the biomechanical simulation platform OpenSim and forward kinematic methods, an ample amount of training data from various body segments can be custom generated. An LSTM model is trained, achieving testing accuracies of 91.99% and 86.62% on two distinct datasets of actual fall-related IMU data, demonstrating performance comparable to that of models trained using genuine IMU data. Building upon the simulation framework, this paper further optimizes the single IMU attachment position and multiple IMU combinations for fall detection. The proposed method simplifies fall detection data acquisition experiments, provides a novel avenue for generating low-cost synthetic data in scenarios where acquiring data for machine learning is challenging, and paves the way for customizing machine learning configurations.

Submitted 16 October, 2023; originally announced October 2023.

Comments: 11 pages, 7 figures

arXiv:2310.06578 [pdf, other]

Energy-Efficient Visual Search by Eye Movement and Low-Latency Spiking Neural Network

Authors: Yunhui Zhou, Dongqi Han, Yuguo Yu

Abstract: Human vision incorporates non-uniform resolution retina, efficient eye movement strategy, and spiking neural network (SNN) to balance the requirements in visual field size, visual resolution, energy cost, and inference latency. These properties have inspired interest in developing human-like computer vision. However, existing models haven't fully incorporated the three features of human vision, and their learned eye movement strategies haven't been compared with human's strategy, making the models' behavior difficult to interpret. Here, we carry out experiments to examine human visual search behaviors and establish the first SNN-based visual search model. The model combines an artificial retina with spiking feature extraction, memory, and saccade decision modules, and it employs population coding for fast and efficient saccade decisions. The model can learn either a human-like or a near-optimal fixation strategy, outperform humans in search speed and accuracy, and achieve high energy efficiency through short saccade decision latency and sparse activation. It also suggests that the human search strategy is suboptimal in terms of search speed. Our work connects modeling of vision in neuroscience and machine learning and sheds light on developing more energy-efficient computer vision algorithms.

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2308.08561 [pdf]

doi 10.5281/zenodo.7983561

Implementation of The Future of Drug Discovery: Quantum-Based Machine Learning Simulation (QMLS)

Authors: Yifan Zhou, Yew Kee Wong, Yan Shing Liang, Haichuan Qiu, Yu Xi Wu, Bin He

Abstract: The Research & Development (R&D) phase of drug development is a lengthy and costly process. To revolutionize this process, we introduce our new concept QMLS to shorten the whole R&D phase to three to six months and decrease the cost to merely fifty to eighty thousand USD. For Hit Generation, Machine Learning Molecule Generation (MLMG) generates possible hits according to the molecular structure of the target protein while the Quantum Simulation (QS) filters molecules from the primary assay based on the reaction and binding effectiveness with the target protein. Then, for Lead Optimization, the resultant molecules generated and filtered from MLMG and QS are compared, and molecules that appear as a result of both processes will be made into dozens of molecular variations through Machine Learning Molecule Variation (MLMV), while others will only be made into a few variations. Lastly, all optimized molecules would undergo multiple rounds of QS filtering with a high standard for reaction effectiveness and safety, creating a few dozen pre-clinical-trial-ready drugs. This paper is based on our first paper, where we pitched the concept of machine learning combined with quantum simulations. In this paper we will go over the detailed design and framework of QMLS, including MLMG, MLMV, and QS.

Submitted 25 October, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

Comments: 13 pages, 6 figures

Journal ref: International Journal of Computer Science and Mobile Applications, Vol. 11, Issue 5, May 2023

arXiv:2308.06294 [pdf]

Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT

Authors: **gye Yang, Cong Liu, Wendy Deng, Da Wu, Chunhua Weng, Yunyun Zhou, Kai Wang

Abstract: We hypothesize that large language models (LLMs) based on the transformer architecture can enable automated detection of clinical phenotype terms, including terms not documented in the HPO. In this study, we developed two types of models: PhenoBCBERT, a BERT-based model, utilizing Bio+Clinical BERT as its pre-trained model, and PhenoGPT, a GPT-based model that can be initialized from diverse GPT models, including open-source versions such as GPT-J, Falcon, and LLaMA, as well as closed-source versions such as GPT-3 and GPT-3.5. We compared our methods with PhenoTagger, a recently developed HPO recognition tool that combines rule-based and deep learning methods. We found that our methods can extract more phenotype concepts, including novel ones not characterized by HPO. We also performed case studies on biomedical literature to illustrate how new phenotype information can be recognized and extracted. We compared current BERT-based versus GPT-based models for phenotype tagging, in multiple aspects including model architecture, memory usage, speed, accuracy, and privacy protection. We also discussed the addition of a negation step and an HPO normalization layer to the transformer models for improved HPO term tagging. In conclusion, PhenoBCBERT and PhenoGPT enable the automated discovery of phenotype terms from clinical notes and biomedical literature, facilitating automated downstream tasks to derive new biological insights on human diseases.

Submitted 9 November, 2023; v1 submitted 10 August, 2023; originally announced August 2023.

arXiv:2308.06219 [pdf]

Acoustofluidic Engineering Functional Vessel-on-a-Chip

Authors: Yue Wu, Yuwen Zhao, Khayrul Islam, Yuyuan Zhou, Saeed Omidi, Yevgeny Berdichevsky, Yaling Liu

Abstract: Construction of in vitro vascular models is of great significance to various areas of biomedical research, such as pharmacokinetics and hemodynamics, and is thus an important direction in tissue engineering. In this work, a standing surface acoustic wave field was constructed to spatially arrange suspended endothelial cells into a designated pattern. The cell pattern was maintained by the solidified hydrogel after the acoustic field was withdrawn. Then, interstitial flow was provided to activate vessel tube formation. Thus, a functional vessel-on-a-chip was engineered with specific vessel geometry. Vascular function, including perfusability and vascular barrier function, was characterized by bead loading and dextran diffusion, respectively. A computational atomistic simulation model was proposed to illustrate how solutes cross the vascular lipid bilayer. The reported acoustofluidic methodology is capable of facile and reproducible fabrication of functional vessel networks with specific geometry. It is promising for facilitating the development of both fundamental research and regenerative therapy.

Submitted 17 August, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

arXiv:2307.08576 [pdf]

A Study on the Performance of Generative Pre-trained Transformer (GPT) in Simulating Depressed Individuals on the Standardized Depressive Symptom Scale

Authors: Si** Cai, Nanfeng Zhang, Jiaying Zhu, Yanjie Liu, Yong** Zhou

Abstract: Background: Depression is a common mental disorder with societal and economic burden. Current diagnosis relies on self-reports and assessment scales, which have reliability issues. Objective approaches are needed for diagnosing depression. Objective: Evaluate the potential of GPT technology in diagnosing depression. Assess its ability to simulate individuals with depression and investigate the influence of depression scales. Methods: Three depression-related assessment tools (HAMD-17, SDS, GDS-15) were used. Two experiments simulated GPT responses to normal individuals and individuals with depression. Compare GPT's responses with expected results, assess its understanding of depressive symptoms, and performance differences under different conditions. Results: GPT's performance in depression assessment was evaluated. It aligned with scoring criteria for both individuals with depression and normal individuals. Some performance differences were observed based on depression severity. GPT performed better on scales with higher sensitivity. Conclusion: GPT accurately simulates individuals with depression and normal individuals during depression-related assessments. Deviations occur when simulating different degrees of depression, limiting understanding of mild and moderate cases. GPT performs better on scales with higher sensitivity, indicating potential for developing more effective depression scales. GPT has important potential in depression assessment, supporting clinicians and patients.

Submitted 17 July, 2023; originally announced July 2023.

arXiv:2307.00511 [pdf]

SUGAR: Spherical Ultrafast Graph Attention Framework for Cortical Surface Registration

Authors: Jianxun Ren, Ning An, Youjia Zhang, Danyang Wang, Zhenyu Sun, Cong Lin, Weigang Cui, Weiwei Wang, Ying Zhou, Wei Zhang, Qingyu Hu, ** Zhang, Dan Hu, Danhong Wang, Hesheng Liu

Abstract: Cortical surface registration plays a crucial role in aligning cortical functional and anatomical features across individuals. However, conventional registration algorithms are computationally inefficient. Recently, learning-based registration algorithms have emerged as a promising solution, significantly improving processing efficiency. Nonetheless, there remains a gap in the development of a learning-based method that exceeds the state-of-the-art conventional methods simultaneously in computational efficiency, registration accuracy, and distortion control, despite the theoretically greater representational capabilities of deep learning approaches. To address the challenge, we present SUGAR, a unified unsupervised deep-learning framework for both rigid and non-rigid registration. SUGAR incorporates a U-Net-based spherical graph attention network and leverages the Euler angle representation for deformation. In addition to the similarity loss, we introduce fold and multiple distortion losses, to preserve topology and minimize various types of distortions. Furthermore, we propose a data augmentation strategy specifically tailored for spherical surface registration, enhancing the registration performance. Through extensive evaluation involving over 10,000 scans from 7 diverse datasets, we showed that our framework exhibits comparable or superior registration performance in accuracy, distortion, and test-retest reliability compared to conventional and learning-based methods. Additionally, SUGAR achieves remarkable sub-second processing times, offering a notable speed-up of approximately 12,000 times in registering 9,000 subjects from the UK Biobank dataset in just 32 minutes. This combination of high registration performance and accelerated processing time may greatly benefit large-scale neuroimaging studies.

Submitted 2 July, 2023; originally announced July 2023.

arXiv:2306.07505 [pdf]

Deep learning radiomics for assessment of gastroesophageal varices in people with compensated advanced chronic liver disease

Authors: Lan Wang, Ruiling He, Lili Zhao, Jia Wang, Zhengzi Geng, Tao Ren, Guo Zhang, Peng Zhang, Kaiqiang Tang, Chaofei Gao, Fei Chen, Liting Zhang, Yonghe Zhou, Xin Li, Fanbin He, Hui Huan, Wenjuan Wang, Yunxiao Liang, Juan Tang, Fang Ai, Tingyu Wang, Liyun Zheng, Zhongwei Zhao, Jiansong Ji, Wei Liu, et al. (22 additional authors not shown)

Abstract: Objective: Bleeding from gastroesophageal varices (GEV) is a medical emergency associated with high mortality. We aim to construct an artificial intelligence-based model of two-dimensional shear wave elastography (2D-SWE) of the liver and spleen to precisely assess the risk of GEV and high-risk gastroesophageal varices (HRV). Design: A prospective multicenter study was conducted in patients with compensated advanced chronic liver disease. 305 patients were enrolled from 12 hospitals, and finally 265 patients were included, with 1136 liver stiffness measurement (LSM) images and 1042 spleen stiffness measurement (SSM) images generated by 2D-SWE. We leveraged deep learning methods to uncover associations between image features and patient risk, and thus constructed models to predict GEV and HRV. Results: A multi-modality Deep Learning Risk Prediction model (DLRP) was constructed to assess GEV and HRV, based on LSM and SSM images, and clinical information. Validation analysis revealed that the AUCs of DLRP were 0.91 for GEV (95% CI 0.90 to 0.93, p < 0.05) and 0.88 for HRV (95% CI 0.86 to 0.89, p < 0.01), which were significantly and robustly better than canonical risk indicators, including the value of LSM and SSM. Moreover, DLRP was better than the model using individual parameters, including LSM and SSM images. In HRV prediction, the 2D-SWE images of SSM outperform LSM (p < 0.01). Conclusion: DLRP shows excellent performance in predicting GEV and HRV over canonical risk indicators LSM and SSM. Additionally, the 2D-SWE images of SSM provided more information for better accuracy in predicting HRV than the LSM.

Submitted 12 June, 2023; originally announced June 2023.

arXiv:2306.05555 [pdf, ps, other]

Impact of resource distributions on the competition of species in stream environment

Authors: Tung D. Nguyen, Yixiang Wu, Tingting Tang, Amy Veprauskas, Ying Zhou, Behzad Djafari Rouhani, Zhisheng Shuai

Abstract: Our earlier work in \cite{nguyen2022population} shows that concentrating the resources on the upstream end tends to maximize the total biomass in a metapopulation model for a stream species. In this paper, we continue our research direction by further considering a Lotka-Volterra competition patch model for two stream species. We show that the species whose resource allocations maximize the total biomass has a competitive advantage.

Submitted 27 July, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: 32 pages

MSC Class: 92D25; 92D40; 34C12; 34D23; 37C65

arXiv:2306.05257 [pdf, other]

doi 10.1093/bib/bbad235

Comprehensive evaluation of deep and graph learning on drug-drug interactions prediction

Authors: Xuan Lin, Lichang Dai, Yafang Zhou, Zu-Guo Yu, Wen Zhang, Jian-Yu Shi, Dong-Sheng Cao, Li Zeng, Haowen Chen, Bosheng Song, Philip S. Yu, Xiangxiang Zeng

Abstract: Recent advances and achievements of artificial intelligence (AI) as well as deep and graph learning models have established their usefulness in biomedical applications, especially in drug-drug interactions (DDIs). DDIs refer to a change in the effect of one drug due to the presence of another drug in the human body, which plays an essential role in drug discovery and clinical research. DDIs prediction through traditional clinical trials and experiments is an expensive and time-consuming process. To correctly apply the advanced AI and deep learning, the developer and user meet various challenges such as the availability and encoding of data resources, and the design of computational methods. This review summarizes chemical structure based, network based, NLP based and hybrid methods, providing an updated and accessible guide to the broad research and development community with different domain knowledge. We introduce widely-used molecular representation and describe the theoretical frameworks of graph neural network models for representing molecular structures. We present the advantages and disadvantages of deep and graph learning methods by performing comparative experiments. We discuss the potential technical challenges and highlight future directions of deep and graph learning models for accelerating DDIs prediction.

Submitted 8 June, 2023; originally announced June 2023.

Comments: Accepted by Briefings in Bioinformatics

arXiv:2304.06176 [pdf]

Surface-guided computing to analyze subcellular morphology and membrane-associated signals in 3D

Authors: Felix Y. Zhou, Andrew Weems, Gabriel M. Gihana, Bingying Chen, Bo-Jui Chang, Meghan Driscoll, Gaudenz Danuser

Abstract: Signal transduction and cell function are governed by the spatiotemporal organization of membrane-associated molecules. Despite significant advances in visualizing molecular distributions by 3D light microscopy, cell biologists still have limited quantitative understanding of the processes implicated in the regulation of molecular signals at the whole cell scale. In particular, complex and transient cell surface morphologies challenge the complete sampling of cell geometry, membrane-associated molecular concentration and activity and the computing of meaningful parameters such as the cofluctuation between morphology and signals. Here, we introduce u-Unwrap3D, a framework to remap arbitrarily complex 3D cell surfaces and membrane-associated signals into equivalent lower dimensional representations. The mappings are bidirectional, allowing the application of image processing operations in the data representation best suited for the task and to subsequently present the results in any of the other representations, including the original 3D cell surface. Leveraging this surface-guided computing paradigm, we track segmented surface motifs in 2D to quantify the recruitment of Septin polymers by blebbing events; we quantify actin enrichment in peripheral ruffles; and we measure the speed of ruffle movement along topographically complex cell surfaces. Thus, u-Unwrap3D provides access to spatiotemporal analyses of cell biological parameters on unconstrained 3D surface geometries and signals.

Submitted 12 April, 2023; originally announced April 2023.

Comments: 49 pages, 10 figures

arXiv:2303.06965 [pdf, other]

doi 10.1038/s42256-023-00764-9

Bridging the Gap between Chemical Reaction Pretraining and Conditional Molecule Generation with a Unified Model

Authors: Bo Qiang, Yiran Zhou, Yuheng Ding, Ningfeng Liu, Song Song, Liangren Zhang, Bo Huang, Zhenming Liu

Abstract: Chemical reactions are the fundamental building blocks of drug design and organic chemistry research. In recent years, there has been a growing need for a large-scale deep-learning framework that can efficiently capture the basic rules of chemical reactions. In this paper, we have proposed a unified framework that addresses both the reaction representation learning and molecule generation tasks, which allows for a more holistic approach. Inspired by the organic chemistry mechanism, we develop a novel pretraining framework that enables us to incorporate inductive biases into the model. Our framework achieves state-of-the-art results on challenging downstream tasks. By possessing chemical knowledge, our generative framework overcomes the limitations of current molecule generation models that rely on a small number of reaction templates. In the extensive experiments, our model generates synthesizable drug-like structures of high quality. Overall, our work presents a significant step toward a large-scale deep-learning framework for a variety of reaction-based applications.

Submitted 7 March, 2024; v1 submitted 13 March, 2023; originally announced March 2023.

arXiv:2212.10883 [pdf, other]

Detecting Temporal shape changes with the Euler Characteristic Transform

Authors: Lewis Marsh, Felix Y. Zhou, Xiao Qin, Xin Lu, Helen M. Byrne, Heather A. Harrington

Abstract: Organoids are multi-cellular structures which are cultured in vitro from stem cells to resemble specific organs (e.g., brain, liver) in their three-dimensional composition. Dynamic changes in the shape and composition of these model systems can be used to understand the effect of mutations and treatments in health and disease. In this paper, we propose a new technique in the field of topological data analysis for DEtecting Temporal shape changes with the Euler Characteristic Transform (DETECT). DETECT is a rotationally invariant signature of dynamically changing shapes. We demonstrate our method on a data set of segmented videos of mouse small intestine organoid experiments and show that it outperforms classical shape descriptors. We verify our method on a synthetic organoid data set and illustrate how it generalises to 3D. We conclude that DETECT offers rigorous quantification of organoids and opens up computationally scalable methods for distinguishing different growth regimes and assessing treatment effects.

Submitted 22 December, 2022; v1 submitted 21 December, 2022; originally announced December 2022.

arXiv:2209.13865 [pdf, other]

Zero-Shot 3D Drug Design by Sketching and Generating

Authors: Siyu Long, Yi Zhou, Xinyu Dai, Hao Zhou

Abstract: Drug design is a crucial step in the drug discovery cycle. Recently, various deep learning-based methods design drugs by generating novel molecules from scratch, avoiding traversing large-scale drug libraries. However, they depend on scarce experimental data or time-consuming docking simulation, leading to overfitting issues with limited training data and slow generation speed. In this study, we propose the zero-shot drug design method DESERT (Drug dEsign by SkEtching and geneRaTing). Specifically, DESERT splits the design process into two stages: sketching and generating, and bridges them with the molecular shape. The two-stage fashion enables our method to utilize the large-scale molecular database to reduce the need for experimental data and docking simulation. Experiments show that DESERT achieves a new state-of-the-art at a fast speed.

Submitted 4 October, 2022; v1 submitted 28 September, 2022; originally announced September 2022.

Comments: NeurIPS 2022 camera-ready

arXiv:2207.01813 [pdf, other]

Stochastic Variational Methods in Generalized Hidden Semi-Markov Models to Characterize Functionality in Random Heteropolymers

Authors: Yun Zhou, Boying Gong, Tao Jiang, Ting Xu, Haiyan Huang

Abstract: Recent years have seen substantial advances in the development of biofunctional materials using synthetic polymers. The growing problem of elusive sequence-functionality relations for most biomaterials has driven researchers to seek more effective tools and analysis methods. In this study, statistical models are used to study sequence features of the recently reported random heteropolymers (RHP), which transport protons across lipid bilayers selectively and rapidly like natural proton channels. We utilized the probabilistic graphical model framework and developed a generalized hidden semi-Markov model (GHSMM-RHP) to extract the function-determining sequence features, including the transmembrane segments within a chain and the sequence heterogeneity among different chains. We developed stochastic variational methods for efficient inference on parameter estimation and predictions, and empirically studied their computational performance from a comparative perspective on Bayesian (i.e., stochastic variational Bayes) versus frequentist (i.e., stochastic variational expectation-maximization) frameworks that have been studied separately before. The real data results agree well with the laboratory experiments, and suggest GHSMM-RHP's potential in predicting protein-like behavior at the polymer-chain level.

Submitted 5 July, 2022; originally announced July 2022.

arXiv:2203.09268 [pdf, other]

doi 10.1007/978-3-031-16446-0_40

Progressive Subsampling for Oversampled Data - Application to Quantitative MRI

Authors: Stefano B. Blumberg, Hongxiang Lin, Francesco Grussu, Yukun Zhou, Matteo Figini, Daniel C. Alexander

Abstract: We present PROSUB: PROgressive SUBsampling, a deep learning based, automated methodology that subsamples an oversampled data set (e.g. multi-channeled 3D images) with minimal loss of information. We build upon a recent dual-network approach that won the MICCAI MUlti-DIffusion (MUDI) quantitative MRI measurement sampling-reconstruction challenge, but suffers from deep learning training instability, by subsampling with a hard decision boundary. PROSUB uses the paradigm of recursive feature elimination (RFE) and progressively subsamples measurements during deep learning training, improving optimization stability. PROSUB also integrates a neural architecture search (NAS) paradigm, allowing the network architecture hyperparameters to respond to the subsampling process. We show PROSUB outperforms the winner of the MUDI MICCAI challenge, producing large improvements >18% MSE on the MUDI challenge sub-tasks and qualitative improvements on downstream processes useful for clinical applications. We also show the benefits of incorporating NAS and analyze the effect of PROSUB's components. As our method generalizes to other problems beyond MRI measurement selection-reconstruction, our code is https://github.com/sbb-gh/PROSUB

Submitted 11 October, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

Comments: Accepted In: Medical Image Computing and Computer Assisted Intervention (MICCAI) 2022

arXiv:2201.02273 [pdf, other]

PWM2Vec: An Efficient Embedding Approach for Viral Host Specification from Coronavirus Spike Sequences

Authors: Sarwan Ali, Babatunde Bello, Prakash Chourasia, Ria Thazhe Punathil, Yi**g Zhou, Murray Patterson

Abstract: COVID-19 pandemic, is still unknown and is an important open question. There are speculations that bats are a possible origin. Likewise, there are many closely related (corona-) viruses, such as SARS, which was found to be transmitted through civets. The study of the different hosts which can be potential carriers and transmitters of deadly viruses to humans is crucial to understanding, mitigating and preventing current and future pandemics. In coronaviruses, the surface (S) protein, or spike protein, is an important part of determining host specificity since it is the point of contact between the virus and the host cell membrane. In this paper, we classify the hosts of over five thousand coronaviruses from their spike protein sequences, segregating them into clusters of distinct hosts among avians, bats, camels, swines, humans and weasels, to name a few. We propose a feature embedding based on the well-known position-weight matrix (PWM), which we call PWM2Vec, and use to generate feature vectors from the spike protein sequences of these coronaviruses. While our embedding is inspired by the success of PWMs in biological applications such as determining protein function, or identifying transcription factor binding sites, we are the first (to the best of our knowledge) to use PWMs in the context of host classification from viral sequences to generate a fixed-length feature vector representation. The results on the real world data show that in using PWM2Vec, we are able to perform comparably to baseline models. We also measure the importance of different amino acids using information gain to show the amino acids which are important for predicting the host of a given coronavirus.
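Note: the abstract above builds on the position-weight matrix (PWM). As a rough illustration of that general idea only, the sketch below builds a log-odds PWM from the k-mers of one sequence and scores each k-mer against it. It is not the authors' PWM2Vec implementation: the function names, the uniform background, the pseudocount, and the choice of k are all assumptions made here, and the resulting vector has a fixed length only when all input sequences have the same (aligned) length.

```python
import math
from collections import Counter

# Assumption: the 20 standard amino acids; other symbols are not handled.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_weight_matrix(kmers, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Return one dict per k-mer position mapping symbol -> log2-odds weight.

    Hypothetical helper: uses a uniform background and additive smoothing.
    """
    k = len(kmers[0])
    background = 1.0 / len(alphabet)
    total = len(kmers) + pseudocount * len(alphabet)
    pwm = []
    for pos in range(k):
        counts = Counter(kmer[pos] for kmer in kmers)
        column = {}
        for symbol in alphabet:
            freq = (counts[symbol] + pseudocount) / total
            column[symbol] = math.log2(freq / background)
        pwm.append(column)
    return pwm


def kmer_scores(sequence, k=3, alphabet=AMINO_ACIDS):
    """Score every overlapping k-mer of `sequence` against its own PWM.

    The output has len(sequence) - k + 1 entries, one per k-mer.
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    pwm = position_weight_matrix(kmers, alphabet)
    return [sum(pwm[pos][ch] for pos, ch in enumerate(kmer)) for kmer in kmers]


if __name__ == "__main__":
    # Toy input, not real spike-protein data.
    print(kmer_scores("ACDACDKLM", k=3))
```

The per-k-mer scores could then be fed to any standard classifier; which classifier and which k work well is something only the paper itself can answer.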

Submitted 6 January, 2022; originally announced January 2022.

arXiv:2106.13397 [pdf, other]

Pheno-Mapper: An Interactive Toolbox for the Visual Exploration of Phenomics Data

Authors: Youjia Zhou, Methun Kamruzzaman, Patrick Schnable, Bala Krishnamoorthy, Ananth Kalyanaraman, Bei Wang

Abstract: High-throughput technologies to collect field data have made observations possible at scale in several branches of life sciences. The data collected can range from the molecular level (genotypes) to physiological (phenotypic traits) and environmental observations (e.g., weather, soil conditions). These vast swathes of data, collectively referred to as phenomics data, represent a treasure trove of key scientific knowledge on the dynamics of the underlying biological system. However, extracting information and insights from these complex datasets remains a significant challenge owing to their multidimensionality and lack of prior knowledge about their complex structure. In this paper, we present Pheno-Mapper, an interactive toolbox for the exploratory analysis and visualization of large-scale phenomics data. Our approach uses the mapper framework to perform a topological analysis of the data, and subsequently render visual representations with built-in data analysis and machine learning capabilities. We demonstrate the utility of this new tool on real-world plant (e.g., maize) phenomics datasets. In comparison to existing approaches, the main advantage of Pheno-Mapper is that it provides rich, interactive capabilities in the exploratory analysis of phenomics data, and it integrates visual analytics with data analysis and machine learning in an easily extensible way. In particular, Pheno-Mapper allows the interactive selection of subpopulations guided by a topological summary of the data and applies data mining and machine learning to these selected subpopulations for in-depth exploration.

Submitted 6 July, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

Comments: This is a preprint version. For a published version, please refer to ACM DOI: 10.1145/3459930.3469511

arXiv:2106.12608 [pdf, other]

Clinical Named Entity Recognition using Contextualized Token Representations

Authors: Yichao Zhou, Chelsea Ju, J. Harry Caufield, Kevin Shih, Calvin Chen, Yizhou Sun, Kai-Wei Chang, Peipei **, Wei Wang

Abstract: The clinical named entity recognition (CNER) task seeks to locate and classify clinical terminologies into predefined categories, such as diagnostic procedure, disease disorder, severity, medication, medication dosage, and sign symptom. CNER facilitates the study of side-effect on medications including identification of novel phenomena and human-focused information extraction. Existing approaches in extracting the entities of interest focus on using static word embeddings to represent each word. However, one word can have different interpretations that depend on the context of the sentences. Evidently, static word embeddings are insufficient to integrate the diverse interpretation of a word. To overcome this challenge, the technique of contextualized word embedding has been introduced to better capture the semantic meaning of each word based on its context. Two of these language models, ELMo and Flair, have been widely used in the field of Natural Language Processing to generate the contextualized word embeddings on domain-generic documents. However, these embeddings are usually too general to capture the proximity among vocabularies of specific domains. To facilitate various downstream applications using clinical case reports (CCRs), we pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair) using the clinical-related corpus from the PubMed Central. Explicit experiments show that our models gain dramatic improvements compared to both static word embeddings and domain-generic language models.

Submitted 23 June, 2021; originally announced June 2021.

Comments: 1 figure, 6 tables

arXiv:2103.07104 [pdf, other]

doi 10.1038/s43588-021-00130-y

Vector-based Pedestrian Navigation in Cities

Authors: Christian Bongiorno, Yulun Zhou, Marta Kryven, David Theurel, Alessandro Rizzo, Paolo Santi, Joshua Tenenbaum, Carlo Ratti

Abstract: How do pedestrians choose their paths within city street networks? Researchers have tried to shed light on this matter through strictly controlled experiments, but an ultimate answer based on real-world mobility data is still lacking. Here, we analyze salient features of human path planning through a statistical analysis of a massive dataset of GPS traces, which reveals that (1) people increasingly deviate from the shortest path when the distance between origin and destination increases, and (2) chosen paths are statistically different when origin and destination are swapped. We posit that direction to goal is a main driver of path planning and develop a vector-based navigation model that is a statistically better predictor of human paths than a model based on minimizing distance with stochastic effects. Our findings generalize across two major US cities with different street networks, hinting that vector-based navigation might be a universal property of human path planning.

Submitted 23 October, 2021; v1 submitted 12 March, 2021; originally announced March 2021.

MSC Class: 91D10

arXiv:2011.11020 [pdf, other]

Cryo-ZSSR: multiple-image super-resolution based on deep internal learning

Authors: Qinwen Huang, Ye Zhou, Xiaochen Du, Reed Chen, Jianyou Wang, Cynthia Rudin, Alberto Bartesaghi

Abstract: Single-particle cryo-electron microscopy (cryo-EM) is an emerging imaging modality capable of visualizing proteins and macro-molecular complexes at near-atomic resolution. The low electron doses used to prevent sample radiation damage result in images where the power of the noise is 100 times greater than the power of the signal. To overcome the low SNR, hundreds of thousands of particle projections acquired over several days of data collection are averaged in 3D to determine the structure of interest. Meanwhile, recent image super-resolution (SR) techniques based on neural networks have shown state-of-the-art performance on natural images. Building on these advances, we present a multiple-image SR algorithm based on deep internal learning designed specifically to work under low-SNR conditions. Our approach leverages the internal image statistics of cryo-EM movies and does not require training on ground-truth data. When applied to a single-particle dataset of apoferritin, we show that the resolution of 3D structures obtained from SR micrographs can surpass the limits imposed by the imaging system. Our results indicate that the combination of low magnification imaging with image SR has the potential to accelerate cryo-EM data collection without sacrificing resolution.

Submitted 22 November, 2020; originally announced November 2020.

Comments: 11 pages, 4 figures

arXiv:2007.01424 [pdf, ps, other]

Active Control and Sustained Oscillations in actSIS Epidemic Dynamics

Authors: Yunxiu Zhou, Simon A. Levin, Naomi E. Leonard

Abstract: An actively controlled Susceptible-Infected-Susceptible (actSIS) contagion model is presented for studying epidemic dynamics with continuous-time feedback control of infection rates. Our work is inspired by the observation that epidemics can be controlled through decentralized disease-control strategies such as quarantining, sheltering in place, social distancing, etc., where individuals actively modify their contact rates with others in response to observations of infection levels in the population. Accounting for a time lag in observations and categorizing individuals into distinct sub-populations based on their risk profiles, we show that the actSIS model manifests qualitatively different features as compared with the SIS model. In a homogeneous population of risk-averters, the endemic equilibrium is always reduced, although the transient infection level can exhibit overshoot or undershoot. In a homogeneous population of risk-tolerating individuals, the system exhibits bistability, which can also lead to reduced infection. For a heterogeneous population comprised of risk-tolerators and risk-averters, we prove conditions on model parameters for the existence of a Hopf bifurcation and sustained oscillations in the infected population.

Submitted 2 July, 2020; originally announced July 2020.

arXiv:2006.14685 [pdf, other]

A Computational Model of Protein Induced Membrane Morphology with Geodesic Curvature Driven Protein-Membrane Interface

Authors: Y. C. Zhou, David Argudo, Frank Marcoline, Michael Grabe

Abstract: Continuum or hybrid modeling of bilayer membrane morphological dynamics induced by embedded proteins necessitates the identification of protein-membrane interfaces and coupling of deformations of two surfaces. In this article we developed (i) a minimal total geodesic curvature model to describe these interfaces, and (ii) a numerical one-one mapping between two surfaces through a conformal mapping of each surface to the common middle annulus. Our work provides the first computationally tractable approach for determining the interfaces between bilayer and embedded proteins. The one-one mapping allows a convenient coupling of the morphology of two surfaces. We integrated these two new developments into the energetic model of protein-membrane interactions, and developed the full set of numerical methods for the coupled system. Numerical examples are presented to demonstrate (1) the efficiency and robustness of our methods in locating the curves with minimal total geodesic curvature on highly complicated protein surfaces, (2) the usefulness of these interfaces as interior boundaries for membrane deformation, and (3) the rich morphology of bilayer surfaces for different protein-membrane interfaces.

Submitted 25 June, 2020; originally announced June 2020.

MSC Class: 53B10; 65D18; 92C15

arXiv:2006.14617 [pdf, other]

doi 10.1016/j.jcp.2020.109725

Enriched Gradient Recovery for Interface Solutions of the Poisson-Boltzmann Equation

Authors: George Borleske, Y. C. Zhou

Abstract: Accurate calculation of electrostatic potential and gradient on the molecular surface is highly desirable for the continuum and hybrid modeling of large scale deformation of biomolecules in solvent. In this article a new numerical method is proposed to calculate these quantities on the dielectric interface from the numerical solutions of the Poisson-Boltzmann equation. Our method reconstructs a potential field locally in the least-squares sense on a polynomial basis enriched with Green's functions; the latter characterize the Coulomb potential induced by charges near the position of reconstruction. This enrichment resembles the decomposition of electrostatic potential into a singular Coulomb component and the regular reaction field in the Generalized Born methods. Numerical experiments demonstrate that the enrichment recovery produces drastically more accurate and stable potential gradients on molecular surfaces compared to classical recovery techniques.

Submitted 25 June, 2020; originally announced June 2020.

MSC Class: 53B10; 65D18; 92C15

arXiv:2005.10831 [pdf]

Repurpose Open Data to Discover Therapeutics for COVID-19 using Deep Learning

Authors: Xiangxiang Zeng, Xiang Song, Tengfei Ma, Xiaoqin Pan, Yadi Zhou, Yuan Hou, Zheng Zhang, George Karypis, Feixiong Cheng

Abstract: There have been more than 850,000 confirmed cases and over 48,000 deaths from the human coronavirus disease 2019 (COVID-19) pandemic, caused by novel severe acute respiratory syndrome coronavirus (SARS-CoV-2), in the United States alone. However, there are currently no proven effective medications against COVID-19. Drug repurposing offers a promising way for the development of prevention and treatment strategies for COVID-19. This study reports an integrative, network-based deep learning methodology to identify repurposable drugs for COVID-19 (termed CoV-KGE). Specifically, we built a comprehensive knowledge graph that includes 15 million edges across 39 types of relationships connecting drugs, diseases, genes, pathways, and expressions, from a large scientific corpus of 24 million PubMed publications. Using Amazon AWS computing resources, we identified 41 repurposable drugs (including indomethacin, toremifene and niclosamide) whose therapeutic associations with COVID-19 were validated by transcriptomic and proteomic data in SARS-CoV-2 infected human cells and data from ongoing clinical trials. While this study by no means recommends specific drugs, it demonstrates a powerful deep learning methodology to prioritize existing drugs for further investigation, which holds the potential of accelerating therapeutic development for COVID-19.

Submitted 21 May, 2020; originally announced May 2020.

MSC Class: I.2.1

arXiv:2005.04224 [pdf]

doi 10.1080/17538947.2020.1809723

Taking the pulse of COVID-19: A spatiotemporal perspective

Authors: Chaowei Yang, Dexuan Sha, Qian Liu, Yun Li, Hai Lan, Weihe Wendy Guan, Tao Hu, Zhenlong Li, Zhiran Zhang, John Hoot Thompson, Zifu Wang, David Wong, Shiyang Ruan, Manzhu Yu, Douglas Richardson, Luyao Zhang, Ruizhi Hou, You Zhou, Cheng Zhong, Yifei Tian, Fayez Beaini, Kyla Carte, Colin Flynn, Wei Liu, Dieter Pfoser, et al. (10 additional authors not shown)

Abstract: The sudden outbreak of the Coronavirus disease (COVID-19) swept across the world in early 2020, triggering the lockdowns of several billion people across many countries, including China, Spain, India, the U.K., Italy, France, Germany, and most states of the U.S. The transmission of the virus accelerated rapidly with the most confirmed cases in the U.S., and New York City became an epicenter of the pandemic by the end of March. In response to this national and global emergency, the NSF Spatiotemporal Innovation Center brought together a taskforce of international researchers and assembled implemented strategies to rapidly respond to this crisis, for supporting research, saving lives, and protecting the health of global citizens. This perspective paper presents our collective view on the global health emergency and our effort in collecting, analyzing, and sharing relevant data on global policy and government responses, geospatial indicators of the outbreak and evolving forecasts; in developing research capabilities and mitigation measures with global scientists, promoting collaborative research on outbreak dynamics, and reflecting on the dynamic responses from human societies.

Submitted 8 May, 2020; originally announced May 2020.

Comments: 27 pages, 18 figures. International Journal of Digital Earth (2020)

arXiv:2002.09640 [pdf, other]

doi 10.3390/healthcare9010061

The Outbreak Evaluation of COVID-19 in Wuhan District of China

Authors: Yimin Zhou, Zuguo Chen, Xiangdong Wu, Zengwu Tian, Liang Cheng, Lingjian Ye

Abstract: There were 27 novel coronavirus pneumonia cases found in Wuhan, China in December 2019, named 2019-nCoV temporarily and COVID-19 formally by WHO on 11 February, 2020. In December 2019 and January 2020, COVID-19 spread on a large scale among the population, which brought terrible disaster to the life and property of the Chinese people. In this paper, we first analyze the features and patterns of the virus transmission, and discuss the key impact factors and uncontrollable factors of epidemic transmission based on public data. Then the virus transmission can be modelled and used for the inflexion and extinction period of epidemic development so as to provide theoretical support for the Chinese government in the decision-making of epidemic prevention and recovery of economic production. Further, this paper demonstrates the effectiveness of the prevention methods taken by the Chinese government such as multi-level administrative region isolation. It is of great importance and practical significance for the world to deal with public health emergencies.

Submitted 22 February, 2020; originally announced February 2020.

Comments: 7 pages, 18 figures

Journal ref: Healthcare. Multidisciplinary Digital Publishing Institute, 2021, 9(1): 61

arXiv:1910.01724 [pdf, other]

Sparse Identification of Contrast Gain Control in the Fruit Fly Photoreceptor and Amacrine Cell Layer

Authors: Aurel A. Lazar, Nikul H. Ukani, Yiyin Zhou

Abstract: The fruit fly's natural visual environment is often characterized by light intensities ranging across several orders of magnitude and by rapidly varying contrast across space and time. Fruit fly photoreceptors robustly transduce and, in conjunction with amacrine cells, process visual scenes and provide the resulting signal to downstream targets. Here we model the first step of visual processing in the photoreceptor-amacrine cell layer. We propose a novel divisive normalization processor (DNP) for modeling the computation taking place in the photoreceptor-amacrine cell layer. The DNP explicitly models the photoreceptor feedforward and temporal feedback processing paths and the spatio-temporal feedback path of the amacrine cells. We then formally characterize the contrast gain control of the DNP and provide sparse identification algorithms that can efficiently identify each of the feedforward and feedback DNP components. The algorithms presented here are the first demonstration of tractable and robust identification of the components of a divisive normalization processor. The sparse identification algorithms can be readily employed in experimental settings, and their effectiveness is demonstrated with several examples.

Submitted 3 October, 2019; originally announced October 2019.

arXiv:1909.06504 [pdf, other]

The human monogamy behavior can influence the transmission of AIDS

Authors: Chentong Li, Jiawei Liu, Yicang Zhou

Abstract: In this letter, we mainly consider an MSM (men who have sex with men) network to analyze how monogamy behavior can influence the transmission of HIV. By calculating and analyzing the basic reproductive number of that network, we find the condition under which the monogamy rate can have a positive influence on controlling the transmission of HIV. Numerical simulations are also done to illustrate that monogamy can influence the transmission process of HIV.

Submitted 13 September, 2019; originally announced September 2019.

arXiv:1810.02037 [pdf, other]

A statistical normalization method and differential expression analysis for RNA-seq data between different species

Authors: Yan Zhou, Jiadi Zhu, Tiejun Tong, Junhui Wang, Bingqing Lin, Jun Zhang

Abstract: Background: High-throughput techniques bring novel tools but also statistical challenges to genomic research. Identifying genes with differential expression between different species is an effective way to discover evolutionarily conserved transcriptional responses. To remove systematic variation between different species for a fair comparison, the normalization procedure serves as a crucial pre-processing step that adjusts for the varying sample sequencing depths and other confounding technical effects. Results: In this paper, we propose a scale based normalization (SCBN) method by taking into account the available knowledge of conserved orthologous genes and hypothesis testing framework. Considering the different gene lengths and unmapped genes between different species, we formulate the problem from the perspective of hypothesis testing and search for the optimal scaling factor that minimizes the deviation between the empirical and nominal type I errors. Conclusions: Simulation studies show that the proposed method performs significantly better than the existing competitor in a wide range of settings. An RNA-seq dataset of different species is also analyzed and it coincides with the conclusion that the proposed method outperforms the existing method. For practical applications, we have also developed an R package named "SCBN" and the software is available at http://www.bioconductor.org/packages/devel/bioc/html/SCBN.html.

Submitted 3 October, 2018; originally announced October 2018.

arXiv:1805.02995 [pdf, other]

A Spiking Neural Dynamical Drift-Diffusion Model on Collective Decision Making with Self-Organized Criticality

Authors: Yanlin Zhou, Chen Peng, Qing Hui

Abstract: This article proposes a novel collective decision making scheme to solve the multi-agent drift-diffusion-model problem with the help of spiking neural networks. The exponential integrate-and-fire model is used here to capture the individual dynamics of each agent in the system, and we name this new model the Exponential Decision Making (EDM) model. We demonstrate analytically and experimentally that the gating variable for instantaneous activation follows a Boltzmann probability distribution, and the collective system reaches meta-stable critical states under the Markov chain premises. With mean field analysis, we derive the global criticality from local dynamics and achieve a power law distribution. Critical behavior of EDM exhibits the convergence dynamics of the Boltzmann distribution, and we conclude that the EDM model inherits the property of self-organized criticality, that is, the system will eventually evolve toward criticality.

Submitted 7 May, 2018; originally announced May 2018.

Comments: 8 pages

arXiv:1803.03335 [pdf, ps, other]

On Curvature Driven Rotational Diffusion of Protein on Membrane Surface

Authors: Y. C. Zhou

Abstract: Morphological dynamics of bilayer membrane is intrinsically coupled to the translational and orientational localization of membrane proteins. In this paper we are concerned with the orientational localization of membrane proteins in the absence of protein interaction and correlation. Entropic energy depending on the angular distribution function and the curvature energy depending on the principal curvature vectors are introduced to assemble an energy functional for the coupled system. Application of Onsager's variational principle gives rise to a generalized Smoluchowski equation governing the temporal and angular variations of the protein orientation. We prove the existence of the stationary solution of the equation as fixed points of a continuous nonlinear nonlocal map, and for biologically relevant conditions we obtain the uniqueness of the solution. To approximate the stationary solution in the Fourier space we construct an efficient numerical method that reduces the expansion and relates the coefficients to the modified Bessel functions of the first kind. Existence and uniqueness of the numerical solution are justified for biologically relevant conditions.

Submitted 7 March, 2018; originally announced March 2018.

MSC Class: 35A15; 35K15; 35Q99; 60J60

arXiv:1801.09858 [pdf]

doi 10.1002/hbm.24891

Decoding and mapping task states of the human brain via deep learning

Authors: Xiaoxiao Wang, Xiao Liang, Zhoufan Jiang, Benedictor Alexander Nguchu, Yawen Zhou, Yanming Wang, Huijuan Wang, Yu Li, Yuying Zhu, Feng Wu, Jia-Hong Gao, Benching Qiu

Abstract: Support vector machine (SVM) based multivariate pattern analysis (MVPA) has delivered promising performance in decoding specific task states based on functional magnetic resonance imaging (fMRI) of the human brain. Conventionally, the SVM-MVPA requires careful feature selection/extraction according to expert knowledge. In this study, we propose a deep neural network (DNN) for directly decoding multiple brain task states from fMRI signals of the brain without the burden of handcrafting features. We trained and tested the DNN classifier using task fMRI data from the Human Connectome Project's S1200 dataset (N=1034). In tests to verify its performance, the proposed classification method identified seven tasks with an average accuracy of 93.7%. We also showed the general applicability of the DNN for transfer learning to small datasets (N=43), a situation encountered in typical neuroscience research. The proposed method achieved an average accuracy of 89.0% and 94.7% on a working memory task and a motor classification task, respectively, higher than the accuracy of 69.2% and 68.6% obtained by the SVM-MVPA. A network visualization analysis showed that the DNN automatically detected features from areas of the brain related to each task. Without incurring the burden of handcrafting the features, the proposed deep decoding method can classify brain task states highly accurately, and is a powerful tool for fMRI researchers.

Submitted 4 December, 2019; v1 submitted 30 January, 2018; originally announced January 2018.

Comments: 27 pages, 8 figures, 4 tables

arXiv:1711.00001

Gene Ontology (GO) Prediction using Machine Learning Methods

Authors: Haoze Wu, Yangyu Zhou

Abstract: We applied machine learning to predict whether a gene is involved in axon regeneration. We extracted 31 features from different databases and trained five machine learning models. Our optimal model, a Random Forest Classifier with 50 submodels, yielded a test score of 85.71%, which is 4.1% higher than the baseline score. We concluded that our models have some predictive capability. Similar methodology and features could be applied to predict other Gene Ontology (GO) terms.

Submitted 26 September, 2019; v1 submitted 30 October, 2017; originally announced November 2017.

Comments: The results in this paper result from a biased test set, and are therefore not reliable

arXiv:1706.05783 [pdf, other]

Sparse Functional Identification of Complex Cells from Spike Times and the Decoding of Visual Stimuli

Authors: Aurel A. Lazar, Nikul H. Ukani, Yiyin Zhou

Abstract: We investigate the sparse functional identification of complex cells and the decoding of visual stimuli encoded by an ensemble of complex cells. The reconstruction algorithm of both temporal and spatio-temporal stimuli is formulated as a rank minimization problem that significantly reduces the number of sampling measurements (spikes) required for decoding. We also establish the duality between sparse decoding and functional identification, and provide algorithms for identification of low-rank dendritic stimulus processors. The duality enables us to efficiently evaluate our functional identification algorithms by reconstructing novel stimuli in the input space. Finally, we demonstrate that our identification algorithms substantially outperform the generalized quadratic model, the non-linear input model and the widely used spike-triggered covariance algorithm.

Submitted 19 June, 2017; originally announced June 2017.

arXiv:1704.03941 [pdf, ps, other]

Analysis of pacemaker activity in a two-component model of some brainstem neurons

Authors: Henry C. Tuckwell, Ying Zhou, Nicholas J. Penington

Abstract: Serotonergic, noradrenergic and dopaminergic brainstem (including midbrain) neurons often exhibit spontaneous and fairly regular spiking with frequencies of order a few Hz, though dopaminergic and noradrenergic neurons only exhibit such pacemaker-type activity in vitro or in vivo under special conditions. A large number of ion channel types contribute to such spiking so that detailed modeling of spike generation leads to the requirement of solving very large systems of differential equations. It is useful to have simplified mathematical models of spiking in such neurons so that, for example, features of inputs and output spike trains can be incorporated including stochastic effects for possible use in network models. In this article we investigate a simple two-component conductance-based model of the Hodgkin-Huxley type. Solutions are computed numerically and with suitably chosen parameters mimic features of pacemaker-type spiking in the above types of neurons. The effects of varying parameters are investigated in detail, it being found that there is extreme sensitivity to eight of them. Transitions from non-spiking to spiking are examined for two of these, the half-activation potential for an activation variable and the added (depolarizing) current, and contrasted with the behavior of the classical Hodgkin-Huxley system. The plateaux levels between spikes can be adjusted, by changing a set of voltage parameters, to agree with experimental observations.
Experiments have shown that, in vivo, dopaminergic and noradrenergic neurons' pacemaker activity can be induced by the removal of excitatory inputs or the introduction of inhibitory ones. These properties are confirmed by mimicking opposite such changes in the model, which resulted in a change from pacemaker activity to bursting-type phenomena.

Submitted 15 April, 2017; v1 submitted 12 April, 2017; originally announced April 2017.

Comments: arXiv admin note: substantial text overlap with arXiv:1508.05468

arXiv:1611.00610 [pdf, other]

Variational Methods for Biomolecular Modeling

Authors: Guo-Wei Wei, Y. C. Zhou

Abstract: Structure, function and dynamics of many biomolecular systems can be characterized by the energetic variational principle and the corresponding systems of partial differential equations (PDEs). This principle allows us to focus on the identification of essential energetic components, the optimal parametrization of energies, and the efficient computational implementation of energy variation or minimization. Given the fact that complex biomolecular systems are structurally non-uniform and their interactions occur through contact interfaces, their free energies are associated with various interfaces as well, such as solute-solvent interface, molecular binding interface, lipid domain interface, and membrane surfaces. This fact motivates the inclusion of interface geometry, particularly its curvatures, in the parametrization of free energies. Applications of such interface geometry based energetic variational principles are illustrated through three concrete topics: the multiscale modeling of biomolecular electrostatics and solvation that includes the curvature energy of the molecular surface, the formation of microdomains on lipid membrane due to the geometric and molecular mechanics at the lipid interface, and the mean curvature driven protein localization on membrane surfaces.
By further implicitly representing the interface using a phase field function over the entire domain, one can simulate the dynamics of the interface and the corresponding energy variation by evolving the phase field function, achieving significant reduction of the number of degrees of freedom and computational complexity. Strategies for improving the efficiency of computational implementations and for extending applications to coarse-graining or multiscale molecular simulations are outlined.

Submitted 31 October, 2016; originally announced November 2016.

Comments: 36 pages

arXiv:1611.00103 [pdf, other]

doi 10.1016/j.jcp.2017.05.029

Geodesic curvature driven surface microdomain formation

Authors: Melissa R. Adkins, Y. C. Zhou

Abstract: Lipid bilayer membranes are not uniform and clusters of lipids in a more ordered state exist within the generally disordered lipid milieu of the membrane. These clusters of ordered lipid microdomains are now referred to as lipid rafts. Recent reports attribute the formation of these microdomains to the geometrical and molecular mechanical mismatch of lipids of different species on the boundary. Here we introduce the geodesic curvature to characterize the geometry of the domain boundary, and develop a geodesic curvature energy model to describe the formation of these microdomains as a result of energy minimization. Our model accepts the intrinsic geodesic curvature of any binary lipid mixture as an input, and will produce microdomains of the given geodesic curvature as demonstrated by three sets of numerical simulations. Our results are in contrast to the surface phase separation predicted by the classical surface Cahn-Hilliard equation, which tends to generate large domains as a result of minimizing line tension. Our model provides a direct and quantified description of the structure inhomogeneity of lipid bilayer membrane, and can be coupled to the investigations of biological processes on membranes for which such inhomogeneity plays essential roles.

Submitted 31 October, 2016; originally announced November 2016.

Comments: 31 pages

arXiv:1606.08902

Curvature-driven molecular flows on membrane surfaces

Authors: Michael Mikucki, Y. C. Zhou

Abstract: Morphological change of bilayer membranes in vivo is generally not a spontaneous process but is modulated by various types of proteins. Most of these modulations are associated with the localization of related proteins in the crowded lipid environment of the bilayer membrane. This work presents a mathematical model for the localization of multiple species of diffusive molecules on membrane surfaces. We start with the energetic description of the distributions of molecules on a curved membrane surface, by assembling the bending energy of the bilayer membrane and the entropic energy of diffusive molecules. We introduce the spontaneous curvature of molecules in the membrane, and define the spontaneous curvature of the bilayer membrane as a function of the molecule concentrations on membrane surfaces. This connection gives rise to a drift-diffusion equation to govern the gradient flows of the surface molecule concentrations. We recast the energetic formulation and the related governing equations in the Eulerian framework by using a phase field function that defines the membrane morphology. Computational simulations with the proposed mathematical model and related numerical techniques predict the molecular localization on membrane surfaces at locations with preferred mean curvature.

Submitted 28 June, 2016; originally announced June 2016.

MSC Class: 35Q92; 92C40; 65M70

Showing 1–50 of 62 results for author: Zhou, Y