-
AntibodyFlow: Normalizing Flow Model for Designing Antibody Complementarity-Determining Regions
Authors:
Bohao Xu,
Yanbo Wang,
Wenyu Chen,
Shimin Shan
Abstract:
Therapeutic antibodies have been extensively studied in drug discovery and development in the past decades. Antibodies are specialized protective proteins that bind to antigens in a lock-and-key manner. The binding strength/affinity between an antibody and a specific antigen is heavily determined by the complementarity-determining regions (CDRs) on the antibodies. Existing machine learning methods cast in silico development of CDRs as either sequence or 3D graph (with a single chain) generation tasks and have achieved initial success. However, because CDR loops have specific geometric shapes, learning the 3D geometric structures of CDRs remains a challenge. To address this issue, we propose AntibodyFlow, a 3D flow model to design antibody CDR loops. Specifically, AntibodyFlow first constructs the distance matrix, then predicts amino acids conditioned on the distance matrix. Also, AntibodyFlow conducts constraint learning and constrained generation to ensure valid 3D structures. Experimental results indicate that AntibodyFlow outperforms the best baseline consistently with up to 16.0% relative improvement in validity rate and 24.3% relative reduction in geometric graph level error (root mean square deviation, RMSD).
Submitted 18 June, 2024;
originally announced June 2024.
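The AntibodyFlow abstract refers to two geometric quantities, a pairwise distance matrix and RMSD. The sketch below only illustrates how these are commonly computed from 3D coordinates; it is not the paper's model. The function names and the toy coordinates are hypothetical, and the RMSD here assumes the two structures are already superimposed.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between 3D points (for example C-alpha atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def rmsd(pred, ref):
    """Root mean square deviation between two equally sized coordinate lists.

    No superposition is performed: the inputs are assumed to be aligned already.
    """
    if len(pred) != len(ref):
        raise ValueError("coordinate lists must have the same length")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(squared / len(pred))


# Hypothetical three-residue loop, coordinates in angstroms.
loop = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
# The same loop translated by 1 angstrom along x.
shifted = [(x + 1.0, y, z) for x, y, z in loop]

dmat = distance_matrix(loop)
dmat_shifted = distance_matrix(shifted)
loop_rmsd = rmsd(shifted, loop)
```

The distance matrix is unchanged by the translation while the unaligned RMSD is not, which is one reason a distance-based representation can be convenient for describing loop geometry.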
-
Beyond ESM2: Graph-Enhanced Protein Sequence Modeling with Efficient Clustering
Authors:
Shujian Jiao,
Bingxuan Li,
Lei Wang,
Xiao** Zhang,
Wei Chen,
Jiajie Peng,
Zhongyu Wei
Abstract:
Proteins are essential to life's processes, underpinning evolution and diversity. Advances in sequencing technology have revealed millions of proteins, underscoring the need for sophisticated pre-trained protein models for biological analysis and AI development. Facebook's ESM2, the most advanced protein language model to date, leverages a masked prediction task for unsupervised learning, crafting amino acid representations with notable biochemical accuracy. Yet, it falls short in delivering functional protein insights, signaling an opportunity for enhancing representation quality. Our study addresses this gap by incorporating protein family classification into ESM2's training. This approach, augmented with a Community Propagation-Based Clustering Algorithm, improves global protein representations, while a contextual prediction task fine-tunes local amino acid accuracy. Significantly, our model achieved state-of-the-art results in several downstream experiments, demonstrating the power of combining global and local methodologies to substantially boost protein representation quality.
Submitted 24 April, 2024;
originally announced April 2024.
-
Density estimation for ordinal biological sequences and its applications
Authors:
Wei-Chia Chen,
Juannan Zhou,
David M. McCandlish
Abstract:
Biological sequences do not come at random. Instead, they appear with particular frequencies that reflect properties of the associated system or phenomenon. Knowing how biological sequences are distributed in sequence space is thus a natural first step toward understanding the underlying mechanisms. Here we propose a new method for inferring the probability distribution from which a sample of biological sequences was drawn for the case where the sequences are composed of elements that admit a natural ordering. Our method is based on Bayesian field theory, a physics-based machine learning approach, and can be regarded as a nonparametric extension of the traditional maximum entropy estimate. As an example, we use it to analyze the aneuploidy data pertaining to gliomas from The Cancer Genome Atlas project. In addition, we demonstrate two follow-up analyses that can be performed with the resulting probability distribution. One of them is to investigate the associations among the sequence sites. This provides us with a way to infer the governing biological grammar. The other is to study the global geometry of the probability landscape, which allows us to look at the problem from an evolutionary point of view. It can be seen that this methodology enables us to learn from a sample of sequences about how a biological system or phenomenon in the real world works.
Submitted 17 April, 2024;
originally announced April 2024.
-
Optimize Individualized Energy Delivery for Septic Patients Using Predictive Deep Learning Models: A Real World Study
Authors:
Lu Wang,
Li Chang,
Ruipeng Zhang,
Kexun Li,
Yu Wang,
Wei Chen,
Xuanlin Feng,
Mingwei Sun,
Qi Wang,
Charles Damien Lu,
Jun Zeng,
Hua Jiang
Abstract:
Background and Objectives: We aim to establish deep learning models to optimize the individualized energy delivery for septic patients. Methods and Study Design: We conducted a study of adult septic patients in the intensive care unit (ICU), collecting 47 indicators for 14 days. After data cleaning and preprocessing, we used statistical analysis to explore energy delivery in deceased and surviving patients. We filtered out nutrition-related features and divided the data into three metabolic phases: acute early, acute late, and rehabilitation. Models were built using data before September 2020 and validated on the rest. We then established optimal energy target models for each phase using deep learning. Results: A total of 277 patients and 3115 data points were included in this study. The models indicated that the optimal energy targets in the three phases were 900 kcal/d, 2300 kcal/d, and 2000 kcal/d, respectively. Excessive energy intake increased mortality rapidly in the early period of the acute phase. Insufficient energy in the late period of the acute phase significantly raised the mortality of septic patients. For the rehabilitation phase, too much or too little energy delivery were both associated with high mortality. Conclusion: Our study established time-series prediction models for septic patients to optimize energy delivery in the ICU. This approach indicated the feasibility of developing nutritional tools for critically ill patients. We recommended permissive underfeeding only in the early acute phase. Later, increased energy intake may improve survival and settle energy debts caused by underfeeding.
Submitted 3 February, 2024;
originally announced February 2024.
-
Attention-Based CNN-BiLSTM for Sleep State Classification of Spatiotemporal Wide-Field Calcium Imaging Data
Authors:
Xiaohui Zhang,
Eric C. Landsness,
Hanyang Miao,
Wei Chen,
Michelle Tang,
Lindsey M. Brier,
Joseph P. Culver,
**-Moo Lee,
Mark A. Anastasio
Abstract:
Background: Wide-field calcium imaging (WFCI) with genetically encoded calcium indicators allows for spatiotemporal recordings of neuronal activity in mice. When applied to the study of sleep, WFCI data are manually scored into the sleep states of wakefulness, non-REM (NREM) and REM by use of adjunct EEG and EMG recordings. However, this process is time-consuming, invasive and often suffers from low inter- and intra-rater reliability. Therefore, an automated sleep state classification method that operates on spatiotemporal WFCI data is desired. New Method: A hybrid network architecture consisting of a convolutional neural network (CNN) to extract spatial features of image frames and a bidirectional long short-term memory network (BiLSTM) with attention mechanism to identify temporal dependencies among different time points was proposed to classify WFCI data into states of wakefulness, NREM and REM sleep. Results: Sleep states were classified with an accuracy of 84% and Cohen's kappa of 0.64. Gradient-weighted class activation maps revealed that the frontal region of the cortex carries more importance when classifying WFCI data into NREM sleep while posterior area contributes most to the identification of wakefulness. The attention scores indicated that the proposed network focuses on short- and long-range temporal dependency in a state-specific manner. Comparison with Existing Method: On a 3-hour WFCI recording, the CNN-BiLSTM achieved a kappa of 0.67, comparable to a kappa of 0.65 corresponding to the human EEG/EMG-based scoring. Conclusions: The CNN-BiLSTM effectively classifies sleep states from spatiotemporal WFCI data and will enable broader application of WFCI in sleep.
Submitted 15 January, 2024;
originally announced January 2024.
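The sleep-state abstract reports agreement as Cohen's kappa. As a reminder of what that statistic measures, the sketch below computes kappa from a confusion matrix of two label sets (for example network output versus EEG/EMG-based scoring). The function name and the counts are hypothetical and are not taken from the paper.

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix.

    Rows index the labels of one rater, columns the labels of the other.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: product of the two raters' marginal frequencies.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts for three states (wake, NREM, REM).
example = [
    [20, 5, 0],
    [5, 20, 0],
    [0, 0, 50],
]
kappa = cohens_kappa(example)  # observed 0.90, chance 0.375
```

Because kappa subtracts the agreement expected by chance, it can be noticeably lower than raw accuracy when one class dominates, which is why the two numbers are usually reported together.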
-
Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization
Authors:
Jiahao Qiu,
Hui Yuan,
**ghong Zhang,
Wentao Chen,
Huazheng Wang,
Mengdi Wang
Abstract:
While modern biotechnologies allow synthesizing new proteins and function measurements at scale, efficiently exploring a protein sequence space and engineering it remains a daunting task due to the vast sequence space of any given protein. Protein engineering is typically conducted through an iterative process of adding mutations to the wild-type or lead sequences, recombination of mutations, and running new rounds of screening. To enhance the efficiency of such a process, we propose a tree search-based bandit learning method, which expands a tree starting from the initial sequence with the guidance of a bandit machine learning model. Under simplified assumptions and a Gaussian Process prior, we provide theoretical analysis and a Bayesian regret bound, demonstrating that the combination of local search and bandit learning method can efficiently discover a near-optimal design. The full algorithm is compatible with a suite of randomized tree search heuristics, machine learning models, pre-trained embeddings, and bandit techniques. We test various instances of the algorithm across benchmark protein datasets using simulated screens. Experiment results demonstrate that the algorithm is both sample-efficient and able to find top designs using reasonably small mutation counts.
Submitted 8 January, 2024;
originally announced January 2024.
-
Microfluidics for Hydrodynamics Investigations of Sand Dollar Larvae
Authors:
Wesley A. Chen,
Bryant A. Lopez,
Haley B. Obenshain,
Moses Villeda,
Brian T. Le,
Brenda AAB. Ametepe,
Ariana Lee,
Douglas A. Pace,
Siavash Ahrar
Abstract:
The life cycle of most marine invertebrates includes a planktonic larval stage before metamorphosis to bottom-dwelling adulthood. During the larval stage, ciliary-mediated activity enables feeding (capturing unicellular algae) and transport of materials (oxygen) required for the larva's growth, development, and successful metamorphosis. Investigating the underlying hydrodynamics of these behaviors is valuable for addressing fundamental biological questions (e.g., phenotypic plasticity) and advancing engineering applications. In this work, we combined microfluidics and fluorescence microscopy as a miniaturized PIV (mPIV) to study ciliary-mediated hydrodynamics during suspension feeding in sand dollar larvae (Dendraster excentricus). First, we confirmed the approach's feasibility by examining the underlying hydrodynamics (vortex patterns) for low- and high-fed larvae. Next, ciliary hydrodynamics were tracked from 11 days post-fertilization (DPF) to 20 DPF for 21 low-fed larvae. Microfluidics enabled the examination of baseline activities (without external flow) and behaviors in the presence of environmental cues (external flow). A library of qualitative vortex patterns and quantitative hydrodynamics was generated and shared as a standalone repository. Results from mPIV (velocities) were used to examine the role of ciliary activity in transporting materials (oxygen). Given the laminar flow and the viscosity-dominated environments surrounding the larvae, overcoming the diffusive boundary layer is critical for the organism's survival. Peclet number analysis for oxygen transport suggested that ciliary velocities help overcome the diffusion-dominated transport (max Pe numbers between 30-60). Microfluidics serving as mPIV provided a scalable and accessible approach for investigating the ciliary hydrodynamics of marine organisms.
Submitted 29 December, 2023;
originally announced January 2024.
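The sand dollar abstract relies on a Peclet number analysis, where Pe = U L / D compares advective transport (velocity U over length scale L) with diffusion (diffusivity D). The sketch below shows the arithmetic only. The velocity and length values are hypothetical placeholders, not measurements from the paper, and the oxygen diffusivity is an approximate textbook value for water near room temperature.

```python
def peclet_number(velocity, length, diffusivity):
    """Pe = U * L / D; values well above 1 indicate advection-dominated transport."""
    return velocity * length / diffusivity


# Assumption: approximate diffusivity of dissolved oxygen in water, in m^2/s.
O2_DIFFUSIVITY = 2.0e-9

# Hypothetical ciliary flow speed (m/s) and larval length scale (m).
flow_speed = 300e-6
length_scale = 300e-6

pe = peclet_number(flow_speed, length_scale, O2_DIFFUSIVITY)
```

With these placeholder inputs Pe is 45, so advection would outpace diffusion by more than an order of magnitude over that length scale.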
-
DeepROCK: Error-controlled interaction detection in deep neural networks
Authors:
Winston Chen,
William Stafford Noble,
Yang Young Lu
Abstract:
The complexity of deep neural networks (DNNs) makes them powerful but also makes them challenging to interpret, hindering their applicability in error-intolerant domains. Existing methods attempt to reason about the internal mechanism of DNNs by identifying feature interactions that influence prediction outcomes. However, such methods typically lack a systematic strategy to prioritize interactions while controlling confidence levels, making them difficult to apply in practice for scientific discovery and hypothesis validation. In this paper, we introduce a method, called DeepROCK, to address this limitation by using knockoffs, which are dummy variables that are designed to mimic the dependence structure of a given set of features while being conditionally independent of the response. Together with a novel DNN architecture involving a pairwise-coupling layer, DeepROCK jointly controls the false discovery rate (FDR) and maximizes statistical power. In addition, we identify a challenge in correctly controlling FDR using off-the-shelf feature interaction importance measures. DeepROCK overcomes this challenge by proposing a calibration procedure applied to existing interaction importance measures to bring the FDR under control at a target level. Finally, we validate the effectiveness of DeepROCK through extensive experiments on simulated and real datasets.
Submitted 26 September, 2023;
originally announced September 2023.
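The DeepROCK abstract builds on knockoff-based FDR control. For orientation, the sketch below implements the generic knockoff+ selection rule from the knockoff filter literature: given a statistic W_j per candidate (large positive values favour the real feature over its knockoff), it picks the smallest threshold t whose estimated false discovery proportion is at most q. This is an assumption about the general framework, not DeepROCK's calibration procedure or its interaction statistics, and the function names and example values are hypothetical.

```python
def knockoff_threshold(stats, q, offset=1):
    """Smallest t with (offset + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q.

    offset=1 gives the conservative "knockoff+" variant.
    Returns infinity when no threshold satisfies the bound.
    """
    candidates = sorted({abs(w) for w in stats if w != 0})
    for t in candidates:
        negatives = sum(1 for w in stats if w <= -t)
        positives = sum(1 for w in stats if w >= t)
        if (offset + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def select(stats, q):
    """Indices of candidates whose statistic reaches the knockoff+ threshold."""
    t = knockoff_threshold(stats, q)
    return [j for j, w in enumerate(stats) if w >= t]


# Hypothetical importance statistics for ten candidate interactions.
W = [5.0, 4.0, 3.5, 3.0, 2.5, -1.0, 2.0, 0.5, -0.3, 1.5]
selected = select(W, q=0.2)
```

The negative statistics act as a built-in estimate of how many false positives sit above the threshold, which is what lets the rule bound the FDR without p-values.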
-
Improved Cryo-EM Pose Estimation and 3D Classification through Latent-Space Disentanglement
Authors:
Weijie Chen,
Yuhang Wang,
Lin Yao
Abstract:
Due to the extremely low signal-to-noise ratio (SNR) and unknown poses (projection angles and image shifts) in cryo-electron microscopy (cryo-EM) experiments, reconstructing 3D volumes from 2D images is very challenging. In addition to these challenges, heterogeneous cryo-EM reconstruction requires conformational classification. In popular cryo-EM reconstruction algorithms, poses and conformation classification labels must be predicted for every input cryo-EM image, which can be computationally costly for large datasets. An emerging class of methods adopted the amortized inference approach. In these methods, only a subset of the input dataset is needed to train neural networks for the estimation of poses and conformations. Once trained, these neural networks can make pose/conformation predictions and 3D reconstructions at low cost for the entire dataset during inference. Unfortunately, when facing heterogeneous reconstruction tasks, it is hard for current amortized-inference-based methods to effectively estimate the conformational distribution and poses from entangled latent variables. Here, we propose a self-supervised variational autoencoder architecture called "HetACUMN" based on amortized inference. We employed an auxiliary conditional pose prediction task by inverting the order of encoder-decoder to explicitly enforce the disentanglement of conformation and pose predictions. Results on simulated datasets show that HetACUMN generated more accurate conformational classifications than other amortized or non-amortized methods. Furthermore, we show that HetACUMN is capable of performing heterogeneous 3D reconstructions of a real experimental dataset.
Submitted 22 April, 2024; v1 submitted 9 August, 2023;
originally announced August 2023.
-
FFF: Fragments-Guided Flexible Fitting for Building Complete Protein Structures
Authors:
Weijie Chen,
Xinyan Wang,
Yuhang Wang
Abstract:
Cryo-electron microscopy (cryo-EM) is a technique for reconstructing the 3-dimensional (3D) structure of biomolecules (especially large protein complexes and molecular assemblies). As the resolution increases to the near-atomic scale, building protein structures de novo from cryo-EM maps becomes possible. Recently, recognition-based de novo building methods have shown the potential to streamline this process. However, they cannot build a complete structure due to the low signal-to-noise ratio (SNR) problem. At the same time, AlphaFold has led to a great breakthrough in predicting protein structures. This has inspired us to combine fragment recognition and structure prediction methods to build a complete structure. In this paper, we propose a new method named FFF that bridges protein structure prediction and protein structure recognition with flexible fitting. First, a multi-level recognition network is used to capture various structural features from the input 3D cryo-EM map. Next, protein structural fragments are generated using pseudo peptide vectors and a protein sequence alignment method based on these extracted features. Finally, a complete structural model is constructed using the predicted protein fragments via flexible fitting. Based on our benchmark tests, FFF outperforms the baseline methods for building complete protein structures.
Submitted 7 August, 2023;
originally announced August 2023.
-
Transferable Graph Neural Fingerprint Models for Quick Response to Future Bio-Threats
Authors:
Wei Chen,
Yihui Ren,
Ai Kagawa,
Matthew R. Carbone,
Samuel Yen-Chi Chen,
Xiaohui Qu,
Shinjae Yoo,
Austin Clyde,
Arvind Ramanathan,
Rick L. Stevens,
Hubertus J. J. van Dam,
Deyu Lu
Abstract:
Fast screening of drug molecules based on the ligand binding affinity is an important step in the drug discovery pipeline. Graph neural fingerprint is a promising method for developing molecular docking surrogates with high throughput and great fidelity. In this study, we built a COVID-19 drug docking dataset of about 300,000 drug candidates on 23 coronavirus protein targets. With this dataset, we trained graph neural fingerprint docking models for high-throughput virtual COVID-19 drug screening. The graph neural fingerprint models yield high prediction accuracy on docking scores with the mean squared error lower than $0.21$ kcal/mol for most of the docking targets, showing significant improvement over conventional circular fingerprint methods. To make the neural fingerprints transferable for unknown targets, we also propose a transferable graph neural fingerprint method trained on multiple targets. With comparable accuracy to target-specific graph neural fingerprint models, the transferable model exhibits superb training and data efficiency. We highlight that the impact of this study extends beyond the COVID-19 dataset, as our approach for fast virtual ligand screening can be easily adapted and integrated into a general machine learning-accelerated pipeline to battle future bio-threats.
Submitted 14 September, 2023; v1 submitted 17 July, 2023;
originally announced August 2023.
-
Emergent Bio-Functional Similarities in a Cortical-Spike-Train-Decoding Spiking Neural Network Facilitate Predictions of Neural Computation
Authors:
Tengjun Liu,
Yansong Chua,
Yiwei Zhang,
Yuxiao Ning,
Pengfu Liu,
Guihua Wan,
Zijun Wan,
Shaomin Zhang,
Weidong Chen
Abstract:
Despite their better bio-plausibility, goal-driven spiking neural networks (SNNs) have not achieved applicable performance for classifying biological spike trains, and have shown few bio-functional similarities compared to traditional artificial neural networks. In this study, we proposed the motorSRNN, a recurrent SNN topologically inspired by the neural motor circuit of primates. By employing the motorSRNN in decoding spike trains from the primary motor cortex of monkeys, we achieved a good balance between classification accuracy and energy consumption. The motorSRNN communicated with the input by capturing and cultivating more cosine-tuning, an essential property of neurons in the motor cortex, and maintained its stability during training. Such training-induced cultivation and persistency of cosine-tuning were also observed in our monkeys. Moreover, the motorSRNN produced additional bio-functional similarities at the single-neuron, population, and circuit levels, demonstrating biological authenticity. Thereby, ablation studies on motorSRNN have suggested that long-term stable feedback synapses contribute to the training-induced cultivation in the motor cortex. Besides these novel findings and predictions, we offer a new framework for building authentic models of neural computation.
Submitted 14 March, 2023;
originally announced March 2023.
-
Integrative Pan-Cancer Analysis of RNMT: a Potential Prognostic and Immunological Biomarker
Authors:
Shuqiang Huang,
Cuiyu Tan,
**zhen Zheng,
Zhugu Huang,
Zhihong Li,
Ziyin Lv,
Wanru Chen
Abstract:
Background: RNA guanine-7 methyltransferase (RNMT) is one of the main regulators of N7-methylguanosine, and the deregulation of RNMT correlated with tumor development and immune metabolism. However, the specific function of RNMT in pan-cancer remains unclear.
Methods: RNMT expression in different cancers was analyzed using multiple databases, including Cancer Cell Line Encyclopedia (CCLE), Genotype-Tissue Expression Project (GTEx), and The Cancer Genome Atlas (TCGA). Cox regression analysis and Kaplan-Meier analysis were used to estimate the correlation of RNMT expression to prognosis. The data was also used to research the relationship between RNMT expression and common immunoregulators, tumor mutation burden (TMB), microsatellite instability (MSI), mismatch repair (MMR), and DNA methyltransferase (DNMT). Additionally, the cBioPortal website was used to evaluate the characteristics of RNMT alteration. The TISDB database was used to obtain the expression of different subtypes. The Tumor Immune Estimation Resource (TIMER) database was used to analyze the association between RNMT and tumor immune infiltration. Gene set enrichment analysis (GSEA) was used to identify the relevant pathways.
Results: RNMT was ubiquitously highly expressed across cancers and survival analysis revealed that its expression was highly associated with the clinical prognosis of various cancer types. Remarkably, RNMT participates in immune regulation and plays a crucial part in the tumor microenvironment. A positive association was found between RNMT expression and six immune cell types expression in colon adenocarcinoma, kidney renal clear cell carcinoma, and liver hepatocellular carcinoma. Moreover, RNMT expression was highly associated with immunoregulators in most cancer types, and correlated to TMB, MSI, MMR, and DNMT. Finally, GSEA indicated that RNMT may correlate with tumor immunity.
Submitted 21 March, 2024; v1 submitted 18 October, 2022;
originally announced October 2022.
-
The Brain-Inspired Decoder for Natural Visual Image Reconstruction
Authors:
Wenyi Li,
Shengjie Zheng,
Yufan Liao,
Rongqi Hong,
Weiliang Chen,
Chenggnag He,
Xiaojian Li
Abstract:
Decoding images from brain activity has been a challenge. Owing to the development of deep learning, there are now tools available to solve this problem. Image decoding aims to map neural spike trains to low-level visual features and a high-level semantic information space. Recently, a few studies have decoded images from spike trains; however, these studies pay little attention to the foundations of neuroscience, and few have incorporated receptive fields into visual image reconstruction. In this paper, we propose a deep learning neural network architecture with biological properties to reconstruct visual images from spike trains. As far as we know, this is the first time a receptive field property matrix has been integrated into the loss function. Our model is an end-to-end decoder from neural spike trains to images. We not only merged Gabor filters into the auto-encoder used to generate images but also proposed a loss function with receptive field properties. We evaluated our decoder on two datasets which contain macaque primary visual cortex neural spikes and salamander retina ganglion cells (RGCs) spikes. Our results show that our method can effectively combine receptive field features to reconstruct images, providing a new approach to visual reconstruction based on neural information.
Submitted 18 July, 2022;
originally announced July 2022.
-
Uncertainty-Aware Self-supervised Neural Network for Liver $T_{1ρ}$ Mapping with Relaxation Constraint
Authors:
Chaoxing Huang,
Yurui Qian,
Simon Chun Ho Yu,
Jian Hou,
Baiyan Jiang,
Queenie Chan,
Vincent Wai-Sun Wong,
Winnie Chiu-Wing Chu,
Weitian Chen
Abstract:
$T_{1ρ}$ mapping is a promising quantitative MRI technique for the non-invasive assessment of tissue properties. Learning-based approaches can map $T_{1ρ}$ from a reduced number of $T_{1ρ}$-weighted images, but require significant amounts of high-quality training data. Moreover, existing methods do not provide the confidence level of the $T_{1ρ}$ estimation. To address these problems, we proposed a self-supervised learning neural network that learns a $T_{1ρ}$ mapping using the relaxation constraint in the learning process. Epistemic uncertainty and aleatoric uncertainty are modelled for the $T_{1ρ}$ quantification network to provide a Bayesian confidence estimation of the $T_{1ρ}$ mapping. The uncertainty estimation can also regularize the model to prevent it from learning imperfect data. We conducted experiments on $T_{1ρ}$ data collected from 52 patients with non-alcoholic fatty liver disease. The results showed that our method outperformed the existing methods for $T_{1ρ}$ quantification of the liver using as few as two $T_{1ρ}$-weighted images. Our uncertainty estimation provided a feasible way of modelling the confidence of the self-supervised learning based $T_{1ρ}$ estimation, which is consistent with the reality in liver $T_{1ρ}$ imaging.
Submitted 25 October, 2022; v1 submitted 7 July, 2022;
originally announced July 2022.
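The $T_{1ρ}$ abstract mentions a relaxation constraint and estimation from as few as two weighted images, without stating the signal model. A common assumption in the relaxometry literature is mono-exponential decay, S(TSL) = S0 * exp(-TSL / T1ρ), where TSL is the spin-lock time. Under that assumption, which may differ from the paper's formulation, two measurements determine T1ρ in closed form, as sketched below. All names and numbers are hypothetical.

```python
import math


def signal(s0, tsl_ms, t1rho_ms):
    """Mono-exponential relaxation model (an assumption, see lead-in)."""
    return s0 * math.exp(-tsl_ms / t1rho_ms)


def t1rho_two_point(s1, s2, tsl1_ms, tsl2_ms):
    """Closed-form T1rho from two signals acquired at spin-lock times tsl1 < tsl2."""
    if tsl2_ms <= tsl1_ms:
        raise ValueError("tsl2_ms must be greater than tsl1_ms")
    if s2 <= 0 or s1 <= s2:
        raise ValueError("signals must be positive and decay with spin-lock time")
    return (tsl2_ms - tsl1_ms) / math.log(s1 / s2)


# Simulate a noise-free voxel with a hypothetical T1rho of 40 ms.
true_t1rho = 40.0
s_short = signal(1000.0, 0.0, true_t1rho)
s_long = signal(1000.0, 50.0, true_t1rho)
recovered = t1rho_two_point(s_short, s_long, 0.0, 50.0)
```

With noisy data this two-point estimate becomes unstable as the signals approach each other, which is one motivation for learned estimators that also report uncertainty.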
-
Accelerated functional brain aging in major depressive disorder: evidence from a large scale fMRI analysis of Chinese participants
Authors:
Yunsong Luo,
Wenyu Chen,
Jiang Qiu,
Tao Jia
Abstract:
Major depressive disorder (MDD) is one of the most common mental health conditions that has been intensively investigated for its association with brain atrophy and mortality. Recent studies reveal that the deviation between the predicted and the chronological age can be a marker of accelerated brain aging to characterize MDD. However, current conclusions are usually drawn based on structural MRI information collected from Caucasian participants. The universality of this biomarker needs to be further validated by subjects with different ethnic/racial backgrounds and by different types of data. Here we make use of the REST-meta-MDD, a large scale resting-state fMRI dataset collected from multiple cohort participants in China. We develop a stacking machine learning model based on 1101 healthy controls, which estimates a subject's chronological age from fMRI with promising accuracy. The trained model is then applied to 1276 MDD patients from 24 sites. We observe that MDD patients exhibit a $+4.43$ years ($\text{$p$} < 0.0001$, $\text{Cohen's $d$} = 0.35$, $\text{95\% CI}:1.86 - 3.91$) higher brain-predicted age difference (brain-PAD) compared to controls. In the MDD subgroup, we observe a statistically significant $+2.09$ years ($\text{$p$} < 0.05$, $\text{Cohen's $d$} = 0.134483$) brain-PAD in antidepressant users compared to medication-free patients. The statistical relationship observed is further checked by three different machine learning algorithms. The positive brain-PAD observed in participants in China confirms the presence of accelerated brain aging in MDD patients. The utilization of functional brain connectivity for age estimation verifies existing findings from a new dimension.
Submitted 8 May, 2022;
originally announced May 2022.
-
Hard Sample Aware Noise Robust Learning for Histopathology Image Classification
Authors:
Chuang Zhu,
Wenkai Chen,
Ting Peng,
Ying Wang,
Mulan **
Abstract:
Deep learning-based histopathology image classification is a key technique to help physicians improve the accuracy and promptness of cancer diagnosis. However, noisy labels are often inevitable in the complex manual annotation process, and they mislead the training of the classification model. In this work, we introduce a novel hard sample aware noise robust learning method for histopathology image classification. To distinguish the informative hard samples from the harmful noisy ones, we build an easy/hard/noisy (EHN) detection model by using the sample training history. Then we integrate the EHN into a self-training architecture to lower the noise rate through gradual label correction. With the obtained almost clean dataset, we further propose a noise suppressing and hard enhancing (NSHE) scheme to train the noise robust model. Compared with previous works, our method can save more clean samples and can be directly applied to the real-world noisy dataset scenario without using a clean subset. Experimental results demonstrate that the proposed scheme outperforms the current state-of-the-art methods on both synthetic and real-world noisy datasets. The source code and data are available at https://github.com/bupt-ai-cz/HSA-NRL/.
Submitted 5 December, 2021;
originally announced December 2021.
-
A Pathologist-Annotated Dataset for Validating Artificial Intelligence: A Project Description and Pilot Study
Authors:
Sarah N Dudgeon,
Si Wen,
Matthew G Hanna,
Rajarsi Gupta,
Mohamed Amgad,
Manasi Sheth,
Hetal Marble,
Richard Huang,
Markus D Herrmann,
Clifford H. Szu,
Darick Tong,
Bruce Werness,
Evan Szu,
Denis Larsimont,
Anant Madabhushi,
Evangelos Hytopoulos,
Weijie Chen,
Rajendra Singh,
Steven N. Hart,
Joel Saltz,
Roberto Salgado,
Brandon D Gallas
Abstract:
Purpose: In this work, we present a collaboration to create a validation dataset of pathologist annotations for algorithms that process whole slide images (WSIs). We focus on data collection and evaluation of algorithm performance in the context of estimating the density of stromal tumor infiltrating lymphocytes (sTILs) in breast cancer. Methods: We digitized 64 glass slides of hematoxylin- and eosin-stained ductal carcinoma core biopsies prepared at a single clinical site. We created training materials and workflows to crowdsource pathologist image annotations on two modes: an optical microscope and two digital platforms. The workflows collect the ROI type, a decision on whether the ROI is appropriate for estimating the density of sTILs, and if appropriate, the sTIL density value for that ROI. Results: The pilot study yielded an abundant number of cases with nominal sTIL infiltration. Furthermore, we found that the sTIL densities are correlated within a case, and there is notable pathologist variability. Consequently, we outline plans to improve our ROI and case sampling methods. We also outline statistical methods to account for ROI correlations within a case and pathologist variability when validating an algorithm. Conclusion: We have built workflows for efficient data collection and tested them in a pilot study. As we prepare for pivotal studies, we will consider what it will take for the dataset to be fit for a regulatory purpose: study size, patient population, and pathologist training and qualifications. To this end, we will elicit feedback from the FDA via the Medical Device Development Tool program and from the broader digital pathology and AI community. Ultimately, we intend to share the dataset, statistical methods, and lessons learned.
Submitted 14 October, 2020;
originally announced October 2020.
-
Constructions and Comparisons of Pooling Matrices for Pooled Testing of COVID-19
Authors:
Yi-Jheng Lin,
Che-Hao Yu,
Tzu-Hsuan Liu,
Cheng-Shang Chang,
Wen-Tsuen Chen
Abstract:
In comparison with individual testing, group testing (also known as pooled testing) is more efficient in reducing the number of tests and potentially leading to tremendous cost reduction. As indicated in the recent article posted on the US FDA website, the group testing approach for COVID-19 has received a lot of interest lately. There are two key elements in a group testing technique: (i) the pooling matrix that directs samples to be pooled into groups, and (ii) the decoding algorithm that uses the group test results to reconstruct the status of each sample. In this paper, we propose a new family of pooling matrices from packing the pencil of lines (PPoL) in a finite projective plane. We compare their performance with various pooling matrices proposed in the literature, including 2D-pooling, P-BEST, and Tapestry, using the two-stage definite defectives (DD) decoding algorithm. By conducting extensive simulations for a range of prevalence rates up to 5%, our numerical results show that there is no pooling matrix with the lowest relative cost in the whole range of the prevalence rates. To optimize the performance, one should choose the right pooling matrix, depending on the prevalence rate. The family of PPoL matrices can dynamically adjust their column weights according to the prevalence rates and could be a better alternative than using a fixed pooling matrix.
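The two elements named in the abstract, a pooling matrix and a decoding algorithm, can be illustrated with a minimal sketch of definite defectives (DD) decoding. This is not the authors' implementation: the function name `dd_decode`, the 3x3 row/column (2D-pooling style) matrix, and the single positive sample are assumptions chosen for illustration, and the second stage (individually retesting unresolved samples) is only indicated by the returned mask.

```python
import numpy as np

def dd_decode(pooling, results):
    """Definite defectives decoding (illustrative sketch).

    pooling: (tests, samples) binary matrix; entry 1 means the sample is in that pool.
    results: (tests,) binary vector; 1 means the pool tested positive.
    Returns boolean masks: definite positives, definite negatives, unresolved.
    """
    pooling = np.asarray(pooling, dtype=bool)
    results = np.asarray(results, dtype=bool)

    # Step 1: any sample that appears in a negative pool is definitely negative.
    negative = pooling[~results].any(axis=0)
    candidates = ~negative

    # Step 2: a positive pool containing exactly one remaining candidate
    # identifies that candidate as definitely positive.
    definite = np.zeros(pooling.shape[1], dtype=bool)
    for row in pooling[results]:
        members = row & candidates
        if members.sum() == 1:
            definite |= members

    # Samples that are neither cleared nor confirmed would go to a second stage.
    unresolved = candidates & ~definite
    return definite, negative, unresolved

# Hypothetical example: 9 samples on a 3x3 grid, pooled by row and by column.
grid = np.arange(9).reshape(3, 3)
pooling = np.zeros((6, 9), dtype=bool)
for r in range(3):
    pooling[r, grid[r, :]] = True        # row pools
for c in range(3):
    pooling[3 + c, grid[:, c]] = True    # column pools

truth = np.zeros(9, dtype=bool)
truth[4] = True                          # assume only the center sample is positive
results = (pooling & truth).any(axis=1)  # noiseless pool outcomes

definite, negative, unresolved = dd_decode(pooling, results)
```

With one positive sample, the row and column pools that contain it test positive, every other sample is cleared by a negative pool, and the positive sample is confirmed without a second stage. At higher prevalence more samples stay unresolved, which is where the choice of pooling matrix affects cost.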
Submitted 15 June, 2021; v1 submitted 30 September, 2020;
originally announced October 2020.
-
A Biologically Plausible Audio-Visual Integration Model for Continual Learning
Authors:
Wenjie Chen,
Fengtong Du,
Ye Wang,
Lihong Cao
Abstract:
The problem of catastrophic forgetting has a history of more than 30 years and has not been completely solved yet. Since the human brain has natural ability to perform continual lifelong learning, learning from the brain may provide solutions to this problem. In this paper, we propose a novel biologically plausible audio-visual integration model (AVIM) based on the assumption that the integration of audio and visual perceptual information in the medial temporal lobe during learning is crucial to form concepts and make continual learning possible. Specifically, we use multi-compartment Hodgkin-Huxley neurons to build the model and adopt the calcium-based synaptic tagging and capture as the model's learning rule. Furthermore, we define a new continual learning paradigm to simulate the possible continual learning process in the human brain. We then test our model under this new paradigm. Our experimental results show that the proposed AVIM can achieve state-of-the-art continual learning performance compared with other advanced methods such as OWM, iCaRL and GEM. Moreover, it can generate stable representations of objects during learning. These results support our assumption that concept formation is essential for continuous lifelong learning and suggest the proposed AVIM is a possible concept formation mechanism.
Submitted 20 July, 2021; v1 submitted 17 July, 2020;
originally announced July 2020.
-
MetaSleepLearner: A Pilot Study on Fast Adaptation of Bio-signals-Based Sleep Stage Classifier to New Individual Subject Using Meta-Learning
Authors:
Nannapas Banluesombatkul,
Pichayoot Ouppaphan,
Pitshaporn Leelaarporn,
Payongkit Lakhan,
Busarakum Chaitusaney,
Nattapong Jaimchariyatam,
Ekapol Chuangsuwanich,
Wei Chen,
Huy Phan,
Nat Dilokthanakul,
Theerawit Wilaiprasitporn
Abstract:
Identifying bio-signal-based sleep stages requires time-consuming and tedious labor by skilled clinicians. Deep learning approaches have been introduced to tackle the automatic sleep stage classification problem. However, replacing the clinicians with an automatic system is difficult because individual bio-signals differ in many aspects, causing inconsistent model performance on each incoming individual. Thus, we aim to explore the feasibility of using a novel approach capable of assisting the clinicians and lessening the workload. We propose a transfer learning framework, entitled MetaSleepLearner, based on Model Agnostic Meta-Learning (MAML), in order to transfer the acquired sleep staging knowledge from a large dataset to new individual subjects. The framework was demonstrated to require the labelling of only a few sleep epochs by the clinicians and allow the remainder to be handled by the system. Layer-wise Relevance Propagation (LRP) was also applied to understand the learning course of our approach. In all acquired datasets, in comparison to the conventional approach, MetaSleepLearner achieved a range of 5.4% to 17.7% improvement with statistical difference in the mean of both approaches. The illustration of the model interpretation after the adaptation to each subject also confirmed that the performance was directed towards reasonable learning. MetaSleepLearner outperformed the conventional approaches as a result of the fine-tuning using the recordings of both healthy subjects and patients. This is the first work that investigated a non-conventional pre-training method, MAML, resulting in a possibility for human-machine collaboration in sleep stage classification and easing the burden of the clinicians in labelling the sleep stages through only several epochs rather than an entire recording.
Submitted 10 November, 2020; v1 submitted 8 April, 2020;
originally announced April 2020.
-
Sex Differences in Severity and Mortality Among Patients With COVID-19: Evidence from Pooled Literature Analysis and Insights from Integrated Bioinformatic Analysis
Authors:
Xiyi Wei,
Yu-Tian Xiao,
Jian Wang,
Rui Chen,
Wei Zhang,
Yue Yang,
Daojun Lv,
Chao Qin,
Di Gu,
Bo Zhang,
Weidong Chen,
Jianquan Hou,
Ninghong Song,
Guohua Zeng,
Shancheng Ren
Abstract:
Objective: To conduct a meta-analysis of current studies that examined sex differences in severity and mortality in patients with COVID-19, and identify potential mechanisms underpinning these differences. Methods: We performed a systematic review to collate data from observational studies examining associations of sex differences with clinical outcomes of COVID-19. PubMed, Web of Science and four preprint servers were searched for relevant studies. Data were extracted and analyzed using meta-analysis where possible, with summary data presented otherwise. Publicly available bulk RNA sequencing (RNA-seq), single-cell RNA sequencing (scRNA-seq), and chromatin immunoprecipitation sequencing (ChIP-seq) data were analyzed to explore the potential mechanisms underlying the observed association. Results: 39 studies met inclusion criteria, representing 77932 patients, of which 41510 (53.3%) were males. Men were at a markedly increased risk of developing severe cases compared with women. Furthermore, the pooled odds ratio (OR) of mortality for the male group compared with the female group indicated a significantly higher mortality rate for males. Data from scRNA-seq suggest that men have a higher amount of ACE2-expressing pulmonary alveolar type II cells than women. Sex-based immunological differences exist. The expression of androgen receptor (AR) is positively correlated with ACE2, and there is evidence that AR may directly regulate the expression of ACE2. Conclusions: This meta-analysis detected an increased severity and mortality rate in the male populations with COVID-19, which might be attributable to the sex-based differences in cellular compositions and immunological microenvironments of the lung. The host cell receptor ACE2 is likely regulated by the AR signaling pathway, which is identified as a potential target for prevention and treatment of SARS-CoV-2 infections in men.
Submitted 30 March, 2020;
originally announced March 2020.
-
DHX36-mediated G-quadruplex unfolding is ATP-independent?
Authors:
Hai-Lei Guo,
Wei-Fei Chen,
Stephane Rety,
Na-Nv Liu,
Ze-Yu Song,
Yan-Xue Dai,
Xi-Miao Hou,
Shuo-Xing Dou,
Xu-Guang Xi
Abstract:
Chen et al. solved the crystal structure of bovine DHX36 bound to a DNA with a G-quadruplex (G4) and a single-stranded DNA segment. They believed that the mechanism they proposed may represent a general model for describing how a G4-unfolding helicase recognizes and unfolds G4 DNA. Their conclusion is interesting; however, we noticed that their linear DNA substrate (DNAMyc) that harbors a Myc-promoter-derived G4-forming sequence was directly used without pre-folding. This raises the question of whether the structure they obtained really reflects DHX36-mediated G4 recognition and unfolding, or only represents a DHX36-binding-induced quasi-folded G4 structure. By a combination of polymerase extension, DMS footprinting, stopped-flow, and smFRET assays, we obtained clear evidence that does not support their ATP-independent one-base translocation structural model. We further revealed that the oscillation of the FRET signal they observed should correspond to repetitive G4 binding, but not unfolding, by DHX36.
Submitted 22 September, 2019;
originally announced September 2019.
-
Predicting Clinical Outcome of Stroke Patients with Tractographic Feature
Authors:
Po-Yu Kao,
Jefferson W. Chen,
B. S. Manjunath
Abstract:
The volume of stroke lesion is the gold standard for predicting the clinical outcome of stroke patients. However, the presence of stroke lesion may cause neural disruptions to other brain regions, and these potentially damaged regions may affect the clinical outcome of stroke patients. In this paper, we introduce the tractographic feature to capture these potentially damaged regions and predict the modified Rankin Scale (mRS), which is a widely used outcome measure in stroke clinical trials. The tractographic feature is built from the stroke lesion and average connectome information from a group of normal subjects. The tractographic feature takes into account different functional regions that may be affected by the stroke, thus complementing the commonly used stroke volume features. The proposed tractographic feature is tested on a public stroke benchmark Ischemic Stroke Lesion Segmentation 2017 and achieves higher accuracy than the stroke volume and the state-of-the-art feature on predicting the mRS grades of stroke patients. In addition, the tractographic feature also yields a lower average absolute error than the commonly used stroke volume feature.
Submitted 19 September, 2019; v1 submitted 22 July, 2019;
originally announced July 2019.
-
High-resolution Markov state models for the dynamics of Trp-cage miniprotein constructed over slow folding modes identified by state-free reversible VAMPnets
Authors:
Hythem Sidky,
Wei Chen,
Andrew L. Ferguson
Abstract:
State-free reversible VAMPnets (SRVs) are a neural network-based framework capable of learning the leading eigenfunctions of the transfer operator of a dynamical system from trajectory data. In molecular dynamics simulations, these data-driven collective variables (CVs) capture the slowest modes of the dynamics and are useful for enhanced sampling and free energy estimation. In this work, we employ SRV coordinates as a feature set for Markov state model (MSM) construction. Compared to the current state of the art, MSMs constructed from SRV coordinates are more robust to the choice of input features, exhibit faster implied timescale convergence, and permit the use of shorter lagtimes to construct higher kinetic resolution models. We apply this methodology to study the folding kinetics and conformational landscape of the Trp-cage miniprotein. Folding and unfolding mean first passage times are in good agreement with prior literature, and a nine macrostate model is presented. The unfolded ensemble comprises a central kinetic hub with interconversions to several metastable unfolded conformations and which serves as the gateway to the folded ensemble. The folded ensemble comprises the native state, a partially unfolded intermediate "loop" state, and a previously unreported short-lived intermediate that we were able to resolve due to the high time-resolution of the SRV-MSM. We propose SRVs as an excellent candidate for integration into modern MSM construction pipelines.
Submitted 11 June, 2019;
originally announced June 2019.
-
Influence of Initial Residual Stress on Growth and Pattern Creation for a Layered Aorta
Authors:
Yangkun Du,
Chaofeng Lü,
Michel Destrade,
Weiqiu Chen
Abstract:
Residual stress is ubiquitous and indispensable in most biological and artificial materials, where it sustains and optimizes many biological and functional mechanisms. The theory of volume growth, starting from a stress-free initial state, is widely used to explain the creation and evolution of growth-induced residual stress and the resulting changes in shape, and to model how growing bio-tissues such as arteries and solid tumors develop a strategy of pattern creation according to geometrical and material parameters. This modelling provides promising avenues for designing and directing some appropriate morphology of a given tissue or organ and achieve some targeted biomedical function. In this paper, we rely on a modified, augmented theory to reveal how we can obtain growth-induced residual stress and pattern evolution of a layered artery by starting from an existing, non-zero initial residual stress state. We use experimentally determined residual stress distributions of aged bi-layered human aortas and quantify their influence by a magnitude factor. Our results show that initial residual stress has a more significant impact on residual stress accumulation and the subsequent evolution of patterns than geometry and material parameters. Additionally, we provide an essential explanation for growth-induced patterns driven by differential growth coupled to an initial residual stress. Finally, we show that initial residual stress is a readily available way to control growth-induced pattern creation for tissues and thus may provide a promising inspiration for biomedical engineering.
Submitted 3 June, 2019;
originally announced June 2019.
-
Density estimation on small datasets
Authors:
Wei-Chia Chen,
Ammar Tareen,
Justin B. Kinney
Abstract:
How might a smooth probability distribution be estimated, with accurately quantified uncertainty, from a limited amount of sampled data? Here we describe a field-theoretic approach that addresses this problem remarkably well in one dimension, providing an exact nonparametric Bayesian posterior without relying on tunable parameters or large-data approximations. Strong non-Gaussian constraints, which require a non-perturbative treatment, are found to play a major role in reducing distribution uncertainty. A software implementation of this method is provided.
Submitted 29 August, 2018; v1 submitted 5 April, 2018;
originally announced April 2018.
-
Complex network analysis of brain functional connectivity under a multi-step cognitive task
Authors:
Shi-Min Cai,
Wei Chen,
Dong-Bai Liu,
Ming Tang,
Xun Chen
Abstract:
Functional brain networks have been widely studied to understand the relationship between brain organization and behavior. In this paper, we aim to explore the functional connectivity of the brain network under a \emph{multi-step} cognitive task involving consecutive behaviors, and to further understand the effect of behaviors on brain organization. The functional brain networks are constructed based on an fMRI dataset with high spatial and temporal resolution and analyzed via a complex network based approach. We find that at the voxel level the functional brain network shows robust small-worldness and scale-free characteristics, while its assortativity and rich-club organization are slightly restricted to the order of behaviors performed. More interestingly, the functional connectivity of the brain network in activated ROIs strongly correlates with behaviors and exhibits obvious differences restricted to the order of behaviors performed. These empirical results suggest that brain organization has the generic properties of small-worldness and scale-free characteristics, and that its diverse functional connectivity emerging from activated ROIs is strongly driven by these behavioral activities via the plasticity of the brain.
Submitted 28 November, 2017;
originally announced November 2017.
-
Cascade and Parallel Convolutional Recurrent Neural Networks on EEG-based Intention Recognition for Brain Computer Interface
Authors:
Dalin Zhang,
Lina Yao,
Xiang Zhang,
Sen Wang,
Weitong Chen,
Robert Boots
Abstract:
Brain-Computer Interface (BCI) is a system empowering humans to communicate with or control the outside world exclusively through brain intentions. Electroencephalography (EEG) based BCIs are promising solutions due to their convenient and portable instruments. Motor imagery EEG (MI-EEG) is one of the most widely studied kinds of EEG signals and reveals a subject's movement intentions without actual actions. Despite the extensive research on MI-EEG in recent years, it is still challenging to interpret EEG signals effectively due to the massive noise in EEG signals (e.g., low signal-to-noise ratio and incomplete EEG signals), and difficulties in capturing the inconspicuous relationships between EEG signals and certain brain activities. Most existing works either consider EEG only as chain-like sequences, neglecting complex dependencies between adjacent signals, or perform simple temporal averaging over EEG sequences. In this paper, we introduce both cascade and parallel convolutional recurrent neural network models for precisely identifying human intended movements by effectively learning compositional spatio-temporal representations of raw EEG streams. The proposed models grasp the spatial correlations between physically neighboring EEG signals by converting the chain-like EEG sequences into a 2D mesh-like hierarchy. An LSTM-based recurrent network is able to extract the subtle temporal dependencies of EEG data streams. Extensive experiments on a large-scale MI-EEG dataset (108 subjects, 3,145,160 EEG records) have demonstrated that both models achieve high accuracy near 98.3% and outperform a set of baseline methods and most recent deep learning based EEG recognition models, yielding a significant accuracy increase of 18% in the cross-subject validation scenario.
Submitted 10 June, 2021; v1 submitted 22 August, 2017;
originally announced August 2017.
-
DIMM-SC: A Dirichlet mixture model for clustering droplet-based single cell transcriptomic data
Authors:
Zhe Sun,
Ting Wang,
Ke Deng,
Xiao-Feng Wang,
Robert Lafyatis,
Ying Ding,
Ming Hu,
Wei Chen
Abstract:
Motivation: Single cell transcriptome sequencing (scRNA-Seq) has become a revolutionary tool to study cellular and molecular processes at single cell resolution. Among existing technologies, the recently developed droplet-based platform enables efficient parallel processing of thousands of single cells with direct counting of transcript copies using Unique Molecular Identifier (UMI). Despite the technology advances, statistical methods and computational tools are still lacking for analyzing droplet-based scRNA-Seq data. Particularly, model-based approaches for clustering large-scale single cell transcriptomic data are still under-explored. Methods: We developed DIMM-SC, a Dirichlet Mixture Model for clustering droplet-based Single Cell transcriptomic data. This approach explicitly models UMI count data from scRNA-Seq experiments and characterizes variations across different cell clusters via a Dirichlet mixture prior. An expectation-maximization algorithm is used for parameter inference. Results: We performed comprehensive simulations to evaluate DIMM-SC and compared it with existing clustering methods such as K-means, CellTree and Seurat. In addition, we analyzed public scRNA-Seq datasets with known cluster labels and in-house scRNA-Seq datasets from a study of systemic sclerosis with prior biological knowledge to benchmark and validate DIMM-SC. Both simulation studies and real data applications demonstrated that overall, DIMM-SC achieves substantially improved clustering accuracy and much lower clustering variability compared to other existing clustering methods. More importantly, as a model-based approach, DIMM-SC is able to quantify the clustering uncertainty for each single cell, facilitating rigorous statistical inference and biological interpretations, which are typically unavailable from existing clustering methods.
Submitted 6 April, 2017;
originally announced April 2017.
-
Bi-phase age-related brain gray matter magnetic resonance T1rho relaxation time change
Authors:
Yao T Li,
Huang Hua,
Zhizheng Zhang,
Puxuan Lu,
Weitian Chen,
Yixiang J Wang
Abstract:
Objectives: To investigate the normative value and age-related change of brain magnetic resonance T1rho relaxation at 1.5 T. Methods: 20 males (age: 40.7+/-15.5 years, range: 22-68 years) and 22 females (age: 38.5+/-14.8 years, range: 21-62 years) were scanned at 1.5 Tesla using a 3D fluid-suppressed turbo spin echo sequence. Regions of interest (ROIs) were obtained by atlas-based tissue segmentation, and T1rho was calculated by fitting the mean value to a monoexponential model. The correlation between T1rho relaxation of brain gray matter regions and age was investigated. Results: Regional differences among individual gray matter areas were noted: the hippocampus (98.37+/-5.37 msec) and amygdala (94.95+/-4.34 msec) had the highest values, while the pallidum (83.81+/-5.49 msec) and putamen (83.93+/-4.76 msec) had the lowest. T1rho values decreased slowly (mean slope: -0.256) and significantly (p<0.05) with age in gray matter for subjects younger than 40 years, while for subjects older than 40 years there was no significant correlation between T1rho relaxation and age. Conclusion: T1rho relaxation demonstrates a bi-phase change with age in adults of 22-68 years.
Submitted 25 October, 2016;
originally announced October 2016.
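The monoexponential fit mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the usual signal model S(TSL) = S0 * exp(-TSL / T1rho) and uses a log-linear least-squares fit, which is a simplification chosen here for brevity and not necessarily the fitting procedure the authors used. The function name `fit_t1rho` and the spin lock times in the usage note are hypothetical.

```python
import math

def fit_t1rho(tsl_ms, signal):
    """Estimate (T1rho, S0) from signal = S0 * exp(-TSL / T1rho).

    Illustrative log-linear least-squares fit: ln(signal) is a straight
    line in TSL with slope -1/T1rho and intercept ln(S0).
    """
    ys = [math.log(s) for s in signal]
    n = len(tsl_ms)
    mean_x = sum(tsl_ms) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in tsl_ms)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(tsl_ms, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -1.0 / slope, math.exp(intercept)
```

For example, calling `fit_t1rho` on noise-free synthetic data generated with T1rho = 90 msec at hypothetical spin lock times of 0, 10, 30 and 50 msec recovers the input value. With noisy data, a nonlinear fit on the raw signal is often preferred because the log transform changes the noise weighting.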
-
Improved liver T1rho measurement precision with a breathhold black blood single shot fast spin echo acquisition: a validation study in healthy volunteers
Authors:
Yi-Xiang Wang,
Min Deng,
GladsG Lo,
Queenie Chan,
**g Yuan,
Weitian Chen
Abstract:
Purpose: To explore the usability and normal T1rho value of liver parenchyma with a novel liver imaging sequence based on a single breathhold black blood single shot fast spin echo acquisition. Materials and Methods: In total, 19 healthy subjects (10 males, 9 females; mean age: 37.4 yrs; range: 23-54 yrs) participated in the study. 11 subjects had the liver scanned twice in the same session to assess scan-rescan repeatability. 12 subjects had the liver scanned twice in two sessions 7-10 days apart to assess scan-rescan reproducibility. MR imaging was performed on a 3.0 T scanner with dual transmitter. The MR sequence allows simultaneous acquisition of 4 spin lock times (TSLs: 0 ms, 10 ms, 30 ms, 50 ms) in 10 seconds. The inherent black blood effect of fast spin echo and double inversion recovery were utilized to achieve blood signal suppression. Results: The technique demonstrated good image quality and minimal artifacts. For liver parenchyma, the Bland-Altman plot showed that the scan-rescan repeatability mean difference was 0.025 ms (95% limits of agreement: -1.163 to 1.213 ms), and the intraclass correlation coefficient (ICC) was 0.977. The scan-rescan reproducibility mean difference was -0.075 ms (95% limits of agreement: -3.280 to 3.310 ms), and the ICC was 0.820, which is better than the ICC of 0.764 of a previous bright blood multi-breath hold gradient echo acquisition technique. The liver T1rho value was 39.9 +/- 2.4 ms (range: 36.1 - 44.2 ms), which is lower than the value of 42.8 +/- 2.1 ms acquired with the previous bright blood technique. Conclusion: This study validated the application of a single breathhold black blood single shot fast spin echo acquisition for human liver T1rho imaging. The lower liver parenchyma T1rho value and higher scan-rescan reproducibility may improve the sensitivity of this technique.
Submitted 25 October, 2016;
originally announced October 2016.
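The Bland-Altman statistics quoted in the abstract above (mean difference and 95% limits of agreement) can be sketched as follows. This is a minimal illustration that assumes the conventional definition, mean difference plus or minus 1.96 sample standard deviations of the paired differences. The function name `bland_altman` and the sample values in the usage note are hypothetical and are not data from the study.

```python
import statistics

def bland_altman(scan1, scan2):
    """Return (mean difference, lower limit, upper limit) for paired scans.

    Limits of agreement follow the conventional definition:
    mean difference +/- 1.96 * sample standard deviation of the differences.
    """
    diffs = [a - b for a, b in zip(scan1, scan2)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return mean_diff, mean_diff - 1.96 * sd, mean_diff + 1.96 * sd
```

For instance, `bland_altman([40.0, 41.0, 39.0], [39.0, 41.0, 40.0])` gives a mean difference of 0 with limits of agreement at -1.96 and 1.96, since the paired differences are 1, 0 and -1.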
-
Parallel STEPS: Large Scale Stochastic Spatial Reaction-Diffusion Simulation with High Performance Computers
Authors:
Weiliang Chen,
Erik De Schutter
Abstract:
Stochastic, spatial reaction-diffusion simulations have been widely used in systems biology and computational neuroscience. However, the increasing scale and complexity of simulated models and morphologies have exceeded the capacity of any serial implementation. This led to development of parallel solutions that benefit from the boost in performance of modern large-scale supercomputers. In this paper, we describe an MPI-based, parallel Operator-Splitting implementation for stochastic spatial reaction-diffusion simulations with irregular tetrahedral meshes. The performance of our implementation is first examined and analyzed with simulations of a simple model. We then demonstrate its usage in real-world research by simulating the reaction-diffusion components of a published calcium burst model in both Purkinje neuron sub-branch and full dendrite morphologies. Simulation results indicate that our implementation is capable of achieving super-linear speedup for balanced loading simulations with reasonable molecule density and mesh quality. In the best scenario a parallel simulation with 2000 processes achieves more than 3600 times of speedup relative to its serial SSA counterpart and more than 20 times of speedup relative to parallel simulation with 100 processes. While simulation performance is affected by unbalanced loading, a substantial speedup can still be observed without any special treatment.
Submitted 7 October, 2016;
originally announced October 2016.
-
Accurate Reaction-Diffusion Operator Splitting on Tetrahedral Meshes for Parallel Stochastic Molecular Simulations
Authors:
I. Hepburn,
W. Chen,
E. De Schutter
Abstract:
Spatial stochastic molecular simulations in biology are limited by the intense computation required to track molecules in space either in a discrete time or discrete space framework, meaning that the serial limit has already been reached in sub-cellular models. This calls for parallel simulations that can take advantage of the power of modern supercomputers; however exact methods are known to be inherently serial. We introduce an operator splitting implementation for irregular grids with a novel method to improve accuracy, and demonstrate potential for scalable parallel simulations in an initial MPI version. We foresee that this groundwork will enable larger scale, whole-cell stochastic simulations in the near future.
Submitted 9 December, 2015;
originally announced December 2015.
-
Zigzag Stacks and m-Regular Linear Stacks
Authors:
William Y. C. Chen,
Qiang-Hui Guo,
Lisa H. Sun,
Jian Wang
Abstract:
The contact map of a protein fold is a graph that represents the patterns of contacts in the fold. It is known that the contact map can be decomposed into stacks and queues. RNA secondary structures are special stacks in which the degree of each vertex is at most one and each arc has length at least two. Waterman and Smith derived a formula for the number of RNA secondary structures of length $n$ with exactly $k$ arcs. Höner zu Siederdissen et al. developed a folding algorithm for extended RNA secondary structures in which each vertex has maximum degree two. An equation for the generating function of extended RNA secondary structures was obtained by Müller and Nebel by using a context-free grammar approach, which leads to an asymptotic formula. In this paper, we consider $m$-regular linear stacks, where each arc has length at least $m$ and the degree of each vertex is bounded by two. Extended RNA secondary structures are exactly $2$-regular linear stacks. For any $m\geq 2$, we obtain an equation for the generating function of the $m$-regular linear stacks. For given $m$, we can deduce a recurrence relation and an asymptotic formula for the number of $m$-regular linear stacks on $n$ vertices. To establish the equation, we use the reduction operation of Chen, Deng and Du to transform an $m$-regular linear stack to an $m$-reduced zigzag (or alternating) stack. Then we find an equation for $m$-reduced zigzag stacks leading to an equation for $m$-regular linear stacks.
Submitted 4 June, 2014;
originally announced June 2014.
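The counting problem for RNA secondary structures described in the abstract above can be checked numerically for small $n$. The sketch below is a plain dynamic program, not the closed formula or the generating-function method from the paper. It assumes the definition given in the abstract (each vertex has degree at most one, each arc has length at least two) together with the usual non-crossing condition on arcs. The function name `count_structures` is hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_structures(n: int, k: int) -> int:
    """Count non-crossing structures on vertices 1..n with exactly k arcs,
    where every vertex has degree at most one and every arc (i, j)
    satisfies j - i >= 2."""
    if k < 0:
        return 0
    if n <= 0:
        return 1 if k == 0 else 0
    # Case 1: the last vertex n is unpaired.
    total = count_structures(n - 1, k)
    # Case 2: vertex n is paired with j, where n - j >= 2.
    for j in range(1, n - 1):
        inside = n - j - 1   # vertices strictly between j and n
        left = j - 1         # vertices to the left of j
        for a in range(k):   # a arcs inside, k - 1 - a arcs to the left
            total += count_structures(inside, a) * count_structures(left, k - 1 - a)
    return total
```

Summing over $k$ gives the total number of structures on $n$ vertices; for $n = 1, \dots, 7$ this yields 1, 1, 2, 4, 8, 17, 37.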
-
Determining the accuracy of spatial gradient sensing using statistical mechanics
Authors:
Bo Hu,
Wen Chen,
Wouter-Jan Rappel,
Herbert Levine
Abstract:
Many eukaryotic cells are able to sense chemical gradients by directly measuring spatial concentration differences. The precision of such gradient sensing is limited by fluctuations in the binding of diffusing particles to specific receptors on the cell surface. Here, we explore the physical limits of the spatial sensing mechanism by modeling the chemotactic cell as an Ising spin chain subject to a spatially varying field. This allows us to derive the maximum likelihood estimators of the gradient parameters as well as explicit expressions for their asymptotic uncertainties. The accuracy increases with the cell's size, and our results demonstrate that this accuracy can be further increased by introducing a non-zero cooperativity between neighboring receptors. Thus, consistent with recent experimental data, it is possible for small bacteria to perform spatial measurements of gradients.
Submitted 2 June, 2010;
originally announced June 2010.
-
Energetics of Protein-DNA Interactions
Authors:
Jason E Donald,
William W Chen,
Eugene I Shakhnovich
Abstract:
Protein-DNA interactions are vital for many processes in living cells, especially transcriptional regulation and DNA modification. To further our understanding of these important processes on the microscopic level, it is necessary that theoretical models describe the macromolecular interaction energetics accurately. While several methods have been proposed, there has not been a careful comparison of how well the different methods are able to predict biologically important quantities such as the correct DNA binding sequence, total binding free energy, and free energy changes caused by DNA mutation. In addition to carrying out the comparison, we present two important theoretical models developed initially in protein folding that have not yet been tried on protein-DNA interactions. In the process, we find that the results of these knowledge-based potentials show a strong dependence on the interaction distance and the derivation method. Finally, we present a knowledge-based potential that gives comparable or superior results to the best of the other methods, including the molecular mechanics force field AMBER99.
Submitted 30 November, 2006;
originally announced November 2006.
-
All-atom ab initio folding of a diverse set of proteins
Authors:
Jae Shick Yang,
William W. Chen,
Jeffrey Skolnick,
Eugene I. Shakhnovich
Abstract:
Natural proteins fold to a unique, thermodynamically dominant state. Modeling of the folding process and prediction of the native fold of proteins are two major unsolved problems in biophysics. Here, we show successful all-atom ab initio folding of a representative diverse set of proteins, using a minimalist transferable energy model that consists of two-body atom-atom interactions, hydrogen-bonding, and a local sequence energy term that models sequence-specific chain stiffness. Starting from a random coil, the native-like structure was observed during replica exchange Monte Carlo (REMC) simulation for most proteins regardless of their structural classes; the lowest energy structure was close to native, in the range of 2-6 Å root-mean-square deviation (RMSD). Our results demonstrate that the successful all-atom folding of a protein chain to its native state is governed by only a few crucial energetic terms.
Submitted 27 November, 2006;
originally announced November 2006.
-
Information and Protein Interfaces
Authors:
William W. Chen,
Paul J. Choi,
Jason E. Donald,
Eugene I. Shakhnovich
Abstract:
To confer high specificity and affinity in binding, contacts at interfaces between two interacting macromolecules are expected to exhibit pair preferences for types of atoms or residues. Here we quantify these preferences by measuring the mutual information of contacts for 895 protein-protein interfaces. The information content is significant and is highest at the atomic resolution. A simple phenomenological theory reveals a connection between information at interfaces and the free energy spectrum of association. The connection is presented in the form of a relation between mutual information and the energy gap of the native bound state to off-target bound states. Measurements of information content in designed lattice interfaces show the predicted scaling behavior to the energy gap. Our theory also suggests that mutual information in contacts emerges by a selection mechanism, and that strong selection, or high conservation, of residues should lead to correspondingly high mutual information. Amino acids which contribute more heavily to information content are then expected to be more conserved. We verify this by showing a statistically significant correlation between the conservation of each of the twenty amino acids and their individual contribution to the information content at protein-protein interfaces.
Submitted 16 April, 2006;
originally announced April 2006.
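The mutual information of contacts referred to in the abstract above can be illustrated with a minimal sketch. It assumes the textbook definition, the sum over pairs of p(a,b) log2[p(a,b) / (p(a) p(b))], estimated from raw contact counts. The paper's exact estimator (for example, how contacts are symmetrized or corrected for finite sampling) is not specified in the abstract, and the function name `contact_mutual_information` is hypothetical.

```python
import math
from collections import Counter

def contact_mutual_information(contacts):
    """Mutual information (in bits) between the two partner types,
    estimated from a list of (type_a, type_b) contact pairs."""
    pair_counts = Counter(contacts)
    total = sum(pair_counts.values())
    left = Counter()   # marginal counts for the first partner
    right = Counter()  # marginal counts for the second partner
    for (a, b), c in pair_counts.items():
        left[a] += c
        right[b] += c
    mi = 0.0
    for (a, b), c in pair_counts.items():
        p_ab = c / total
        # p_ab / (p_a * p_b) written in terms of counts
        mi += p_ab * math.log2(c * total / (left[a] * right[b]))
    return mi
```

Two residue types that only ever contact one specific partner each, such as `[("K", "D"), ("R", "E")]`, give 1 bit, while a contact list in which every combination occurs equally often gives 0 bits.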
-
Entropic stabilization of proteins and its proteomic consequences
Authors:
Igor N. Berezovsky,
William W. Chen,
Paul J. Choi,
Eugene I. Shakhnovich
Abstract:
We report here a new entropic mechanism of protein thermostability due to residual dynamics of rotamer isomerization in the native state. All-atom simulations show that Lysines have a much greater number of accessible rotamers than Arginines in folded states of proteins. This finding suggests that Lysines would preferentially entropically stabilize the native state. Indeed, we show in computational experiments that Arginine-to-Lysine amino acid substitutions result in noticeable stabilization of proteins. We then hypothesize that if evolution uses this physical mechanism in its strategies of thermophilic adaptation, then hyperthermostable organisms would have a much greater content of Lysines in their proteomes than of the similarly sized and similarly charged Arginines. Consistent with that, high-throughput comparative analysis of complete proteomes shows an extremely strong bias towards Arginine-to-Lysine replacement in hyperthermophilic organisms and an overall much greater content of Lysines than Arginines in hyperthermophiles. This finding cannot be explained by GC compositional biases. Our study provides an example of how analysis of a delicate physical mechanism of thermostability helps to resolve a puzzle in comparative genomics as to why amino acid compositions of hyperthermophilic proteomes are significantly biased towards Lysines but not Arginines.
Submitted 21 June, 2005;
originally announced June 2005.
-
An Exact Model of Fluctuations in Gene Expression
Authors:
William W. Chen,
Jeremy L. England,
Eugene I. Shakhnovich
Abstract:
Fluctuations in the measured mRNA levels of unperturbed cells under fixed conditions have often been viewed as an impediment to the extraction of information from expression profiles. Here, we argue that such expression fluctuations should themselves be studied as a source of valuable information about the underlying dynamics of genetic networks. By analyzing microarray data taken from Saccharomyces cerevisiae, we demonstrate that correlations in expression fluctuations have a highly statistically significant dependence on gene function, and furthermore exhibit a remarkable scale-free network structure. We therefore present what we view to be the simplest phenomenological model of a genetic network which can account for the presence of biological information in transcript level fluctuations. We proceed to exactly solve this model using a path integral technique and derive several quantitative predictions. Finally, we propose several experiments by which these predictions might be rigorously tested.
Submitted 10 February, 2004;
originally announced February 2004.
-
Noise in Genotype Selection Model
Authors:
Bao-Quan Ai,
Wei Chen,
Xian-Ju Wang,
Guo-Tao Liu,
De-Hua Wen,
Liang-Gang Liu
Abstract:
We study the steady state properties of a genotype selection model in the presence of correlated Gaussian white noise. The effect of the noise on the genotype selection model is discussed. It is found that correlated noise can break the balance of gene selection and induce a phase transition, which makes it possible to select one type of gene haploid from a gene group.
Submitted 25 June, 2003;
originally announced June 2003.
-
Noise in an insect outbreak model
Authors:
Bao-Quan Ai,
Wei Chen,
Xian-Ju Wang,
Guo-Tao Liu,
De-Hua Wen,
Hui-Zhang Xie,
Liang-Gang Liu
Abstract:
We study the steady state properties of an insect (spruce budworm) outbreak model in the presence of Gaussian white noise. Based on the corresponding Fokker-Planck equation, the steady state solution of the probability distribution function and its extrema have been investigated. It was found that fluctuations of the insect birth rate reduce the population of the insects, while fluctuations of the predation rate and the noise correlation can prevent the population of the insects from going extinct. Noise in the model can induce a phase transition.
Submitted 25 June, 2003;
originally announced June 2003.