Search | arXiv e-print repository

Progress Towards Decoding Visual Imagery via fNIRS

Authors: Michel Adamic, Wellington Avelino, Anna Brandenberger, Bryan Chiang, Hunter Davis, Stephen Fay, Andrew Gregory, Aayush Gupta, Raphael Hotter, Grace Jiang, Fiona Leng, Stephen Polcyn, Thomas Ribeiro, Paul Scotti, Michelle Wang, Marley Xiong, Jonathan Xu

Abstract: We demonstrate the possibility of reconstructing images from fNIRS brain activity and start building a prototype to match the required specs. By training an image reconstruction model on downsampled fMRI data, we discovered that cm-scale spatial resolution is sufficient for image generation. We obtained 71% retrieval accuracy with 1-cm resolution, compared to 93% on the full-resolution fMRI, and 20% with 2-cm resolution. With simulations and high-density tomography, we found that time-domain fNIRS can achieve 1-cm resolution, compared to 2-cm resolution for continuous-wave fNIRS. Lastly, we share designs for a prototype time-domain fNIRS device, consisting of a laser driver, a single photon detector, and a time-to-digital converter system.

Submitted 22 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

arXiv:2405.15158 [pdf, other]

ProtFAD: Introducing function-aware domains as implicit modality towards protein function perception

Authors: Mingqing Wang, Zhiwei Nie, Yonghong He, Zhixiang Ren

Abstract: Protein function prediction is currently achieved by encoding its sequence or structure, where the sequence-to-function transcendence and high-quality structural data scarcity lead to obvious performance bottlenecks. Protein domains are "building blocks" of proteins that are functionally independent, and their combinations determine the diverse biological functions. However, most existing studies have yet to thoroughly explore the intricate functional information contained in the protein domains. To fill this gap, we propose a synergistic integration approach for a function-aware domain representation, and a domain-joint contrastive learning strategy to distinguish different protein functions while aligning the modalities. Specifically, we associate domains with the GO terms as function priors to pre-train domain embeddings. Furthermore, we partition proteins into multiple sub-views based on continuous joint domains for contrastive training under the supervision of a novel triplet InfoNCE loss. Our approach significantly and comprehensively outperforms the state-of-the-art methods on various benchmarks, and clearly differentiates proteins carrying distinct functions compared to the competitor.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 16 pages, 6 figures, 5 tables

arXiv:2405.12144 [pdf]

Alterations of electrocortical activity during hand movements induced by motor cortex glioma

Authors: Yihan Wu, Tao Chang, Siliang Chen, Xiaodong Niu, Yu Li, Yuan Fang, Lei Yang, Yixuan Zong, Yaoxin Yang, Yuehua Li, Mengsong Wang, Wen Yang, Yixuan Wu, Chen Fu, Xia Fang, Yuxin Quan, Xilin Peng, Qiang Sun, Marc M. Van Hulle, Yanhui Liu, Ning Jiang, Dario Farina, Yuan Yang, Jiayuan He, Qing Mao

Abstract: Glioma cells can reshape functional neuronal networks by hijacking neuronal synapses, leading to partial or complete neurological dysfunction. These mechanisms have been previously explored for language functions. However, the impact of glioma on sensorimotor functions is still unknown. Therefore, we recruited a control group of patients with unaffected motor cortex and a group of patients with glioma-infiltrated motor cortex, and recorded high-density electrocortical signals during finger movement tasks. The results showed that glioma suppresses task-related synchronization in the high-gamma band and reduces the power across all frequency bands. The resulting atypical motor information transmission model with discrete signaling pathways and delayed responses disrupts the stability of neuronal encoding patterns for finger movement kinematics across various temporal-spatial scales. These findings demonstrate that gliomas functionally invade neural circuits within the motor cortex. This result advances our understanding of motor function processing in chronic disease states, which is important to advance the surgical strategies and neurorehabilitation approaches for patients with malignant gliomas.

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.11096 [pdf]

MicroBundlePillarTrack, A Python package for automated segmentation, tracking, and analysis of pillar deflection in cardiac microbundles

Authors: Hiba Kobeissi, Xining Gao, Samuel J. DePalma, Jourdan K. Ewoldt, Miranda C. Wang, Shoshana L. Das, Javiera Jilberto, David Nordsletten, Brendon M. Baker, Christopher S. Chen, Emma Lejeune

Abstract: Movies of human induced pluripotent stem cell (hiPSC)-derived engineered cardiac tissue (microbundles) contain abundant information about structural and functional maturity. However, extracting these data in a reproducible and high-throughput manner remains a major challenge. Furthermore, it is not straightforward to make direct quantitative comparisons across the multiple in vitro experimental platforms employed to fabricate these tissues. Here, we present "MicroBundlePillarTrack," an open-source optical flow-based package developed in Python to track the deflection of pillars in cardiac microbundles grown on experimental platforms with two different pillar designs ("Type 1" and "Type 2" design). Our software is able to automatically segment both pillars, track their displacements, and output time-dependent metrics for contractility analysis, including beating amplitude and rate, contractile force, and tissue stress. Because this software is fully automated, it will allow for both faster and more reproducible analyses of larger datasets and it will enable more reliable cross-platform comparisons as compared to existing approaches that require manual steps and are tailored to one specific experimental platform. To complement this open-source software, we share a dataset of 1,540 brightfield example movies on which we have tested our software. Through sharing this data and software, our goal is to directly enable quantitative comparisons across labs, and facilitate future collective progress via the biomedical engineering open-source data and software ecosystem.

Submitted 17 May, 2024; originally announced May 2024.

Comments: 8 main pages, 1 main figure, Supplementary Information included

MSC Class: 92F05; 74A05 ACM Class: J.2; J.3

arXiv:2404.18443 [pdf, other]

BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers

Authors: Ran Xu, Wenqi Shi, Yue Yu, Yuchen Zhuang, Yanqiao Zhu, May D. Wang, Joyce C. Ho, Chao Zhang, Carl Yang

Abstract: Developing effective biomedical retrieval models is important for excelling at knowledge-intensive biomedical tasks but still challenging due to the deficiency of sufficient publicly annotated biomedical data and computational resources. We present BMRetriever, a series of dense retrievers for enhancing biomedical retrieval via unsupervised pre-training on large biomedical corpora, followed by instruction fine-tuning on a combination of labeled datasets and synthetic pairs. Experiments on 5 biomedical tasks across 11 datasets verify BMRetriever's efficacy on various biomedical applications. BMRetriever also exhibits strong parameter efficiency, with the 410M variant outperforming baselines up to 11.7 times larger, and the 2B variant matching the performance of models with over 5B parameters. The training data and model checkpoints are released at https://huggingface.co/BMRetriever to ensure transparency, reproducibility, and application to new domains.

Submitted 29 April, 2024; originally announced April 2024.

Comments: Work in progress. The model and data will be uploaded to https://github.com/ritaranx/BMRetriever

arXiv:2404.18021 [pdf, other]

CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments

Authors: Kaixuan Huang, Yuanhao Qu, Henry Cousins, William A. Johnson, Di Yin, Mihir Shah, Denny Zhou, Russ Altman, Mengdi Wang, Le Cong

Abstract: The introduction of genome engineering technology has transformed biomedical research, making it possible to make precise changes to genetic information. However, creating an efficient gene-editing system requires a deep understanding of CRISPR technology, and the complex experimental systems under investigation. While Large Language Models (LLMs) have shown promise in various tasks, they often lack specific knowledge and struggle to accurately solve biological design problems. In this work, we introduce CRISPR-GPT, an LLM agent augmented with domain knowledge and external tools to automate and enhance the design process of CRISPR-based gene-editing experiments. CRISPR-GPT leverages the reasoning ability of LLMs to facilitate the process of selecting CRISPR systems, designing guide RNAs, recommending cellular delivery methods, drafting protocols, and designing validation experiments to confirm editing outcomes. We showcase the potential of CRISPR-GPT for assisting non-expert researchers with gene-editing experiments from scratch and validate the agent's effectiveness in a real-world use case. Furthermore, we explore the ethical and regulatory considerations associated with automated gene-editing design, highlighting the need for responsible and transparent use of these tools. Our work aims to bridge the gap between beginner biological researchers and CRISPR genome engineering techniques, and demonstrate the potential of LLM agents in facilitating complex biological discovery tasks.

Submitted 27 April, 2024; originally announced April 2024.

arXiv:2404.02924 [pdf, other]

Accounting for contact network uncertainty in epidemic inferences

Authors: Maxwell H. Wang, Jukka-Pekka Onnela

Abstract: When modeling the dynamics of infectious disease, the incorporation of contact network information allows for the capture of the non-randomness and heterogeneity of realistic contact patterns. Oftentimes, it is assumed that the underlying contact pattern is known with perfect certainty. However, in realistic settings, the observed data often serves as an imperfect proxy of the actual contact patterns in the population. Furthermore, real-world epidemics are often not fully observed; event times such as infection and recovery times may be missing. In order to conduct accurate inferences on parameters of contagion spread, it is crucial to incorporate these sources of uncertainty. In this paper, we propose the use of Mixture Density Network compressed ABC (MDN-ABC) to learn informative summary statistics for the available data. This method will allow for Bayesian inference on the epidemic parameters of a contagious process, while accounting for imperfect observations on the epidemic and the contact network. We will demonstrate the use of this method on simulated epidemics and networks, and extend this framework to analyze the spread of Tattoo Skin Disease (TSD) among bottlenose dolphins in Shark Bay, Australia.

Submitted 15 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: 27 pages, 7 figures

arXiv:2404.00014 [pdf]

Deep Geometry Handling and Fragment-wise Molecular 3D Graph Generation

Authors: Odin Zhang, Yufei Huang, Shichen Cheng, Mengyao Yu, Xujun Zhang, Haitao Lin, Yundian Zeng, Mingyang Wang, Zhenxing Wu, Huifeng Zhao, Zaixi Zhang, Chenqing Hua, Yu Kang, Sunliang Cui, Peichen Pan, Chang-Yu Hsieh, Tingjun Hou

Abstract: Most earlier 3D structure-based molecular generation approaches follow an atom-wise paradigm, incrementally adding atoms to a partially built molecular fragment within protein pockets. These methods, while effective in designing tightly bound ligands, often overlook other essential properties such as synthesizability. The fragment-wise generation paradigm offers a promising solution. However, a common challenge across both atom-wise and fragment-wise methods lies in their limited ability to co-design plausible chemical and geometrical structures, resulting in distorted conformations. In response to this challenge, we introduce the Deep Geometry Handling protocol, a more abstract design that extends the design focus beyond the model architecture. Through a comprehensive review of existing geometry-related models and their protocols, we propose a novel hybrid strategy, culminating in the development of FragGen - a geometry-reliable, fragment-wise molecular generation method. FragGen marks a significant leap forward in the quality of generated geometry and the synthesis accessibility of molecules. The efficacy of FragGen is further validated by its successful application in designing type II kinase inhibitors at the nanomolar level.

Submitted 15 March, 2024; originally announced April 2024.

arXiv:2403.00815 [pdf, other]

RAM-EHR: Retrieval Augmentation Meets Clinical Predictions on Electronic Health Records

Authors: Ran Xu, Wenqi Shi, Yue Yu, Yuchen Zhuang, Bowen Jin, May D. Wang, Joyce C. Ho, Carl Yang

Abstract: We present RAM-EHR, a Retrieval AugMentation pipeline to improve clinical predictions on Electronic Health Records (EHRs). RAM-EHR first collects multiple knowledge sources, converts them into text format, and uses dense retrieval to obtain information related to medical concepts. This strategy addresses the difficulties associated with complex names for the concepts. RAM-EHR then augments the local EHR predictive model co-trained with consistency regularization to capture complementary information from patient visits and summarized knowledge. Experiments on two EHR datasets show the efficacy of RAM-EHR over previous knowledge-enhanced baselines (3.4% gain in AUROC and 7.2% gain in AUPR), emphasizing the effectiveness of the summarized knowledge from RAM-EHR for clinical prediction tasks. The code will be published at https://github.com/ritaranx/RAM-EHR.

Submitted 4 June, 2024; v1 submitted 25 February, 2024; originally announced March 2024.

Comments: ACL 2024

Journal ref: ACL 2024

arXiv:2401.06173 [pdf, other]

Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization

Authors: Jiahao Qiu, Hui Yuan, Jinghong Zhang, Wentao Chen, Huazheng Wang, Mengdi Wang

Abstract: While modern biotechnologies allow synthesizing new proteins and function measurements at scale, efficiently exploring a protein sequence space and engineering it remains a daunting task due to the vast sequence space of any given protein. Protein engineering is typically conducted through an iterative process of adding mutations to the wild-type or lead sequences, recombination of mutations, and running new rounds of screening. To enhance the efficiency of such a process, we propose a tree search-based bandit learning method, which expands a tree starting from the initial sequence with the guidance of a bandit machine learning model. Under simplified assumptions and a Gaussian Process prior, we provide theoretical analysis and a Bayesian regret bound, demonstrating that the combination of local search and bandit learning method can efficiently discover a near-optimal design. The full algorithm is compatible with a suite of randomized tree search heuristics, machine learning models, pre-trained embeddings, and bandit techniques. We test various instances of the algorithm across benchmark protein datasets using simulated screens. Experiment results demonstrate that the algorithm is both sample-efficient and able to find top designs using reasonably small mutation counts.

Submitted 8 January, 2024; originally announced January 2024.

Comments: AAAI 2024

arXiv:2401.04246 [pdf, other]

Scalable Normalizing Flows Enable Boltzmann Generators for Macromolecules

Authors: Joseph C. Kim, David Bloore, Karan Kapoor, Jun Feng, Ming-Hong Hao, Mengdi Wang

Abstract: The Boltzmann distribution of a protein provides a roadmap to all of its functional states. Normalizing flows are a promising tool for modeling this distribution, but current methods are intractable for typical pharmacological targets; they become computationally intractable due to the size of the system, heterogeneity of intra-molecular potential energy, and long-range interactions. To remedy these issues, we present a novel flow architecture that utilizes split channels and gated attention to efficiently learn the conformational distribution of proteins defined by internal coordinates. We show that by utilizing a 2-Wasserstein loss, one can smooth the transition from maximum likelihood training to energy-based training, enabling the training of Boltzmann Generators for macromolecules. We evaluate our model and training strategy on villin headpiece HP35(nle-nle), a 35-residue subdomain, and protein G, a 56-residue protein. We demonstrate that standard architectures and training strategies, such as maximum likelihood alone, fail while our novel architecture and multi-stage training strategy are able to model the conformational distributions of protein G and HP35.

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2312.12989 [pdf, other]

Benchmarking and Analyzing In-context Learning, Fine-tuning and Supervised Learning for Biomedical Knowledge Curation: a focused study on chemical entities of biological interest

Authors: Emily Groves, Minhong Wang, Yusuf Abdulle, Holger Kunz, Jason Hoelscher-Obermaier, Ronin Wu, Honghan Wu

Abstract: Automated knowledge curation for biomedical ontologies is key to ensuring that they remain comprehensive, high-quality and up-to-date. In the era of foundational language models, this study compares and analyzes three NLP paradigms for curation tasks: in-context learning (ICL), fine-tuning (FT), and supervised learning (ML). Using the Chemical Entities of Biological Interest (ChEBI) database as a model ontology, three curation tasks were devised. For ICL, three prompting strategies were employed with GPT-4, GPT-3.5, and BioGPT. PubmedBERT was chosen for the FT paradigm. For ML, six embedding models were utilized for training Random Forest and Long-Short Term Memory models. Five setups were designed to assess ML and FT model performance across different data availability scenarios. Datasets for curation tasks included: task 1 (620,386), task 2 (611,430), and task 3 (617,381), maintaining a 50:50 positive versus negative ratio. For ICL models, GPT-4 achieved best accuracy scores of 0.916, 0.766 and 0.874 for tasks 1-3 respectively. In a direct comparison, ML (trained on ~260,000 triples) outperformed ICL in accuracy across all tasks (accuracy differences: +.11, +.22 and +.17). Fine-tuned PubmedBERT performed similarly to leading ML models in tasks 1 & 2 (F1 differences: -.014 and +.002), but worse in task 3 (-.048). Simulations revealed performance declines in both ML and FT models with smaller and more imbalanced training data, where ICL (particularly GPT-4) excelled in tasks 1 & 3. GPT-4 excelled in tasks 1 and 3 with less than 6,000 triples, surpassing ML/FT. ICL underperformed ML/FT in task 2. ICL-augmented foundation models can be good assistants for knowledge curation with correct prompting, however, not making ML and FT paradigms obsolete. The latter two require task-specific data to beat ICL. In such cases, ML relies on small pretrained embeddings, minimizing computational demands.

Submitted 20 December, 2023; originally announced December 2023.

Comments: 26 pages, 5 figures, 14 tables

arXiv:2311.04238 [pdf, other]

Flexible Bayesian Inference on Partially Observed Epidemics

Authors: Maxwell H. Wang, Jukka-Pekka Onnela

Abstract: Individual-based models of contagious processes are useful for predicting epidemic trajectories and informing intervention strategies. In such models, the incorporation of contact network information can capture the non-randomness and heterogeneity of realistic contact dynamics. In this paper, we consider Bayesian inference on the spreading parameters of an SIR contagion on a known, static network, where information regarding individual disease status is known only from a series of tests (positive or negative disease status). When the contagion model is complex or information such as infection and removal times is missing, the posterior distribution can be difficult to sample from. Previous work has considered the use of Approximate Bayesian Computation (ABC), which allows for simulation-based Bayesian inference on complex models. However, ABC methods usually require the user to select reasonable summary statistics. Here, we consider an inference scheme based on the Mixture Density Network compressed ABC (MDN-ABC), which minimizes the expected posterior entropy in order to learn informative summary statistics. This allows us to conduct Bayesian inference on the parameters of a partially observed contagious process while also circumventing the need for manual summary statistic selection. This methodology can be extended to incorporate additional simulation complexities, including behavioral change after positive tests or false test results.

Submitted 6 November, 2023; originally announced November 2023.

Comments: 27 pages, 7 figures

arXiv:2308.01241 [pdf, other]

Digital Twin Brain: a simulation and assimilation platform for whole human brain

Authors: Wenlian Lu, Longbin Zeng, Xin Du, Wenyong Zhang, Shitong Xiang, Huarui Wang, Jiexiang Wang, Mingda Ji, Yubo Hou, Minglong Wang, Yuhao Liu, Zhongyu Chen, Qibao Zheng, Ningsheng Xu, Jianfeng Feng

Abstract: In this work, we present a computing platform named digital twin brain (DTB) that can simulate spiking neuronal networks at the scale of the whole human brain and, more importantly, with a personalized biological brain structure. In comparison to most brain simulations with a homogeneous global structure, we highlight that the sparseness, couplingness and heterogeneity in the sMRI, DTI and PET data of the brain have an essential impact on the efficiency of brain simulation. Scaling experiments show that DTB simulation of the human brain is a communication-intensive and memory-access-intensive computing system rather than a computation-intensive one. We utilize a number of optimization techniques to balance and integrate the computation loads and communication traffic from the heterogeneous biological structure to the general GPU-based HPC and achieve leading simulation performance for whole-human-brain-scale spiking neuronal networks. On the other hand, the biological structure, equipped with a mesoscopic data assimilation, enables the DTB to investigate brain cognitive function by a reverse-engineering method, which is demonstrated by a digital experiment of visual evaluation on the DTB. Furthermore, we believe that the developing DTB will be a promising and powerful platform for a large number of research directions, including brain-inspired intelligence, brain disease medicine and brain-machine interface.

Submitted 2 August, 2023; originally announced August 2023.

Comments: 12 pages, 11 figures

arXiv:2212.06394 [pdf]

Tangent functional connectomes uncover more unique phenotypic traits

Authors: Kausar Abbas, Mintao Liu, Michael Wang, Duy Duong-Tran, Uttara Tipnis, Enrico Amico, Alan D. Kaplan, Mario Dzemidzic, David Kareken, Beau M. Ances, Jaroslaw Harezlak, Joaquín Goñi

Abstract: Functional connectomes (FCs) contain pairwise estimations of functional couplings based on the activity of pairs of brain regions. FCs are commonly represented as correlation matrices that are symmetric positive definite (SPD) lying on or inside the SPD manifold. Since the geometry on the SPD manifold is non-Euclidean, the inter-related entries of FCs undermine the use of Euclidean-based distances. By projecting FCs into a tangent space, we can obtain tangent functional connectomes (tangent-FCs). Tangent-FCs have shown a higher predictive power of behavior and cognition, but no studies have evaluated the effect of such projections with respect to fingerprinting. We hypothesize that tangent-FCs have a higher fingerprint than regular FCs. Fingerprinting was measured by identification rates (ID rates) on test-retest FCs as well as on monozygotic and dizygotic twins. Our results showed that identification rates are systematically higher when using tangent-FCs. Specifically, we found: (i) Riemann and log-Euclidean matrix references systematically led to higher ID rates. (ii) In tangent-FCs, main-diagonal regularization prior to tangent space projection was critical for ID rate when using Euclidean distance, whereas it barely affected ID rates when using correlation distance. (iii) ID rates were dependent on condition and fMRI scan length. (iv) Parcellation granularity was key for ID rates in FCs, as well as in tangent-FCs with fixed regularization, whereas optimal regularization of tangent-FCs mostly removed this effect. (v) Correlation distance in tangent-FCs outperformed any other configuration of distance on FCs or on tangent-FCs across the fingerprint gradient (here sampled by assessing test-retest, monozygotic and dizygotic twins). (vi) ID rates tended to be higher in task scans compared to resting-state scans when accounting for fMRI scan length.

Submitted 9 June, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

Comments: 31 pages, 10 figures, 2 tables

arXiv:2211.05658 [pdf, other]

doi 10.1016/j.neuroimage.2023.120348

Multi-objective optimization via evolutionary algorithm (MOVEA) for high-definition transcranial electrical stimulation of the human brain

Authors: Mo Wang, Kexin Lou, Zeming Liu, Pengfei Wei, Quanying Liu

Abstract: Designing a transcranial electrical stimulation (TES) strategy requires considering multiple objectives, such as intensity in the target area, focality, stimulation depth, and avoidance zone, which are often mutually exclusive. A computational framework for optimizing different strategies and comparing trade-offs between these objectives is currently lacking. In this paper, we propose a general framework called multi-objective optimization via evolutionary algorithms (MOVEA) to address the non-convex optimization problem in designing TES strategies without predefined direction. MOVEA enables simultaneous optimization of multiple targets through Pareto optimization, generating a Pareto front after a single run without manual weight adjustment and allowing easy expansion to more targets. This Pareto front consists of optimal solutions that meet various requirements while respecting trade-off relationships between conflicting objectives such as intensity and focality. MOVEA is versatile and suitable for both transcranial alternating current stimulation (tACS) and transcranial temporal interference stimulation (tTIS) based on high definition (HD) and two-pair systems. We performed a comprehensive comparison between tACS and tTIS in terms of intensity, focality, and steerability for targets at different depths. MOVEA facilitates the optimization of TES based on specific objectives and constraints, advancing tTIS and tACS-based neuromodulation in understanding the causal relationship between brain regions and cognitive functions and in treating diseases. The code for MOVEA is available at https://github.com/ncclabsustech/MOVEA.

Submitted 3 April, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

Journal ref: NeuroImage, Volume 280, 2023
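
The Pareto front mentioned in the abstract above can be illustrated with a small non-dominated filter. This is only a sketch of the general concept, not the MOVEA implementation: the function name `pareto_front`, the two objectives (intensity, focality), and the candidate scores are all hypothetical, and both objectives are assumed to be maximized.

```python
def dominates(q, p):
    """True if q is at least as good as p everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(q, p)) and any(a > b for a, b in zip(q, p))


def pareto_front(points):
    """Return the points that no other point dominates (all objectives maximized)."""
    front = []
    for i, p in enumerate(points):
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i):
            front.append(p)
    return front


# Hypothetical (intensity, focality) scores for five candidate montages.
candidates = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9), (0.1, 0.1)]
front = pareto_front(candidates)
```

Here `(0.5, 0.5)` and `(0.1, 0.1)` are dropped because `(0.6, 0.6)` beats them on both objectives; the three remaining points represent different intensity/focality trade-offs, none of which is strictly better than another.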

arXiv:2210.05713 [pdf, other]

Explainable fMRI-based Brain Decoding via Spatial Temporal-pyramid Graph Convolutional Network

Authors: Ziyuan Ye, Youzhi Qu, Zhichao Liang, Mo Wang, Quanying Liu

Abstract: Brain decoding, aiming to identify the brain states using neural activity, is important for cognitive neuroscience and neural engineering. However, existing machine learning methods for fMRI-based brain decoding either suffer from low classification performance or poor explainability. Here, we address this issue by proposing a biologically inspired architecture, Spatial Temporal-pyramid Graph Convolutional Network (STpGCN), to capture the spatial-temporal graph representation of functional brain activities. By designing multi-scale spatial-temporal pathways and bottom-up pathways that mimic the information process and temporal integration in the brain, STpGCN is capable of explicitly utilizing the multi-scale temporal dependency of brain activities via graph, thereby achieving high brain decoding performance. Additionally, we propose a sensitivity analysis method called BrainNetX to better explain the decoding results by automatically annotating task-related brain regions from the brain-network standpoint. We conduct extensive experiments on fMRI data under 23 cognitive tasks from Human Connectome Project (HCP) S1200. The results show that STpGCN significantly improves brain decoding performance compared to competing baseline models; BrainNetX successfully annotates task-relevant brain regions. Post hoc analysis based on these regions further validates that the hierarchical structure in STpGCN significantly contributes to the explainability, robustness and generalization of the model.
Our methods not only provide insights into information representation in the brain under multiple cognitive tasks but also indicate a bright future for fMRI-based brain decoding.

Submitted 8 October, 2022; originally announced October 2022.

arXiv:2208.04314 [pdf]

TripHLApan: predicting HLA molecules binding peptides based on triple coding matrix and transfer learning

Authors: Meng Wang, Chuqi Lei, Jianxin Wang, Yaohang Li, Min Li

Abstract: Human leukocyte antigen (HLA) is an important molecule family in the field of human immunity, which recognizes foreign threats and triggers immune responses by presenting peptides to T cells. In recent years, the synthesis of tumor vaccines to induce specific immune responses has become the forefront of cancer treatment. Computationally modeling the binding patterns between peptide and HLA can greatly accelerate the development of tumor vaccines. However, the performance of most prediction methods is very limited, and they cannot take full advantage of existing biological knowledge as the basis of modeling. In this paper, we propose TripHLApan, a novel pan-specific prediction model, for HLA molecular peptide binding prediction. TripHLApan exhibits powerful prediction ability by integrating triple coding matrix, BiGRU + Attention models, and transfer learning strategy. The comprehensive evaluations demonstrate the effectiveness of TripHLApan in predicting HLA-I and HLA-II peptide binding in different test environments. The predictive power of HLA-I is further demonstrated in the latest data set. In addition, we show that TripHLApan has strong binding reconstitution ability in the samples of a melanoma patient. In conclusion, TripHLApan is a powerful tool for predicting the binding of HLA-I and HLA-II molecular peptides for the synthesis of tumor vaccines.

Submitted 6 August, 2022; originally announced August 2022.

Comments: 25 pages, 7 figures

arXiv:2206.12240 [pdf, other]

PSP: Million-level Protein Sequence Dataset for Protein Structure Prediction

Authors: Sirui Liu, Jun Zhang, Haotian Chu, Min Wang, Boxin Xue, Ningxi Ni, Jialiang Yu, Yuhao Xie, Zhenyu Chen, Mengyun Chen, Yuan Liu, Piya Patra, Fan Xu, Jie Chen, Zidong Wang, Lijiang Yang, Fan Yu, Lei Chen, Yi Qin Gao

Abstract: Proteins are essential components of human life and their structures are important for function and mechanism analysis. Recent work has shown the potential of AI-driven methods for protein structure prediction. However, the development of new models is restricted by the lack of datasets and benchmark training procedures. To the best of our knowledge, the existing open source datasets fall far short of the needs of modern protein sequence-structure related research. To solve this problem, we present the first million-level protein structure prediction dataset with high coverage and diversity, named PSP. This dataset consists of 570k true structure sequences (10TB) and 745k complementary distillation sequences (15TB). In addition, we provide the benchmark training procedure for a SOTA protein structure prediction model on this dataset. We validate the utility of this dataset for training by participating in the CAMEO contest, in which our model won first place. We hope our PSP dataset together with the training benchmark can enable a broader community of AI/biology researchers for AI-driven protein related research.

Submitted 24 June, 2022; originally announced June 2022.

arXiv:2109.00123 [pdf, ps, other]

doi 10.1103/PhysRevE.104.034405

Regulatory Feedback Effects on Tissue Growth Dynamics in a Two-Stage Cell Lineage Model

Authors: Mao-Xiang Wang, Arthur Lander, Pik-Yin Lai

Abstract: Identifying the mechanism of intercellular feedback regulation is critical for the basic understanding of tissue growth control in organisms. In this paper, we analyze a tissue growth model consisting of a single lineage of two cell types regulated by negative feedback signalling molecules that undergo spatial diffusion. By deriving the fixed points for the uniform steady states and carrying out linear stability analysis, phase diagrams are obtained analytically for arbitrary parameters of the model. Two different generic growth modes are found: blow-up growth and final-state controlled growth which are governed by the non-trivial fixed point and the trivial fixed point respectively, and can be sensitively switched by varying the negative feedback regulation on the proliferation of the stem cells. Analytic expressions for the characteristic time scales for these two growth modes are also derived. Remarkably, the trivial and non-trivial uniform steady states can coexist and a sharp transition occurs in the bistable regime as the relevant parameters are varied. Furthermore, the bistable growth properties allow external control to switch between these two growth modes. In addition, the condition for an early accelerated growth followed by a retarded growth can be derived. These analytical results are further verified by numerical simulations and provide insights on the growth behavior of the tissue. Our results are also discussed in the light of possible realistic biological experiments and tissue growth control strategy.
Furthermore, by external feedback control of the concentration of regulatory molecules, it is possible to achieve a desired growth mode, as demonstrated with an analysis of boosted growth, catch-up growth and the design for the target of a linear growth dynamic.

Submitted 31 August, 2021; originally announced September 2021.

Comments: to be published in Physical Review E

arXiv:2104.10878 [pdf, other]

doi 10.3934/math.2022376

Comparing regional and provincial-wide COVID-19 models with physical distancing in British Columbia

Authors: Geoffrey McGregor, Jennifer Tippett, Andy T. S. Wan, Mengxiao Wang, Samuel W. K. Wong

Abstract: We study the effects of physical distancing measures for the spread of COVID-19 in regional areas within British Columbia, using the reported cases of the five provincial Health Authorities. Building on the Bayesian epidemiological model of Anderson et al. (2020), we propose a hierarchical regional Bayesian model with time-varying regional parameters from March to December of 2020. In the absence of COVID-19 variants and vaccinations during this period, we examine the regionalized basic reproduction number, modelled prevalence, relative reduction in contact due to physical distancing, and proportion of anticipated cases that have been tested and reported. We observe significant differences between the regional and provincial-wide models and demonstrate the hierarchical regional model can better estimate regional prevalence, especially in rural regions. These results indicate that it can be useful to apply similar regional models to other parts of Canada or other countries.

Submitted 13 November, 2021; v1 submitted 22 April, 2021; originally announced April 2021.

Comments: 35 pages, 16 figures

Journal ref: AIMS Mathematics, 2022, 7(4): 6743-6778

arXiv:2104.01474 [pdf, other]

Thalamocortical contribution to solving credit assignment in neural systems

Authors: Mien Brabeeba Wang, Michael M. Halassa

Abstract: Animal brains evolved to optimize behavior in dynamically changing environments, selecting actions that maximize future rewards. A large body of experimental work indicates that such optimization changes the wiring of neural circuits, appropriately mapping environmental input onto behavioral outputs. A major unsolved scientific question is how optimal wiring adjustments, which must target the connections responsible for rewards, can be accomplished when the relation between sensory inputs, action taken, environmental context with rewards is ambiguous. The computational problem of properly targeting cues, contexts and actions that lead to reward is known as structural, contextual and temporal credit assignment respectively. In this review, we survey prior approaches to these three types of problems and advance the notion that the brain's specialized neural architectures provide efficient solutions. Within this framework, the thalamus with its cortical and basal ganglia interactions serves as a systems-level solution to credit assignment. Specifically, we propose that thalamocortical interaction is the locus of meta-learning where the thalamus provides cortical control functions that parametrize the cortical activity association space. By selecting among these control functions, the basal ganglia hierarchically guide thalamocortical plasticity across two timescales to enable meta-learning. The faster timescale establishes contextual associations to enable rapid behavioral flexibility while the slower one enables generalization to new contexts.
Incorporating different thalamic control functions under this framework clarifies how thalamocortical-basal ganglia interactions may simultaneously solve the three credit assignment problems.

Submitted 3 April, 2021; originally announced April 2021.

arXiv:2103.00399 [pdf]

Hydrophobic interaction determines docking affinity of SARS CoV 2 variants with antibodies

Authors: Jiacheng Li, Chengyu Hou, Menghao Wang, Chencheng Liao, Shuai Guo, Liping Shi, Xiaoliang Ma, Hongchi Zhang, Shenda Jiang, Bing Zheng, Lin Ye, Lin Yang, Xiaodong He

Abstract: Preliminary epidemiologic, phylogenetic and clinical findings suggest that several novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants have increased transmissibility and decreased efficacy of several existing vaccines. Four mutations in the receptor-binding domain (RBD) of the spike protein are reported to contribute to increased transmission. Understanding the physical mechanism responsible for the affinity enhancement between the SARS-CoV-2 variants and ACE2 is the "urgent challenge" for developing blockers, vaccines and therapeutic antibodies against the coronavirus disease 2019 (COVID-19) pandemic. Based on a hydrophobic-interaction-based protein docking mechanism, this study reveals that the mutation N501Y obviously increased the hydrophobic attraction and decreased the hydrophilic repulsion between the RBD and ACE2, which most likely caused the transmissibility increment of the variants. By analyzing the mutation-induced hydrophobic surface changes in the attraction and repulsion at the binding site of the complexes of the SARS-CoV-2 variants and antibodies, we found that all the mutations of N501Y, E484K, K417N and L452R can selectively decrease or increase their binding affinity with some antibodies.

Submitted 28 February, 2021; originally announced March 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2008.11883

arXiv:2102.13276 [pdf, other]

Spectral Top-Down Recovery of Latent Tree Models

Authors: Yariv Aizenbud, Ariel Jaffe, Meng Wang, Amber Hu, Noah Amsel, Boaz Nadler, Joseph T. Chang, Yuval Kluger

Abstract: Modeling the distribution of high dimensional data by a latent tree graphical model is a prevalent approach in multiple scientific domains. A common task is to infer the underlying tree structure, given only observations of its terminal nodes. Many algorithms for tree recovery are computationally intensive, which limits their applicability to trees of moderate size. For large trees, a common approach, termed divide-and-conquer, is to recover the tree structure in two steps. First, recover the structure separately of multiple, possibly random subsets of the terminal nodes. Second, merge the resulting subtrees to form a full tree. Here, we develop Spectral Top-Down Recovery (STDR), a deterministic divide-and-conquer approach to infer large latent tree models. Unlike previous methods, STDR partitions the terminal nodes in a non random way, based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes. We prove that under certain conditions, this partitioning is consistent with the tree structure. This, in turn, leads to a significantly simpler merging procedure of the small subtrees. We prove that STDR is statistically consistent and bound the number of samples required to accurately recover the tree with high probability. Using simulated data from several common tree models in phylogenetics, we demonstrate that STDR has a significant advantage in terms of runtime, with improved or similar accuracy.

Submitted 7 December, 2021; v1 submitted 25 February, 2021; originally announced February 2021.
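
The Fiedler-vector partitioning step named in the abstract above can be illustrated generically. The sketch below is not the STDR algorithm: it assumes a plain unnormalized graph Laplacian built from a hypothetical similarity matrix, whereas the paper refers only to "a suitable Laplacian matrix related to the observed nodes". The function name `fiedler_partition` and the toy matrix are illustrative assumptions.

```python
import numpy as np


def fiedler_partition(similarity):
    """Split nodes into two groups by the sign of the Fiedler vector.

    Assumption: `similarity` is a symmetric, non-negative matrix with zero
    diagonal, and the unnormalized Laplacian L = D - W is used.
    """
    W = np.asarray(similarity, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    # eigh returns eigenvalues in ascending order; column 1 is the
    # eigenvector of the second-smallest eigenvalue (the Fiedler vector).
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, 1] >= 0


# Toy similarity: nodes 0-2 and nodes 3-5 are strongly tied within each
# group and only weakly tied across groups.
W = np.full((6, 6), 0.05)
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
mask = fiedler_partition(W)
```

On this toy input the mask separates nodes 0-2 from nodes 3-5. Which group is labelled `True` depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful.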

arXiv:2102.05440 [pdf]

Protein corona critically affects the bio-behaviors of SARS-CoV-2

Authors: Yue-wen Yin, Yan-jing Sheng, Min Wang, Song-di Ni, Hong-ming Ding, Yu-qiang Ma

Abstract: The outbreak of the coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) has become a worldwide public health crisis. When the SARS-CoV-2 enters the biological fluids in the human body, different types of biomolecules (in particular proteins) may adsorb on its surface and alter its infection ability. Although great efforts have recently been devoted to the interaction of the specific antibodies with the SARS-CoV-2, it still remains largely unknown how the other serum proteins affect the infection of the SARS-CoV-2. In this work, we systematically investigate the interaction of serum proteins with the SARS-CoV-2 RBD by the molecular docking and the all-atom molecular dynamics simulations. It is found that the non-specific immunoglobulin (Ig) indeed cannot effectively bind to the SARS-CoV-2 RBD while the human serum albumin (HSA) may have some potential of blocking its infection (to ACE2). More importantly, we find that the RBD can cause the significant structural change of the Apolipoprotein E (ApoE), by which SARS-CoV-2 may hijack the metabolic pathway of the ApoE to facilitate its cell entry. The present study enhances the understanding of the role of protein corona in the bio-behaviors of SARS-CoV-2, which may aid the more precise and personalized treatment for COVID-19 infection in the clinic.

Submitted 10 February, 2021; originally announced February 2021.

Comments: 18 pages, 7 figures

arXiv:2005.14669 [pdf, other]

Mutations strengthened SARS-CoV-2 infectivity

Authors: Jiahui Chen, Rui Wang, Menglun Wang, Guo-Wei Wei

Abstract: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infectivity is a major concern in coronavirus disease 2019 (COVID-19) prevention and economic reopening. However, rigorous determination of SARS-CoV-2 infectivity is essentially impossible owing to its continuous evolution with over 13752 single nucleotide polymorphisms (SNP) variants in six different subtypes. We develop an advanced machine learning algorithm based on the algebraic topology to quantitatively evaluate the binding affinity changes of SARS-CoV-2 spike glycoprotein (S protein) and host angiotensin-converting enzyme 2 (ACE2) receptor following the mutations. Based on mutation-induced binding affinity changes, we reveal that five out of six SARS-CoV-2 subtypes have become either moderately or slightly more infectious, while one subtype has weakened its infectivity. We find that SARS-CoV-2 is slightly more infectious than SARS-CoV according to computed S protein-ACE2 binding affinity changes. Based on a systematic evaluation of all possible 3686 future mutations on the S protein receptor-binding domain (RBD), we show that most likely future mutations will make SARS-CoV-2 more infectious. Combining sequence alignment, probability analysis, and binding affinity calculation, we predict that a few residues on the receptor-binding motif (RBM), i.e., 452, 489, 500, 501, and 505, have very high chances to mutate into significantly more infectious COVID-19 strains.

Submitted 27 May, 2020; originally announced May 2020.

Comments: 24 pages, 2 tables and 19 figures

arXiv:2005.11935 [pdf]

A Novel Approach of using AR and Smart Surgical Glasses Supported Trauma Care

Authors: Anurag Lal, Ming-Hsien Hu, Pei-Yuan Lee, Min Liang Wang

Abstract: BACKGROUND: Augmented reality (AR) is gaining popularity in various fields such as computer gaming and medical education. However, there are still few applications in real surgeries. Orthopedic surgical applications are currently limited and underdeveloped. - METHODS: The clinical validation was prepared with the currently available AR equipment and software. A total of 1 Vertebroplasty, 2 ORIF Pelvis fracture, 1 ORIF with PFN for Proximal Femoral Fracture, 1 CRIF for distal radius fracture and 2 ORIF for Tibia Fracture cases were performed with fluoroscopy combined with an AR smart surgical glasses system. - RESULTS: A total of 1 Vertebroplasty, 2 ORIF Pelvis fracture, 1 ORIF with PFN for Proximal Femoral Fracture, 1 CRIF for distal radius fracture and 2 ORIF for Tibia Fracture cases were performed to evaluate the benefits of AR surgery. In the AR surgeries, surgeons wearing the smart surgical glasses turned their eyes to focus on the monitors far less often. This paper shows the potential of augmented reality technology for trauma surgery.

Submitted 25 May, 2020; originally announced May 2020.

Comments: 10 pages, 9 Figures, Conference. arXiv admin note: text overlap with arXiv:1801.01560 by other authors

arXiv:2002.07096 [pdf]

Visual Data Analysis and Simulation Prediction for COVID-19

Authors: Baoquan Chen, Mingyi Shi, Xingyu Ni, Liangwang Ruan, Hongda Jiang, Heyuan Yao, Mengdi Wang, Zhenhua Song, Qiang Zhou, Tong Ge

Abstract: The COVID-19 (formerly, 2019-nCoV) epidemic has become a global health emergency, and the WHO has declared it a PHEIC. China has been hit hardest since the outbreak of the virus, which some experts date as far back as late November. It was not until January 23rd that the Wuhan government finally recognized the severity of the epidemic and took the drastic measure of closing down all transportation connecting to the outside world to curtail the spread of the virus. In this study, we seek to answer a few questions: How did the virus spread from the epicenter, Wuhan city, to the rest of the country? To what extent did measures such as city closure and community quarantine help control the situation? More importantly, can we forecast any significant future development of the event had some of the conditions changed? By collecting and visualizing publicly available data, we first show patterns and characteristics of the epidemic development; we then employ a mathematical model of disease transmission dynamics to evaluate the effectiveness of some epidemic control measures, and more importantly, to offer a few tips on preventive measures.

Submitted 6 March, 2020; v1 submitted 14 February, 2020; originally announced February 2020.

Comments: 19 pages, 21 figures, revised English version and originally Chinese version

arXiv:1911.03839 [pdf, ps, other]

In Vitro Fertilization (IVF) Cumulative Pregnancy Rate Prediction from Basic Patient Characteristics

Authors: Bo Zhang, Yuqi Cui, Meng Wang, Jingjing Li, Lei Jin, Dongrui Wu

Abstract: Tens of millions of women suffer from infertility worldwide each year. In vitro fertilization (IVF) is the best choice for many such patients. However, IVF is expensive, time-consuming, and both physically and emotionally demanding. The first question that a patient usually asks before the IVF is how likely she will conceive, given her basic medical examination information. This paper proposes three approaches to predict the cumulative pregnancy rate after multiple oocyte pickup cycles. Experiments on 11,190 patients showed that first clustering the patients into different groups and then building a support vector machine model for each group can achieve the best overall performance. Our model could be a quick and economical approach for reliably estimating the cumulative pregnancy rate for a patient, given only her basic medical examination information, well before starting the actual IVF procedure. The predictions can help the patient make optimal decisions on whether to use her own oocyte or donor oocyte, how many oocyte pickup cycles she may need, whether to use embryo freezing, etc. They will also reduce the patient's cost and time to pregnancy, and improve her quality of life.

Submitted 9 November, 2019; originally announced November 2019.

arXiv:1911.02363 [pdf, other]

ODE-Inspired Analysis for the Biological Version of Oja's Rule in Solving Streaming PCA

Authors: Chi-Ning Chou, Mien Brabeeba Wang

Abstract: Oja's rule [Oja, Journal of mathematical biology 1982] is a well-known biologically-plausible algorithm using a Hebbian-type synaptic update rule to solve streaming principal component analysis (PCA). Computational neuroscientists have known that this biological version of Oja's rule converges to the top eigenvector of the covariance matrix of the input in the limit. However, prior to this work, it was open to prove any convergence rate guarantee. In this work, we give the first convergence rate analysis for the biological version of Oja's rule in solving streaming PCA. Moreover, our convergence rate matches the information theoretical lower bound up to logarithmic factors and outperforms the state-of-the-art upper bound for streaming PCA. Furthermore, we develop a novel framework inspired by ordinary differential equations (ODE) to analyze general stochastic dynamics. The framework abandons the traditional step-by-step analysis and instead analyzes a stochastic dynamic in one-shot by giving a closed-form solution to the entire dynamic. The one-shot framework allows us to apply stopping time and martingale techniques to have a flexible and precise control on the dynamic. We believe that this general framework is powerful and should lead to effective yet simple analysis for a large class of problems with stochastic dynamics.

Submitted 17 June, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

Comments: Accepted for presentation at the Conference on Learning Theory (COLT) 2020
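
For readers unfamiliar with the update rule discussed in the abstract above, a minimal sketch of the classical Oja update for streaming PCA follows. It uses the standard form w ← w + η·y·(x − y·w) with y = w·x; the learning rate, the synthetic data, and the function name `oja_streaming_pca` are illustrative assumptions, and the sketch says nothing about the convergence-rate analysis in the paper.

```python
import numpy as np


def oja_streaming_pca(samples, lr=0.005, seed=0):
    """Estimate the top principal direction with one pass of Oja's rule.

    Each sample x is seen once; the weight vector moves by
    lr * y * (x - y * w), where y = w . x is the neuron's output.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=samples.shape[1])
    w /= np.linalg.norm(w)
    for x in samples:
        y = w @ x
        w += lr * y * (x - y * w)
    return w / np.linalg.norm(w)


# Synthetic zero-mean data whose covariance is diag(9, 1, 0.25), so the
# top eigenvector is the first coordinate axis (up to sign).
rng = np.random.default_rng(1)
data = rng.normal(size=(20000, 3)) * np.array([3.0, 1.0, 0.5])
w_hat = oja_streaming_pca(data)
alignment = abs(w_hat[0])
```

The `alignment` value is the absolute cosine between the estimate and the true top eigenvector; on this synthetic stream it should end up close to 1, with the residual error controlled by the learning rate.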

arXiv:1909.07784 [pdf, other]

MathDL: Mathematical deep learning for D3R Grand Challenge 4

Authors: Duc Duy Nguyen, Kaifu Gao, Menglun Wang, Guo-Wei Wei

Abstract: We present the performances of our mathematical deep learning (MathDL) models for D3R Grand Challenge 4 (GC4). This challenge involves pose prediction, affinity ranking, and free energy estimation for beta secretase 1 (BACE) as well as affinity ranking and free energy estimation for Cathepsin S (CatS). We have developed advanced mathematics, namely differential geometry, algebraic graph, and/or algebraic topology, to accurately and efficiently encode high dimensional physical/chemical interactions into scalable low-dimensional rotational and translational invariant representations. These representations are integrated with deep learning models, such as generative adversarial networks (GAN) and convolutional neural networks (CNN) for pose prediction and energy evaluation, respectively. Overall, our MathDL models achieved the top place in pose prediction for BACE ligands in Stage 1a. Moreover, our submissions obtained the highest Spearman correlation coefficient on the affinity ranking of 460 CatS compounds, and the smallest centered root mean square error on the free energy set of 39 CatS molecules. It is worth mentioning that our method for docking pose predictions has significantly improved from our previous ones.

Submitted 17 September, 2019; originally announced September 2019.

Comments: 24 pages, 9 figures, and one table

arXiv:1908.00572 [pdf, other]

The de Rham-Hodge analysis and modeling of biomolecules

Authors: Rundong Zhao, Menglun Wang, Yiying Tong, Guo-Wei Wei

Abstract: Recent years have witnessed a trend that advanced mathematical tools, such as algebraic topology, differential geometry, graph theory, and partial differential equations, have been developed for describing biological macromolecules. These tools have considerably strengthened our ability to understand the molecular mechanism of macromolecular function, dynamics and transport from their structures. However, currently, there is no unified mathematical theory to analyze, describe and characterize biological macromolecular geometry, topology, flexibility and natural mode at a variety of scales. We introduce the de Rham-Hodge theory, a landmark of 20th Century's mathematics, as a unified paradigm for analyzing biological macromolecular geometry and algebraic topology, for predicting macromolecular flexibility, and for modeling macromolecular natural modes at a variety of scales. In this paradigm, macromolecular geometric characteristic and topological invariants are revealed by de Rham-Hodge spectral analysis. By using the Helmholtz-Hodge decomposition, every macromolecular vector field is split into orthogonal divergence-free, curl-free, and harmonic components with a distinct physical interpretation. We organize the eigenvalues and eigenvectors of the 0-form Laplace-de Rham operator into one of the most accurate protein B-factor predictors. By combining the 1-form Laplace-de Rham operator and the Helfrich-type curvature energy, we predict the natural modes of both X-ray protein structures and cryo-EM maps.
We construct accurate and efficient three-dimensional discrete exterior calculus algorithms for the aforementioned modeling and analysis of biological macromolecules. Using extensive experiments, we validate that the proposed paradigm is one of the most versatile and powerful tools for biological macromolecular studies.

Submitted 1 August, 2019; originally announced August 2019.

Comments: 13 figures, one table

arXiv:1809.04352 [pdf, other]

Divide-and-Conquer Strategy for Large-Scale Eulerian Solvent Excluded Surface

Authors: Rundong Zhao, Menglun Wang, Yiying Tong, Guo-Wei Wei

Abstract: Motivation: Surface generation and visualization are some of the most important tasks in biomolecular modeling and computation. Eulerian solvent excluded surface (ESES) software provides analytical solvent excluded surface (SES) in the Cartesian grid, which is necessary for simulating many biomolecular electrostatic and ion channel models. However, large biomolecules and/or fine grid resolutions give rise to excessively large memory requirements in ESES construction. We introduce an out-of-core and parallel algorithm to improve the ESES software. Results: The present approach drastically improves the spatial and temporal efficiency of ESES. The memory footprint and time complexity are analyzed and empirically verified through extensive tests with a large collection of biomolecule examples. Our results show that our algorithm can successfully reduce memory footprint through a straightforward divide-and-conquer strategy to perform the calculation of arbitrarily large proteins on a typical commodity personal computer. On multi-core computers or clusters, our algorithm can reduce the execution time by parallelizing most of the calculation as disjoint subproblems. Various comparisons with the state-of-the-art Cartesian grid based SES calculation were done to validate the present method and show the improved efficiency. This approach makes ESES a robust software for the construction of analytical solvent excluded surfaces. Availability and implementation: http://weilab.math.msu.edu/ESES.

Submitted 12 September, 2018; originally announced September 2018.

Comments: 24 pages, 11 figures

Journal ref: Communications in Information and Systems, 2018

arXiv:1804.10647 [pdf, other]

Mathematical deep learning for pose and binding affinity prediction and ranking in D3R Grand Challenges

Authors: Duc Duy Nguyen, Zixuan Cang, Kedi Wu, Menglun Wang, Yin Cao, Guo-Wei Wei

Abstract: Advanced mathematics, such as multiscale weighted colored graph and element specific persistent homology, and machine learning including deep neural networks were integrated to construct mathematical deep learning models for pose and binding affinity prediction and ranking in the last two D3R grand challenges in computer-aided drug design and discovery. D3R Grand Challenge 2 (GC2) focused on the pose prediction and binding affinity ranking and free energy prediction for Farnesoid X receptor ligands. Our models obtained the top place in absolute free energy prediction for free energy Set 1 in Stage 2. The latest competition, D3R Grand Challenge 3 (GC3), is considered as the most difficult challenge so far. It has 5 subchallenges involving Cathepsin S and five other kinase targets, namely VEGFR2, JAK2, p38-α, TIE2, and ABL1. There is a total of 26 official competitive tasks for GC3. Our predictions were ranked 1st in 10 out of 26 official competitive tasks.

Submitted 27 April, 2018; originally announced April 2018.

Comments: 15 pages, 4 figures

arXiv:1711.02177 [pdf]

doi 10.1002/jbio.201800269

Optical excitation and detection of neuronal activity

Authors: Chenfei Hu, Richard Sam, Mingguang Shan, Viorel Nastasa, Minqi Wang, Taewoo Kim, Martha Gillette, Parijat Sengupta, Gabriel Popescu

Abstract: Optogenetics has emerged as an exciting tool for manipulating neural activity, which in turn, can modulate behavior in live organisms. However, detecting the response to the optical stimulation requires electrophysiology with physical contact or fluorescent imaging at target locations, which is often limited by photobleaching and phototoxicity. In this paper, we show that phase imaging can report the intracellular transport induced by optogenetic stimulation. We developed a multimodal instrument that can both stimulate cells with high spatial resolution and detect optical pathlength changes with nanometer scale sensitivity. We found that optical pathlength fluctuations following stimulation are consistent with active organelle transport. Furthermore, the results indicate a broadening in the transport velocity distribution, which is significantly higher in stimulated cells compared to optogenetically inactive cells. It is likely that this label-free, contactless measurement of optogenetic response will provide an enabling approach to neuroscience.

Submitted 26 July, 2018; v1 submitted 27 October, 2017; originally announced November 2017.

Comments: 20 pages, 5 figures

arXiv:1705.03321 [pdf]

MotifMark: Finding Regulatory Motifs in DNA Sequences

Authors: Hamid Reza Hassanzadeh, Pushkar Kolhe, Charles L. Isbell, May D. Wang

Abstract: The interaction between proteins and DNA is a key driving force in a significant number of biological processes such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding sites and the specificity of target proteins in binding to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity between proteins and DNA motifs. Despite their success, these technologies have their own limitations and fall short in precise characterization of motifs, and as a result, require further downstream analysis to extract useful and interpretable information from a haystack of noisy and inaccurate data. Here we propose MotifMark, a new algorithm based on graph theory and machine learning, that can find binding sites on candidate probes and rank their specificity in regard to the underlying transcription factor. We developed a pipeline to analyze experimental data derived from compact universal protein binding microarrays and benchmarked it against two of the most accurate motif search methods. Our results indicate that MotifMark can be a viable alternative technique for prediction of motif from protein binding microarrays and possibly other related high-throughput techniques.

Submitted 4 May, 2017; originally announced May 2017.

arXiv:1704.05883 [pdf, ps, other]

Rigidity strengthening is a vital mechanism for protein-ligand binding

Authors: Duc Duy Nguyen, Tian Xiao, Menglun Wang, Guo-Wei Wei

Abstract: Protein-ligand binding is essential to almost all life processes. The understanding of protein-ligand interactions is fundamentally important to rational drug design and protein design. Based on large scale data sets, we show that protein rigidity strengthening or flexibility reduction is a pivoting mechanism in protein-ligand binding. Our approach based solely on rigidity is able to unveil a surprisingly long range contribution of four residue layers to protein-ligand binding, which has a ramification for drug and protein design. Additionally, the present work reveals that among various pairwise interactions, the short range ones within the distance of the van der Waals diameter are most important. It is found that the present approach outperforms all the other state-of-the-art scoring functions for protein-ligand binding affinity predictions of two benchmark data sets.

Submitted 31 March, 2017; originally announced April 2017.

Comments: 9 pages, 6 figures

arXiv:1610.03182 [pdf]

wtest: an R Package for Testing Main and Interaction Effect in Genotype Data with Binary Traits

Authors: Rui Sun, Billy Chang, Benny Chung-Ying Zee, Maggie Haitian Wang

Abstract: This R package evaluates main and pair-wise interaction effect of single nucleotide polymorphisms (SNPs) via the W-test, scalable to whole genome-wide data sets. The package provides fast and accurate p-value estimation of genetic markers, as well as diagnostic checking on the probability distributions. It allows flexible stage-wise or exhaustive association testing in a user-friendly interface. Availability: The package is available in CRAN, or from website: http://www2.ccrb.cuhk.edu.hk/wtest

Submitted 11 October, 2016; originally announced October 2016.

Comments: 7 pages, 1 figure

arXiv:1607.07834 [pdf]

A W-test collapsing method for rare variant testing with applications to exome sequencing data of hypertensive disorder

Authors: Rui Sun, Haoyi Weng, Inchi Hu, Junfeng Guo, William K. K. Wu, Benny Chung-Ying Zee, Maggie Haitian Wang

Abstract: Advancement in sequencing technology enables the study of association between complex disorders and rare variants with low minor allele frequencies. One of the major challenges in rare variant testing is lack of statistical power of traditional testing methods due to extremely low variances of single nucleotide polymorphisms. In this paper, we introduce a W-test collapsing method that evaluates the distributional differences in cases and controls using a combined log of odds ratio. The proposed method is compared with the Weighted-Sum Statistic and Sequence Kernel Association Test using simulation data sets and showed better performances and faster computing speed. In the study of real next generation sequencing data set of hypertensive disorder, we identified genes of interesting biological functions that are associated to metabolism disorder and inflammation, which include the MACROD1, NLRP7, AGK, PAK6 and APBB1. The W-test collapsing method offers a fast, effective and alternative way for rare variants association analysis.

Submitted 26 July, 2016; originally announced July 2016.

Comments: 18 pages, 1 figure, 4 tables. Genetic Epidemiology accepted

arXiv:1606.08941 [pdf]

Enhancing power of rare variant association test by Zoom-Focus Algorithm (ZFA) to locate optimal testing region

Authors: Maggie Haitian Wang, Haoyi Weng, Rui Sun, Benny Chung-Ying Zee

Abstract: Motivation: Exome or targeted sequencing data exerts analytical challenge to test single nucleotide polymorphisms (SNPs) with extremely small minor allele frequency (MAF). Various rare variant tests were proposed to increase power by aggregating SNPs within a fixed genomic region, such as a gene or pathway. However, a gene could contain from several to thousands of markers, and not all of them may be related to the phenotype. Combining functional and non-functional SNPs in arbitrary genomic region could impair the testing power. Results: We propose a Zoom-Focus algorithm (ZFA) to locate the optimal testing region within a given genomic region, as a wrapper function to be applied in conjunction with rare variant association tests. In the first Zooming step, a given genomic region is partitioned by order of two, and the best partition is located within all partition levels. In the next Focusing step, boundaries of the zoomed region are refined. Simulation studies showed that ZFA substantially enhanced the statistical power of rare variant tests by over 10 folds, including the WSS, SKAT and W-test. The algorithm is applied on real exome sequencing data of hypertensive disorder, and identified biologically relevant genetic markers to metabolic disorder that are undiscoverable by testing using full gene. The proposed algorithm is an efficient and powerful tool to increase the effectiveness of rare variant association tests for exome sequencing datasets of complex disorder.

Submitted 28 June, 2016; originally announced June 2016.

Comments: Main paper: 13 pages, 2 figures, 3 tables, 3 diagrams; Submitted to Bioinformatics, and the 27th International Conference on Genome Informatics

arXiv:1404.7766 [pdf]

Genome-wide Scan of Archaic Hominin Introgressions in Eurasians Reveals Complex Admixture History

Authors: Ya Hu, Yi Wang, Qiliang Ding, Yungang He, Minxian Wang, Jiucun Wang, Shuhua Xu, Li Jin

Abstract: Introgressions from Neanderthals and Denisovans were detected in modern humans. Introgressions from other archaic hominins were also implicated, however, identification of which poses a great technical challenge. Here, we introduced an approach in identifying introgressions from all possible archaic hominins in Eurasian genomes, without referring to archaic hominin sequences. We focused on mutations emerged in archaic hominins after their divergence from modern humans (denoted as archaic-specific mutations), and identified introgressive segments which showed significant enrichment of archaic-specific mutations over the rest of the genome. Furthermore, boundaries of introgressions were identified using a dynamic programming approach to partition whole genome into segments which contained different levels of archaic-specific mutations. We found that detected introgressions shared more archaic-specific mutations with Altai Neanderthal than they shared with Denisovan, and 60.3% of archaic hominin introgressions were from Neanderthals. Furthermore, we detected more introgressions from two unknown archaic hominins whom diverged with modern humans approximately 859 and 3,464 thousand years ago. The latter unknown archaic hominin contributed to the genomes of the common ancestors of modern humans and Neanderthals. In total, archaic hominin introgressions comprised 2.4% of Eurasian genomes. Above results suggested a complex admixture history among hominins. The proposed approach could also facilitate admixture research across species.

Submitted 30 April, 2014; originally announced April 2014.

Comments: 42 Pages, 1 Table, 4 Figures, 1 Supplementary Table, and 10 Supplementary Figures

arXiv:1211.2073 [pdf, ps, other]

LAGE: A Java Framework to reconstruct Gene Regulatory Networks from Large-Scale Continues Expression Data

Authors: Yang Lu, Mengying Wang, Kenny Q. Zhu, Bo Yuan

Abstract: LAGE is a systematic framework developed in Java. The motivation of LAGE is to provide a scalable and parallel solution to reconstruct Gene Regulatory Networks (GRNs) from continuous gene expression data for a very large number of genes. The basic idea of our framework is motivated by the philosophy of divide-and-conquer. Specifically, LAGE recursively partitions genes into multiple overlapping communities with much smaller sizes, then learns intra-community GRNs separately before merging them together. In addition, the complete information on overlapping communities is produced as a byproduct, which could be used to mine meaningful functional modules in biological networks.

Submitted 9 November, 2012; originally announced November 2012.

Comments: 2 pages

arXiv:1107.1927 [pdf, other]

Single-image diffusion coefficient measurements of proteins in free solution

Authors: Shannon Kian Zareh, Michael C. DeSantis, Jonathan Kessler, Je-Luen Li, Y. M. Wang

Abstract: Diffusion coefficient measurements are important for many biological and material investigations, such as particle dynamics, kinetics, and size determinations. Amongst current measurement methods, single particle tracking (SPT) offers the unique capability of providing location and diffusion information of a molecule simultaneously while using only femtomoles of sample. However, the temporal resolution of SPT is limited to seconds for single-color labeled samples. By directly imaging three dimensional (3D) diffusing fluorescent proteins and studying the widths of their intensity profiles, we determine the proteins' diffusion coefficients using single protein images of sub-millisecond exposure times. This simple method improves the temporal resolution of diffusion coefficient measurements to sub-millisecond, and can be readily applied to a range of particle sizes in SPT investigations and applications where diffusion coefficient measurements are needed, such as reaction kinetics and particle size determinations.

Submitted 10 July, 2011; originally announced July 2011.

arXiv:1010.3247 [pdf, other]

doi 10.1016/j.bpj.2010.12.1268

Protein sliding and hopping kinetics on DNA

Authors: Michael C. DeSantis, Je-Luen Li, Y. M. Wang

Abstract: Using Monte-Carlo simulations, we deconvolved the sliding and hopping kinetics of GFP-LacI proteins on elongated DNA from their experimentally observed seconds-long diffusion trajectories. Our simulations suggest the following results: (1) in each diffusion trajectory, a protein makes on average hundreds of alternating slides and hops with a mean sliding time of several tens of ms; (2) sliding dominates the root mean square displacement of fast diffusion trajectories, whereas hopping dominates slow ones; (3) flow and variations in salt concentration have limited effects on hopping kinetics, while in vivo DNA configuration is not expected to influence sliding kinetics; furthermore, (4) the rate of occurrence for hops longer than 200 nm agrees with experimental data for EcoRV proteins.

Submitted 22 July, 2011; v1 submitted 15 October, 2010; originally announced October 2010.

arXiv:0909.3132 [pdf, ps, other]

doi 10.1186/1471-2105-10-S1-S66

Partial correlation analysis indicates causal relationships between GC-content, exon density and recombination rate in the human genome

Authors: Jan Freudengerb, Mingyi Wang, Yaning Yang, Wentian Li

Abstract: Background: Several features are known to correlate with the GC-content in the human genome, including recombination rate, gene density and distance to telomere. However, by testing for pairwise correlation only, it is impossible to distinguish direct associations from indirect ones and to distinguish between causes and effects. Results: We use partial correlations to construct partially directed graphs for the following four variables: GC-content, recombination rate, exon density and distance-to-telomere. Recombination rate and exon density are unconditionally uncorrelated, but become inversely correlated by conditioning on GC-content. This pattern indicates a model where recombination rate and exon density are two independent causes of GC-content variation. Conclusions: Causal inference and graphical models are useful methods to understand genome evolution and the mechanisms of isochore evolution in the human genome.

Submitted 16 September, 2009; originally announced September 2009.

Journal ref: BMC Bioinformatics, 10(suppl 1), S66 (2009)

arXiv:0908.0015 [pdf, other]

doi 10.1364/OE.18.006563

Precision analysis for standard deviation measurements of single fluorescent molecule images

Authors: Michael C. DeSantis, Shawn H. DeCenzo, Je-Luen Li, Y. M. Wang

Abstract: Standard deviation measurements of intensity profiles of stationary single fluorescent molecules are useful for studying axial localization, molecular orientation, and a fluorescence imaging system's spatial resolution. Here we report on the analysis of the precision of standard deviation measurements of intensity profiles of single fluorescent molecules imaged using an EMCCD camera. We have developed an analytical expression for the standard deviation measurement error of a single image which is a function of the total number of detected photons, the background photon noise, and the camera pixel size. The theoretical results agree well with the experimental, simulation, and numerical integration results. Using this expression, we show that single-molecule standard deviation measurements offer nanometer precision for a large range of experimental parameters.

Submitted 27 January, 2010; v1 submitted 31 July, 2009; originally announced August 2009.

Comments: 16 pages, 3 figures, revised

arXiv:0904.2223 [pdf, other]

Single-molecule imaging of protein adsorption mechanisms to surfaces

Authors: Shannon Kian Zareh, Y. M. Wang

Abstract: Protein-surface interactions cause the desirable effect of controlled protein adsorption onto biodevices as well as the undesirable effect of protein fouling. The key to controlling protein-surface adsorptions is to identify and quantify the main adsorption mechanisms: adsorptions that occur (1) while depositing a protein solution onto dry surfaces and (2) after the deposition onto wet surfaces. Bulk measurements cannot reveal the dynamic protein adsorption pathways and thus cannot differentiate between the two adsorption mechanisms. We imaged the interactions of single streptavidin molecules with hydrophobic fused-silica surfaces in real-time. We observed both adsorbed proteins on surfaces and diffusing proteins near surfaces and analyzed their adsorption kinetics. Our analysis shows that the protein solution deposition process is the primary mechanism of streptavidin adsorption onto surfaces at the sub-nanomolar to nanomolar protein concentrations. Furthermore, we found that hydrophilic fused-silica surfaces can prevent the adsorption of streptavidin molecules.

Submitted 13 October, 2010; v1 submitted 14 April, 2009; originally announced April 2009.

arXiv:0811.3645 [pdf, other]

doi 10.1103/PhysRevE.80.040901

Discontinuities at the DNA supercoiling transition

Authors: Bryan C. Daniels, Scott Forth, Maxim Y. Sheinin, Michelle D. Wang, James P. Sethna

Abstract: While slowly turning the ends of a single molecule of DNA at constant applied force, a discontinuity was recently observed at the supercoiling transition, when a small plectoneme is suddenly formed. This can be understood as an abrupt transition into a state in which stretched and plectonemic DNA coexist. We argue that there should be discontinuities in both the extension and the torque at the transition, and provide experimental evidence for both. To predict the sizes of these discontinuities and how they change with the overall length of DNA, we organize a theory for the coexisting plectonemic state in terms of four length-independent parameters. We also test plectoneme theories, including our own elastic rod simulation, finding discrepancies with experiment that can be understood in terms of the four coexisting state parameters.

Submitted 21 July, 2009; v1 submitted 21 November, 2008; originally announced November 2008.

Comments: 11 pages, 5 figures; revised version, with added supplemental material

Showing 1–48 of 48 results for author: Wang, M