Out of Many, One: Designing and Scaffolding Proteins at the Scale of the Structural Universe with Genie 2

Authors: Yeqing Lin, Minji Lee, Zhao Zhang, Mohammed AlQuraishi

Abstract: Protein diffusion models have emerged as a promising approach for protein design. One such pioneering model is Genie, a method that asymmetrically represents protein structures during the forward and backward processes, using simple Gaussian noising for the former and expressive SE(3)-equivariant attention for the latter. In this work we introduce Genie 2, extending Genie to capture a larger and more diverse protein structure space through architectural innovations and massive data augmentation. Genie 2 adds motif scaffolding capabilities via a novel multi-motif framework that designs co-occurring motifs with unspecified inter-motif positions and orientations. This makes possible complex protein designs that engage multiple interaction partners and perform multiple functions. On both unconditional and conditional generation, Genie 2 achieves state-of-the-art performance, outperforming all known methods on key design metrics including designability, diversity, and novelty. Genie 2 also solves more motif scaffolding problems than other methods and does so with more unique and varied solutions. Taken together, these advances set a new standard for structure-based protein design. Genie 2 inference and training code, as well as model weights, are freely available at: https://github.com/aqlaboratory/genie2.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2404.08027 [pdf, other]

SurvMamba: State Space Model with Multi-grained Multi-modal Interaction for Survival Prediction

Authors: Ying Chen, Jiajing Xie, Yuxiang Lin, Yuhang Song, Wenxian Yang, Rongshan Yu

Abstract: Multi-modal learning that combines pathological images with genomic data has significantly enhanced the accuracy of survival prediction. Nevertheless, existing methods have not fully utilized the inherent hierarchical structure within both whole slide images (WSIs) and transcriptomic data, from which better intra-modal representations and inter-modal integration could be derived. Moreover, many existing studies attempt to improve multi-modal representations through attention mechanisms, which inevitably lead to high complexity when processing high-dimensional WSIs and transcriptomic data. Recently, a structured state space model named Mamba emerged as a promising approach for its superior performance in modeling long sequences with low complexity. In this study, we propose Mamba with multi-grained multi-modal interaction (SurvMamba) for survival prediction. SurvMamba is implemented with a Hierarchical Interaction Mamba (HIM) module that facilitates efficient intra-modal interactions at different granularities, thereby capturing more detailed local features as well as rich global representations. In addition, an Interaction Fusion Mamba (IFM) module is used for cascaded inter-modal interactive fusion, yielding more comprehensive features for survival prediction. Comprehensive evaluations on five TCGA datasets demonstrate that SurvMamba outperforms other existing methods in terms of performance and computational cost.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2403.03425 [pdf, other]

Sculpting Molecules in 3D: A Flexible Substructure Aware Framework for Text-Oriented Molecular Optimization

Authors: Kaiwei Zhang, Yange Lin, Guangcheng Wu, Yuxiang Ren, Xuecang Zhang, Bo Wang, Xiaoyu Zhang, Weitao Du

Abstract: The integration of deep learning, particularly AI-Generated Content, with high-quality data derived from ab initio calculations has emerged as a promising avenue for transforming the landscape of scientific research. However, the challenge of designing molecular drugs or materials that incorporate multi-modality prior knowledge remains a critical and complex undertaking. Specifically, achieving a practical molecular design necessitates not only meeting the diversity requirements but also addressing structural and textural constraints with various symmetries outlined by domain experts. In this article, we present an innovative approach to tackle this inverse design problem by formulating it as a multi-modality guidance generation/optimization task. Our proposed solution involves a textural-structure alignment symmetric diffusion framework for the implementation of molecular generation/optimization tasks, namely 3DToMolo. 3DToMolo aims to harmonize diverse modalities, aligning them seamlessly to produce molecular structures that adhere to the symmetric structural and textural constraints specified by experts in the field. Experimental trials across three guidance generation settings have shown a superior hit generation performance compared to state-of-the-art methodologies. Moreover, 3DToMolo demonstrates the capability to generate novel molecules, incorporating specified target substructures, without the need for prior knowledge.
This work not only holds general significance for the advancement of deep learning methodologies but also paves the way for a transformative shift in molecular design strategies. 3DToMolo creates opportunities for a more nuanced and effective exploration of the vast chemical space, opening new frontiers in the development of molecular entities with tailored properties and functionalities.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2401.10144 [pdf, other]

Exploiting Hierarchical Interactions for Protein Surface Learning

Authors: Yiqun Lin, Liang Pan, Yi Li, Ziwei Liu, Xiaomeng Li

Abstract: Predicting interactions between proteins is one of the most important yet challenging problems in structural bioinformatics. Intrinsically, potential function sites in protein surfaces are determined by both geometric and chemical features. However, existing works only consider handcrafted or individually learned chemical features from the atom type and extract geometric features independently. Here, we identify two key properties of effective protein surface learning: 1) relationship among atoms: atoms are linked with each other by covalent bonds to form biomolecules instead of appearing alone, leading to the significance of modeling the relationship among atoms in chemical feature learning. 2) hierarchical feature interaction: the neighboring residue effect validates the significance of hierarchical feature interaction among atoms and between surface points and atoms (or residues). In this paper, we present a principled framework based on deep learning techniques, namely Hierarchical Chemical and Geometric Feature Interaction Network (HCGNet), for protein surface analysis by bridging chemical and geometric features with hierarchical interactions. Extensive experiments demonstrate that our method outperforms the prior state-of-the-art method by 2.3% on the site prediction task and 3.2% on the interaction matching task. Our code is available at https://github.com/xmed-lab/HCGNet.

Submitted 17 January, 2024; originally announced January 2024.

Comments: Accepted to J-BHI

arXiv:2401.04873 [pdf, other]

Electrostatics of Salt-Dependent Reentrant Phase Behaviors Highlights Diverse Roles of ATP in Biomolecular Condensates

Authors: Yi-Hsuan Lin, Tae Hun Kim, Suman Das, Tanmoy Pal, Jonas Wessén, Atul Kaushik Rangadurai, Lewis E. Kay, Julie D. Forman-Kay, Hue Sun Chan

Abstract: Liquid-liquid phase separation (LLPS) involving intrinsically disordered protein regions (IDRs) is a major physical mechanism for biological membraneless compartmentalization. The multifaceted electrostatic effects in these biomolecular condensates are exemplified here by experimental and theoretical investigations of the different salt- and ATP-dependent LLPSs of an IDR of messenger RNA-regulating protein Caprin1 and its phosphorylated variant pY-Caprin1, exhibiting, e.g., reentrant behaviors in some instances but not others. Experimental data are rationalized by physical modeling using analytical theory, molecular dynamics, and polymer field-theoretic simulations, indicating in general that interchain salt bridges enhance LLPS of polyelectrolytes such as Caprin1 and that the high valency of ATP-magnesium is a significant factor for its colocalization with the condensed phases, as similar trends are observed for several other IDRs. Our findings underscore the role of biomolecular condensates in modulating ion concentrations and its functional ramifications.

Submitted 18 June, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

Comments: 67 pages, 2 main-text tables, 8 main-text figures, 6 supporting figures, 155 references. Submitted to eLife

arXiv:2401.01367 [pdf]

Guidelines in Wastewater-based Epidemiology of SARS-CoV-2 with Diagnosis

Authors: Madiha Fatima, Zhihua Cao, Aichun Huang, Shengyuan Wu, Xinxian Fan, Yi Wang, Liu Jiren, Ziyun Zhu, Qiongrou Ye, Yuan Ma, Joseph K. F Chow, Peng Jia, Yangshou Liu, Yubin Lin, Manjun Ye, Tong Wu, Zhixun Li, Cong Cai, Wenhai Zhang, Cheris H. Q. Ding, Yuanzhe Cai, Feijuan Huang

Abstract: With the global spread and increasing transmission rate of SARS-CoV-2, more and more laboratories and researchers are turning their attention to wastewater-based epidemiology (WBE), hoping it can become an effective tool for large-scale testing and provide more accurate predictions of the number of infected individuals. Based on the cases of sewage sampling and testing in some regions such as Hong Kong, Brazil, and the United States, the feasibility of detecting the novel coronavirus in sewage is extremely high. This study reviews domestic and international achievements in detecting SARS-CoV-2 through WBE and summarizes four aspects of COVID-19, including sampling methods, virus decay rate calculation, standardized population coverage of the watershed, algorithm prediction, and provides ideas for combining field modeling with epidemic prevention and control. Moreover, we highlight some diagnostic techniques for detection of the virus from sewage samples. Our review is a new approach to identifying the research gaps in wastewater-based epidemiology and diagnosis, and we also predict the future prospects of our analysis.

Submitted 26 December, 2023; originally announced January 2024.

arXiv:2401.00173 [pdf, other]

Variability of morphology in beat-to-beat photoplethysmographic waveform quantified with unsupervised wave-shape manifold learning for clinical assessment

Authors: Yu-Chieh Ho, Te-Sheng Lin, She-Chih Wang, Chen-Shi Chang, Yu-Ting Lin

Abstract: We investigated the beat-to-beat fluctuation of the photoplethysmography (PPG) waveform. The motivation is that morphology variability extracted from the arterial blood pressure (ABP) has been found to correlate with baseline condition and short-term surgical outcome of the patients undergoing liver transplant surgery. Numerous interactions of physiological mechanisms regulating the cardiovascular system could underlie the variability of morphology. We used the unsupervised manifold learning algorithm, Dynamic Diffusion Map, to quantify the multivariate waveform morphological variation. Due to the physical principle of light absorption, PPG waveform signals are more susceptible to artifact and are nominally used only for visual inspection of data quality in clinical medical environment. But on the other hand, the noninvasive, easy-to-use nature of PPG grants a wider range of biomedical application, which inspired us to investigate the variability of morphology information from PPG waveform signal. We developed data analysis techniques to improve the performance and validated with the real-life clinical database.

Submitted 30 December, 2023; originally announced January 2024.

arXiv:2310.07464 [pdf]

Deep Learning Predicts Biomarker Status and Discovers Related Histomorphology Characteristics for Low-Grade Glioma

Authors: Zijie Fang, Yihan Liu, Yifeng Wang, Xiangyang Zhang, Yang Chen, Changjing Cai, Yiyang Lin, Ying Han, Zhi Wang, Shan Zeng, Hong Shen, Jun Tan, Yongbing Zhang

Abstract: Biomarker detection is an indispensable part in the diagnosis and treatment of low-grade glioma (LGG). However, current LGG biomarker detection methods rely on expensive and complex molecular genetic testing, for which professionals are required to analyze the results, and intra-rater variability is often reported. To overcome these challenges, we propose an interpretable deep learning pipeline, a Multi-Biomarker Histomorphology Discoverer (Multi-Beholder) model based on the multiple instance learning (MIL) framework, to predict the status of five biomarkers in LGG using only hematoxylin and eosin-stained whole slide images and slide-level biomarker status labels. Specifically, by incorporating the one-class classification into the MIL framework, accurate instance pseudo-labeling is realized for instance-level supervision, which greatly complements the slide-level labels and improves the biomarker prediction performance. Multi-Beholder demonstrates superior prediction performance and generalizability for five LGG biomarkers (AUROC=0.6469-0.9735) in two cohorts (n=607) with diverse races and scanning protocols. Moreover, the excellent interpretability of Multi-Beholder allows for discovering the quantitative and qualitative correlations between biomarker status and histomorphology characteristics. Our pipeline not only provides a novel approach for biomarker prediction, enhancing the applicability of molecular treatments for LGG patients but also facilitates the discovery of new mechanisms in molecular functionality and LGG progression.

Submitted 11 October, 2023; originally announced October 2023.

Comments: 47 pages, 6 figures

arXiv:2309.07178 [pdf]

CloudBrain-NMR: An Intelligent Cloud Computing Platform for NMR Spectroscopy Processing, Reconstruction and Analysis

Authors: Di Guo, Sijin Li, Jun Liu, Zhangren Tu, Tianyu Qiu, Jingjing Xu, Liubin Feng, Donghai Lin, Qing Hong, Meijin Lin, Yanqin Lin, Xiaobo Qu

Abstract: Nuclear Magnetic Resonance (NMR) spectroscopy has served as a powerful analytical tool for studying molecular structure and dynamics in chemistry and biology. However, the processing of raw data acquired from NMR spectrometers and subsequent quantitative analysis involves various specialized tools, which necessitates comprehensive knowledge in programming and NMR. In particular, emerging deep learning tools are hard to use widely in NMR due to the sophisticated setup of computation. Thus, NMR processing is not an easy task for chemists and biologists. In this work, we present CloudBrain-NMR, an intelligent online cloud computing platform designed for NMR data reading, processing, reconstruction, and quantitative analysis. The platform is conveniently accessed through a web browser, eliminating the need for any program installation on the user side. CloudBrain-NMR uses parallel computing with graphics processing units and central processing units, resulting in significantly shortened computation time. Furthermore, it incorporates state-of-the-art deep learning-based algorithms offering comprehensive functionalities that allow users to complete the entire processing procedure without relying on additional software. This platform has empowered NMR applications with advanced artificial intelligence processing. CloudBrain-NMR is openly accessible for free usage at https://csrc.xmu.edu.cn/CloudBrain.html

Submitted 12 September, 2023; originally announced September 2023.

Comments: 11 pages, 13 figures

arXiv:2306.15599 [pdf, other]

Coupling a Recurrent Neural Network to SPAD TCSPC Systems for Real-time Fluorescence Lifetime Imaging

Authors: Yang Lin, Paul Mos, Andrei Ardelean, Claudio Bruschini, Edoardo Charbon

Abstract: Fluorescence lifetime imaging (FLI) has been receiving increased attention in recent years as a powerful diagnostic technique in biological and medical research. However, existing FLI systems often suffer from a tradeoff between processing speed, accuracy, and robustness. In this paper, we propose a robust approach that enables fast FLI with no degradation of accuracy. The approach is based on a SPAD TCSPC system coupled to a recurrent neural network (RNN) that accurately estimates the fluorescence lifetime directly from raw timestamps without building histograms, thereby drastically reducing transfer data volumes and hardware resource utilization, thus enabling FLI acquisition at video rate. We train two variants of the RNN on a synthetic dataset and compare the results to those obtained using center-of-mass method (CMM) and least squares fitting (LS fitting). Results demonstrate that two RNN variants, gated recurrent unit (GRU) and long short-term memory (LSTM), are comparable to CMM and LS fitting in terms of accuracy, while outperforming them in background noise by a large margin. To explore the ultimate limits of the approach, we derived the Cramer-Rao lower bound of the measurement, showing that RNN yields lifetime estimations with near-optimal precision. Moreover, our FLI model, which is purely trained on synthetic datasets, works well with never-seen-before, real-world data. To demonstrate real-time operation, we have built a FLI microscope based on Piccolo, a 32x32 SPAD sensor developed in our lab.
Four quantized GRU cores, capable of processing up to 4 million photons per second, are deployed on a Xilinx Kintex-7 FPGA. Powered by the GRU, the FLI setup can retrieve real-time fluorescence lifetime images at up to 10 frames per second. The proposed FLI system is promising and ideally suited for biomedical applications.

Submitted 24 July, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

arXiv:2301.12485 [pdf, other]

Generating Novel, Designable, and Diverse Protein Structures by Equivariantly Diffusing Oriented Residue Clouds

Authors: Yeqing Lin, Mohammed AlQuraishi

Abstract: Proteins power a vast array of functional processes in living cells. The capability to create new proteins with designed structures and functions would thus enable the engineering of cellular behavior and development of protein-based therapeutics and materials. Structure-based protein design aims to find structures that are designable (can be realized by a protein sequence), novel (have dissimilar geometry from natural proteins), and diverse (span a wide range of geometries). While advances in protein structure prediction have made it possible to predict structures of novel protein sequences, the combinatorially large space of sequences and structures limits the practicality of search-based methods. Generative models provide a compelling alternative, by implicitly learning the low-dimensional structure of complex data distributions. Here, we leverage recent advances in denoising diffusion probabilistic models and equivariant neural networks to develop Genie, a generative model of protein structures that performs discrete-time diffusion using a cloud of oriented reference frames in 3D space. Through in silico evaluations, we demonstrate that Genie generates protein backbones that are more designable, novel, and diverse than existing models. This indicates that Genie is capturing key aspects of the distribution of protein structure space and facilitates protein design with high success rates. Code for generating new proteins and training new versions of Genie is available at https://github.com/aqlaboratory/genie.

Submitted 6 June, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

arXiv:2210.12158 [pdf, other]

Graph Coloring via Neural Networks for Haplotype Assembly and Viral Quasispecies Reconstruction

Authors: Hansheng Xue, Vaibhav Rajan, Yu Lin

Abstract: Understanding genetic variation, e.g., through mutations, in organisms is crucial to unravel their effects on the environment and human health. A fundamental characterization can be obtained by solving the haplotype assembly problem, which yields the variation across multiple copies of chromosomes. Variations among fast evolving viruses that lead to different strains (called quasispecies) are also deciphered with similar approaches. In both these cases, high-throughput sequencing technologies that provide oversampled mixtures of large noisy fragments (reads) of genomes, are used to infer constituent components (haplotypes or quasispecies). The problem is harder for polyploid species where there are more than two copies of chromosomes. State-of-the-art neural approaches to solve this NP-hard problem do not adequately model relations among the reads that are important for deconvolving the input signal. We address this problem by developing a new method, called NeurHap, that combines graph representation learning with combinatorial optimization. Our experiments demonstrate substantially better performance of NeurHap in real and synthetic datasets compared to competing approaches.

Submitted 21 October, 2022; originally announced October 2022.

Comments: Accepted by NeurIPS 2022

arXiv:2203.11123 [pdf, other]

Gene expression noise accelerates the evolution of a biological oscillator

Authors: Yen Ting Lin, Nicolas E. Buchler

Abstract: Gene expression is a biochemical process, where stochastic binding and un-binding events naturally generate fluctuations and cell-to-cell variability in gene dynamics. These fluctuations typically have destructive consequences for proper biological dynamics and function (e.g., loss of timing and synchrony in biological oscillators). Here, we show that gene expression noise counter-intuitively accelerates the evolution of a biological oscillator and, thus, can impart a benefit to living organisms. We used computer simulations to evolve two mechanistic models of a biological oscillator at different levels of gene expression noise. We first show that gene expression noise induces oscillatory-like dynamics in regions of parameter space that cannot oscillate in the absence of noise. We then demonstrate that these noise-induced oscillations generate a fitness landscape whose gradient robustly and quickly guides evolution by mutation towards robust and self-sustaining oscillation. These results suggest that noise can help dynamical systems evolve or learn new behavior by revealing cryptic dynamic phenotypes outside the bifurcation point.

Submitted 21 March, 2022; originally announced March 2022.

Comments: 36 pages, 9 figures

Report number: LA-UR-21-32251

MSC Class: 37A50; 92C45; 68W50; 92B25

arXiv:2202.08195 [pdf, other]

doi 10.1016/j.media.2023.102933

Nuclei Segmentation with Point Annotations from Pathology Images via Self-Supervised Learning and Co-Training

Authors: Yi Lin, Zhiyong Qu, Hao Chen, Zhongke Gao, Yuexiang Li, Lili Xia, Kai Ma, Yefeng Zheng, Kwang-Ting Cheng

Abstract: Nuclei segmentation is a crucial task for whole slide image analysis in digital pathology. Generally, the segmentation performance of fully-supervised learning heavily depends on the amount and quality of the annotated data. However, it is time-consuming and expensive for professional pathologists to provide accurate pixel-level ground truth, while it is much easier to get coarse labels such as point annotations. In this paper, we propose a weakly-supervised learning method for nuclei segmentation that only requires point annotations for training. First, coarse pixel-level labels are derived from the point annotations based on the Voronoi diagram and the k-means clustering method to avoid overfitting. Second, a co-training strategy with an exponential moving average method is designed to refine the incomplete supervision of the coarse labels. Third, a self-supervised visual representation learning method is tailored for nuclei segmentation of pathology images that transforms the hematoxylin component images into the H&E stained images to gain better understanding of the relationship between the nuclei and cytoplasm. We comprehensively evaluate the proposed method using two public datasets. Both visual and quantitative results demonstrate the superiority of our method to the state-of-the-art methods, and its competitive performance compared to the fully-supervised methods. Code: https://github.com/hust-linyi/SC-Net

Submitted 17 August, 2023; v1 submitted 16 February, 2022; originally announced February 2022.

Comments: Accepted by MedIA

arXiv:2201.01920 [pdf, other]

doi 10.1007/978-1-0716-2663-4_3

Numerical Techniques for Applications of Analytical Theories to Sequence-Dependent Phase Separations of Intrinsically Disordered Proteins

Authors: Yi-Hsuan Lin, Jonas Wessén, Tanmoy Pal, Suman Das, Hue Sun Chan

Abstract: Biomolecular condensates, physically underpinned to a significant extent by liquid-liquid phase separation (LLPS), are now widely recognized by numerous experimental studies to be of fundamental biological, biomedical, and biophysical importance. In the face of experimental discoveries, analytical formulations emerged as a powerful yet tractable tool in recent theoretical investigations of the role of LLPS in the assembly and dissociation of these condensates. The pertinent LLPS often involves, though not exclusively, intrinsically disordered proteins engaging in multivalent interactions that are governed by their amino acid sequences. For researchers interested in applying these theoretical methods, here we provide a practical guide to a set of computational techniques devised for extracting sequence-dependent LLPS properties from analytical formulations. The numerical procedures covered include those for the determination of spinodal and binodal phase boundaries from a general free energy function with examples based on the random phase approximation in polymer theory, construction of tie lines for multiple-component LLPS, and field-theoretic simulation of multiple-chain heteropolymeric systems using complex Langevin dynamics. Since a more accurate physical picture often requires comparing analytical theory against explicit-chain model predictions, a commonly utilized methodology for coarse-grained molecular dynamics simulations of sequence-specific LLPS is also briefly outlined.

Submitted 30 August, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

Comments: 46 pages, 10 figures, 105 references, with hyperlinks to relevant computer codes and related information; Figure 8 in version 2 corrected; accepted for publication in "Methods in Molecular Biology" volume "Phase-Separated Biomolecular Condensates" edited by H.-X. Zhou, J.-H. Spille, and P. Banerjee (expected October 2022)

Journal ref: In: Phase-Separated Biomolecular Condensates, Methods and Protocols; edited by H.-X. Zhou, J.-H. Spille and P.R. Banerjee, Methods in Molecular Biology (Springer-Nature), Volume 2563, Chapter 3, pages 51-94 (2022)

arXiv:2112.11696 [pdf, other]

RepBin: Constraint-based Graph Representation Learning for Metagenomic Binning

Authors: Hansheng Xue, Vijini Mallawaarachchi, Yujia Zhang, Vaibhav Rajan, Yu Lin

Abstract: Mixed communities of organisms are found in many environments (from the human gut to marine ecosystems) and can have profound impact on human health and the environment. Metagenomics studies the genomic material of such communities through high-throughput sequencing that yields DNA subsequences for subsequent analysis. A fundamental problem in the standard workflow, called binning, is to discover clusters of genomic subsequences associated with the unknown constituent organisms. Inherent noise in the subsequences, various biological constraints that need to be imposed on them, and the skewed cluster size distribution exacerbate the difficulty of this unsupervised learning problem. In this paper, we present a new formulation using a graph where the nodes are subsequences and edges represent homophily information. In addition, we model biological constraints providing heterophilous signal about nodes that cannot be clustered together. We solve the binning problem by developing new algorithms for (i) graph representation learning that preserves both homophily relations and heterophily constraints and (ii) a constraint-based graph clustering method that addresses the problem of skewed cluster size distribution. Extensive experiments on real and synthetic datasets demonstrate that our approach, called RepBin, outperforms a wide variety of competing methods. Our constraint-based graph representation learning and clustering methods, which may be useful in other domains as well, advance the state of the art in both metagenomic binning and graph representation learning.

Submitted 22 December, 2021; originally announced December 2021.

Comments: Accepted by AAAI-2022

arXiv:2112.10670 [pdf, other]

An adaptively optimized algorithm for counting nuclei in X-ray micro-CT scans of whole organisms

Authors: Anna Madra, Alex YS. Lin, Daniel J. Vanselow, Keith C. Cheng

Abstract: Living organisms are primarily made of cells. Identifying them and characterizing their geometry and spatial distribution is a first step towards building multi-scale models of these biomaterials. We propose a method to count cells using nuclei in an X-ray microtomographic scan of a zebrafish. To account for scanning artifacts and partial volume effect, the method is adaptively calibrated using parameters approximated from the manifold of manually selected and optimized special cases. The methodology is tested on nuclei in the eyes of zebrafish larvae of different ages.

Submitted 20 December, 2021; originally announced December 2021.

arXiv:2110.02937 [pdf, other]

doi 10.1016/j.bpj.2021.10.008

Assembly of Model Postsynaptic Densities Involves Interactions Auxiliary to Stoichiometric Binding

Authors: Yi-Hsuan Lin, Haowei Wu, Bowen Jia, Mingjie Zhang, Hue Sun Chan

Abstract: The assembly of functional biomolecular condensates often involves liquid-liquid phase separation (LLPS) of proteins with multiple modular domains, which can be folded or conformationally disordered to various degrees. To understand the LLPS-driving domain-domain interactions, a fundamental question is how readily the interactions in the condensed phase can be inferred from inter-domain interactions in dilute solutions. In particular, are the interactions leading to LLPS exclusively those underlying the formation of discrete inter-domain complexes in homogeneous solutions? We address this question by developing a mean-field LLPS theory of two stoichiometrically constrained solute species. The theory is applied to the neuronal proteins SynGAP and PSD-95, whose complex coacervate serves as a rudimentary model for neuronal postsynaptic densities (PSDs). The predicted phase behaviors are compared with experiments. Previously, a three-SynGAP, two-PSD-95 ratio was determined for SynGAP/PSD-95 complexes in dilute solutions. However, when this 3:2 stoichiometry is uniformly imposed in our theory encompassing both dilute and condensed phases, the tie-line pattern of the predicted SynGAP/PSD-95 phase diagram differs drastically from that obtained experimentally. In contrast, theories embodying alternate scenarios postulating auxiliary SynGAP-PSD-95 as well as SynGAP-SynGAP and PSD-95-PSD-95 interactions in addition to those responsible for stoichiometric SynGAP/PSD-95 complexes produce tie-line patterns consistent with experiment. Hence, our combined theoretical-experimental analysis indicates that weaker interactions or higher-order complexes beyond the 3:2 stoichiometry, but not yet documented, are involved in the formation of SynGAP/PSD-95 condensates, imploring future efforts to ascertain the nature of these auxiliary interactions in PSD-like LLPS.

Submitted 6 October, 2021; originally announced October 2021.

Comments: 38 pages, 5 figures. Accepted for publication in Biophysical Journal

Journal ref: Biophys. J. 121 (1) 2022 157-171

arXiv:2109.14445 [pdf]

Implementation of a practical Markov chain Monte Carlo sampling algorithm in PyBioNetFit

Authors: Jacob Neumann, Yen Ting Lin, Abhishek Mallela, Ely F. Miller, Joshua Colvin, Abell T. Duprat, Ye Chen, William S. Hlavacek, Richard G. Posner

Abstract: Bayesian inference in biological modeling commonly relies on Markov chain Monte Carlo (MCMC) sampling of a multidimensional and non-Gaussian posterior distribution that is not analytically tractable. Here, we present the implementation of a practical MCMC method in the open-source software package PyBioNetFit (PyBNF), which is designed to support parameterization of mathematical models for biological systems. The new MCMC method, am, incorporates an adaptive move proposal distribution. For warm starts, sampling can be initiated at a specified location in parameter space and with a multivariate Gaussian proposal distribution defined initially by a specified covariance matrix. Multiple chains can be generated in parallel using a computer cluster. We demonstrate that am can be used to successfully solve real-world Bayesian inference problems, including forecasting of new Coronavirus Disease 2019 case detection with Bayesian quantification of forecast uncertainty. PyBNF version 1.1.9, the first stable release with am, is available at PyPI and can be installed using the pip package-management system on platforms that have a working installation of Python 3. PyBNF relies on libRoadRunner and BioNetGen for simulations (e.g., numerical integration of ordinary differential equations defined in SBML or BNGL files) and Dask.Distributed for task scheduling on Linux computer clusters.

Submitted 29 September, 2021; originally announced September 2021.

arXiv:2109.10258 [pdf]

Arterial blood pressure waveform in liver transplant surgery possesses variability of morphology reflecting recipients' acuity and predicting short term outcomes

Authors: Shen-Chih Wang, Chien-Kun Ting, Cheng-Yen Chen, Chin-Su Liu, Niang-Cheng Lin, Che-Chuan Loon, Hau-Tieng Wu, Yu-Ting Lin

Abstract: Background: We investigated clinical information underneath the beat-to-beat fluctuation of the arterial blood pressure (ABP) waveform morphology. We proposed the Dynamical Diffusion Map algorithm (DDMap) to quantify the variability of morphology. The underlying physiology could be the compensatory mechanisms involving complex interactions between various physiological mechanisms to regulate the cardiovascular system. As a liver transplant surgery contains distinct periods, we investigated its clinical behavior in different surgical steps. Methods: Our study used the DDMap algorithm, based on unsupervised manifold learning, to obtain a quantitative index for the beat-to-beat variability of morphology. We examined the correlation between the variability of ABP morphology and disease acuity as indicated by Model for End-Stage Liver Disease (MELD) scores, the postoperative laboratory data, and 4 early allograft failure (EAF) scores. Results: Among the 85 enrolled patients, the variability of morphology obtained during the presurgical phase was best correlated with MELD-Na scores. The neohepatic phase variability of morphology was associated with EAF scores as well as postoperative bilirubin levels, international normalized ratio, aspartate aminotransferase levels, and platelet count. Furthermore, variability of morphology presents more associations with the above clinical conditions than the common BP measures and their BP variability indices. Conclusions: The variability of morphology obtained during the presurgical phase is indicative of patient acuity, whereas that obtained during the neohepatic phase is indicative of short-term surgical outcomes.

Submitted 1 July, 2023; v1 submitted 21 September, 2021; originally announced September 2021.

Comments: 5 figures and 1 table

arXiv:2108.04682 [pdf, other]

ChemiRise: a data-driven retrosynthesis engine

Authors: Xiangyan Sun, Ke Liu, Yuquan Lin, Lingjie Wu, Haoming Xing, Minghong Gao, Ji Liu, Suocheng Tan, Zekun Ni, Qi Han, Junqiu Wu, Jie Fan

Abstract: We have developed an end-to-end retrosynthesis system, named ChemiRise, that can propose complete retrosynthesis routes for organic compounds rapidly and reliably. The system was trained on a processed patent database of over 3 million organic reactions. Experimental reactions were atom-mapped, clustered, and extracted into reaction templates. We then trained a graph convolutional neural network-based one-step reaction proposer using template embeddings and developed a guiding algorithm on the directed acyclic graph (DAG) of chemical compounds to find the best candidate to explore. The atom-mapping algorithm and the one-step reaction proposer were benchmarked against previous studies and showed better results. The final product was demonstrated by retrosynthesis routes reviewed and rated by human experts, showing satisfying functionality and a potential productivity boost in real-life use cases.

Submitted 9 August, 2021; originally announced August 2021.

arXiv:2107.00719 [pdf, other]

doi 10.1109/BIBM52615.2021.9669729

Toward Drug-Target Interaction Prediction via Ensemble Modeling and Transfer Learning

Authors: Po-Yu Kao, Shu-Min Kao, Nan-Lan Huang, Yen-Chu Lin

Abstract: Drug-target interaction (DTI) prediction plays a crucial role in drug discovery, and deep learning approaches have achieved state-of-the-art performance in this field. We introduce an ensemble of deep learning models (EnsembleDLM) for DTI prediction. EnsembleDLM only uses the sequence information of chemical compounds and proteins, and it aggregates the predictions from multiple deep neural networks. This approach not only achieves state-of-the-art performance in Davis and KIBA datasets but also reaches cutting-edge performance in the cross-domain applications across different bio-activity types and different protein classes. We also demonstrate that EnsembleDLM achieves a good performance (Pearson correlation coefficient and concordance index > 0.8) in the new domain with approximately 50% transfer learning data, i.e., the training set has twice as much data as the test set.

Submitted 18 November, 2021; v1 submitted 2 July, 2021; originally announced July 2021.

Comments: 8 pages, 1 figure, 10 tables

arXiv:2105.00267 [pdf]

Combating small molecule aggregation with machine learning

Authors: Kuan Lee, Ann Yang, Yen-Chu Lin, Daniel Reker, Goncalo J. L. Bernardes, Tiago Rodrigues

Abstract: Biological screens are plagued by false positive hits resulting from aggregation. Thus, methods to triage small colloidally aggregating molecules (SCAMs) are in high demand. Herein, we disclose a bespoke machine-learning tool to confidently and intelligibly flag such entities. Our data demonstrate an unprecedented utility of machine learning for predicting SCAMs, achieving 80% of correct predictions in a challenging out-of-sample validation. The tool outperformed a panel of expert chemists, who correctly predicted 61 +/- 7% of the same test molecules in a Turing-like test. Further, the computational routine provided insight into molecular features governing aggregation that had remained hidden to expert intuition. Leveraging our tool, we quantify that up to 15-20% of ligands in publicly available chemogenomic databases have a high potential to aggregate at typical screening concentrations, imposing caution in systems biology and drug design programs. Our approach provides a means to augment human intuition and mitigate attrition, and a pathway to accelerate future molecular medicine.

Submitted 1 May, 2021; originally announced May 2021.

arXiv:2102.03687 [pdf, other]

doi 10.1021/acs.jpcb.1c00954

A Simple Explicit-Solvent Model of Polyampholyte Phase Behaviors and its Ramifications for Dielectric Effects in Biomolecular Condensates

Authors: Jonas Wessén, Tanmoy Pal, Suman Das, Yi-Hsuan Lin, Hue Sun Chan

Abstract: Biomolecular condensates such as membraneless organelles, underpinned by liquid-liquid phase separation (LLPS), are important for physiological function, with electrostatics -- among other interaction types -- being a prominent force in their assembly. Charge interactions of intrinsically disordered proteins (IDPs) and other biomolecules are sensitive to the aqueous dielectric environment. Because the relative permittivity of protein is significantly lower than that of water, the interior of an IDP condensate is a relatively low-dielectric regime, which, aside from its possible functional effects on client molecules, should facilitate stronger electrostatic interactions among the scaffold IDPs. To gain insight into this LLPS-induced dielectric heterogeneity, addressing in particular whether a low-dielectric condensed phase entails more favorable LLPS than that posited by assuming IDP electrostatic interactions are uniformly modulated by the higher dielectric constant of the pure solvent, we consider a simplified multiple-chain model of polyampholytes immersed in explicit solvents that are either polarizable or possess a permanent dipole. Notably, simulated phase behaviors of these systems exhibit only minor to moderate differences from those obtained using implicit-solvent models with a uniform relative permittivity equal to that of pure solvent. Buttressed by theoretical treatments developed here using random phase approximation and polymer field-theoretic simulations, these observations indicate a partial compensation of effects between favorable solvent-mediated interactions among the polyampholytes in the condensed phase and favorable polyampholyte-solvent interactions in the dilute phase, often netting only a minor enhancement of overall LLPS propensity from the very dielectric heterogeneity that arises from the LLPS itself. Further ramifications of this principle are discussed.

Submitted 7 April, 2021; v1 submitted 6 February, 2021; originally announced February 2021.

Comments: 54 pages, 14 figures, 1 table, and 132 references. Accepted for publication in the Journal of Physical Chemistry B ("Liquid-Liquid Phase Separation" Special Issue)

Journal ref: J. Phys. Chem. B 125, 4337-4358 (2021)

arXiv:2012.05038 [pdf]

Cost-efficiency trade-offs of the human brain network revealed by a multiobjective evolutionary algorithm

Authors: Junji Ma, Jinbo Zhang, Ying Lin, Zhengjia Dai

Abstract: It is widely believed that the formation of brain network structure is under the pressure of optimal trade-off between reducing wiring cost and promoting communication efficiency. However, the question of whether this trade-off exists in empirical human brain networks and, if so, how it takes effect is still not well understood. Here, we employed a multiobjective evolutionary algorithm to directly and quantitatively explore the cost-efficiency trade-off in human brain networks. Using this algorithm, we generated a population of synthetic networks with optimal but diverse cost-efficiency trade-offs. It was found that these synthetic networks could not only reproduce a large portion of connections in the empirical brain networks but also embed a resembling small-world structure. Moreover, the synthetic and empirical brain networks were found similar in terms of the spatial arrangement of hub regions and the modular structure, which are two important topological features widely assumed to be outcomes of cost-efficiency trade-offs. The synthetic networks had high robustness against random attack as the empirical brain networks did. Additionally, we also revealed some differences of the synthetic networks from the empirical brain networks, including lower segregated processing capacity and weaker robustness against targeted attack. These findings provide direct and quantitative evidence that the structure of human brain networks is indeed largely influenced by optimal cost-efficiency trade-offs. We also suggest that some additional factors (e.g., segregated processing capacity) might jointly determine the network organization with cost and efficiency.

Submitted 9 December, 2020; originally announced December 2020.

arXiv:2010.00060 [pdf, other]

Constructions and Comparisons of Pooling Matrices for Pooled Testing of COVID-19

Authors: Yi-Jheng Lin, Che-Hao Yu, Tzu-Hsuan Liu, Cheng-Shang Chang, Wen-Tsuen Chen

Abstract: In comparison with individual testing, group testing (also known as pooled testing) is more efficient in reducing the number of tests and potentially leading to tremendous cost reduction. As indicated in the recent article posted on the US FDA website, the group testing approach for COVID-19 has received a lot of interest lately. There are two key elements in a group testing technique: (i) the pooling matrix that directs samples to be pooled into groups, and (ii) the decoding algorithm that uses the group test results to reconstruct the status of each sample. In this paper, we propose a new family of pooling matrices from packing the pencil of lines (PPoL) in a finite projective plane. We compare their performance with various pooling matrices proposed in the literature, including 2D-pooling, P-BEST, and Tapestry, using the two-stage definite defectives (DD) decoding algorithm. By conducting extensive simulations for a range of prevalence rates up to 5%, our numerical results show that there is no pooling matrix with the lowest relative cost in the whole range of the prevalence rates. To optimize the performance, one should choose the right pooling matrix, depending on the prevalence rate. The family of PPoL matrices can dynamically adjust their column weights according to the prevalence rates and could be a better alternative than using a fixed pooling matrix.

Submitted 15 June, 2021; v1 submitted 30 September, 2020; originally announced October 2020.

arXiv:2009.03753 [pdf, other]

Data-driven Optimized Control of the COVID-19 Epidemics

Authors: Afroza Shirin, Yen Ting Lin, Francesco Sorrentino

Abstract: Optimizing the impact on the economy of control strategies aiming at containing the spread of COVID-19 is a critical challenge. We use daily new case counts of COVID-19 patients reported by local health administrations from different Metropolitan Statistical Areas (MSAs) within the US to parametrize a model that well describes the propagation of the disease in each area. We then introduce a time-varying control input that represents the level of social distancing imposed on the population of a given area and solve an optimal control problem with the goal of minimizing the impact of social distancing on the economy in the presence of relevant constraints, such as a desired level of suppression for the epidemics at a terminal time. We find that with the exception of the initial time and of the final time, the optimal control input is well approximated by a constant, specific to each area, which contrasts with the implemented system of reopening `in phases'. For all the areas considered, this optimal level corresponds to stricter social distancing than the level estimated from data. Proper selection of the time period for application of the control action optimally is important: depending on the particular MSA this period should be either short or long or intermediate. We also consider the case that the transmissibility increases in time (due, e.g., to increasingly colder weather), for which we find that the optimal control solution yields progressively stricter measures of social distancing. We finally compute the optimal control solution for a model modified to incorporate the effects of vaccinations on the population and we see that depending on a number of factors, social distancing measures could be optimally reduced during the period over which vaccines are administered to the population.

Submitted 10 March, 2021; v1 submitted 4 September, 2020; originally announced September 2020.

Comments: 5 figures

arXiv:2008.06642 [pdf, other]

Group Testing Enables Asymptomatic Screening for COVID-19 Mitigation: Feasibility and Optimal Pool Size Selection with Dilution Effects

Authors: Yifan Lin, Yuxuan Ren, Jingyuan Wan, Massey Cashore, Jiayue Wan, Yujia Zhang, Peter Frazier, Enlu Zhou

Abstract: Repeated asymptomatic screening for SARS-CoV-2 promises to control spread of the virus but would require too many resources to implement at scale. Group testing is promising for screening more people with fewer test resources: multiple samples tested together in one pool can be excluded with one negative test result. Existing approaches to group testing design for SARS-CoV-2 asymptomatic screening, however, do not consider dilution effects: that false negatives become more common with larger pools. As a consequence, they may recommend pool sizes that are too large or misestimate the benefits of screening. Modeling dilution effects, we derive closed-form expressions for the expected number of tests and false negatives/positives per person screened under two popular group testing methods: the linear and square array methods. We find that test error correlation induced by a common viral load across an individual's samples results in many fewer false negatives than would be expected from less realistic but more widely assumed independent errors. This insight also suggests that false positives can be controlled through repeated tests without significantly increasing false negatives. Using these closed-form expressions to trace a Pareto frontier over error rates and tests, we design testing protocols for repeated asymptomatic screening of a large population. We minimize disease prevalence by optimizing time-varying pool sizes and screening frequency constrained by daily test capacity and a false positive limit. This provides a testing protocol practitioners can use for mitigating COVID-19. In a case study, we demonstrate the effectiveness of this methodology in controlling spread.

Submitted 16 November, 2020; v1 submitted 14 August, 2020; originally announced August 2020.

arXiv:2007.12523 [pdf]

Daily Forecasting of New Cases for Regional Epidemics of Coronavirus Disease 2019 with Bayesian Uncertainty Quantification

Authors: Yen Ting Lin, Jacob Neumann, Ely Miller, Richard G. Posner, Abhishek Mallela, Cosmin Safta, Jaideep Ray, Gautam Thakur, Supriya Chinthavali, William S. Hlavacek

Abstract: To increase situational awareness and support evidence-based policy-making, we formulated two types of mathematical models for COVID-19 transmission within a regional population. One is a fitting function that can be calibrated to reproduce an epidemic curve with two timescales (e.g., fast growth and slow decay). The other is a compartmental model that accounts for quarantine, self-isolation, social distancing, a non-exponentially distributed incubation period, asymptomatic individuals, and mild and severe forms of symptomatic disease. Using Bayesian inference, we have been calibrating our models daily for consistency with new reports of confirmed cases from the 15 most populous metropolitan statistical areas in the United States and quantifying uncertainty in parameter estimates and predictions of future case reports. This online learning approach allows for early identification of new trends despite considerable variability in case reporting. We infer new significant upward trends for five of the metropolitan areas starting between 19-April-2020 and 12-June-2020.

Submitted 20 July, 2020; originally announced July 2020.

Comments: 48 pages, 10 figures, 4 Appendix figures, 3 tables, 1 Appendix figure, 1 Appendix text

arXiv:2005.06712 [pdf, other]

doi 10.1073/pnas.2008122117

Comparative Roles of Charge, $π$ and Hydrophobic Interactions in Sequence-Dependent Phase Separation of Intrinsically Disordered Proteins

Authors: Suman Das, Yi-Hsuan Lin, Robert M. Vernon, Julie D. Forman-Kay, Hue Sun Chan

Abstract: Endeavoring toward a transferable, predictive coarse-grained explicit-chain model for biomolecular condensates underlain by liquid-liquid phase separation (LLPS), we conducted multiple-chain simulations of the N-terminal intrinsically disordered region (IDR) of DEAD-box helicase Ddx4, as a test case, to assess the roles of electrostatic, hydrophobic, cation-$π$, and aromatic interactions in amino acid sequence-dependent LLPS. We evaluated 3 residue-residue interaction schemes with a shared electrostatic potential. Neither a common hydrophobicity scheme nor one augmented with arginine/lysine-aromatic cation-$π$ interactions consistently accounted for the experimental LLPS data on the wildtype, a charge-scrambled, an FtoA, and an RtoK mutant of Ddx4 IDR. In contrast, interactions based on contact statistics among folded globular protein structures reproduce the overall experimental trend, including that the RtoK mutant has a much diminished LLPS propensity. Consistency between simulation and LLPS experiment was also found for RtoK mutants of P-granule protein LAF-1, underscoring that, to a degree, the important LLPS-driving $π$-related interactions are embodied in classical statistical potentials. Further elucidation will be necessary, however, especially of phenylalanine's role in condensate assembly because experiments on FtoA and YtoF mutants suggest that LLPS-driving phenylalanine interactions are significantly weaker than those posited by common statistical potentials. Protein-protein electrostatic interactions are modulated by relative permittivity, which depends on protein concentration. Analytical theory suggests that this dependence entails enhanced inter-protein interactions in the condensed phase but more favorable protein-solvent interactions in the dilute phase. The opposing trends lead to a modest overall impact on LLPS.

Submitted 6 October, 2020; v1 submitted 13 May, 2020; originally announced May 2020.

Comments: 65 pages (main text and supporting information), 7 main-text figures, 7 supporting figures, 1 supporting table, 135 references; accepted for publication in the Proceedings of the National Academy of Sciences, U.S.A

Journal ref: Proc. Natl. Acad. Sci. U.S.A. 117, 28795-28805 (2020)

arXiv:2003.08518 [pdf]

A framework to decipher the genetic architecture of combinations of complex diseases: applications in cardiovascular medicine

Authors: Liangying Yin, Carlos Kwan-long Chau, Yu-Ping Lin, Pak-Chung Sham, Hon-Cheong So

Abstract: Genome-wide association studies (GWAS) have proven to be highly useful in revealing the genetic basis of complex diseases. At present, most GWAS are studies of a particular single disease diagnosis against controls. However, in practice, an individual is often affected by more than one condition/disorder. For example, patients with coronary artery disease (CAD) are often comorbid with diabetes mellitus (DM). Along a similar line, it is often clinically meaningful to study patients with one disease but without a comorbidity. For example, obese DM may have different pathophysiology from non-obese DM. Here we developed a statistical framework to uncover susceptibility variants for comorbid disorders (or a disorder without comorbidity), using GWAS summary statistics only. In essence, we mimicked a case-control GWAS in which the cases are affected with comorbidities or a disease without a relevant comorbid condition (in either case, we may consider the cases as those affected by a specific subtype of disease, as characterized by the presence or absence of comorbid conditions). We extended our methodology to deal with continuous traits with clinically meaningful categories (e.g. lipids). In addition, we illustrated how the analytic framework may be extended to more than two traits. We verified the feasibility and validity of our method by applying it to simulated scenarios and four cardiometabolic (CM) traits. We also analyzed the genes, pathways, cell-types/tissues involved in CM disease subtypes.
LD-score regression analysis revealed some subtypes may indeed be biologically distinct with low genetic correlations. Further Mendelian randomization analysis found differential causal effects of different subtypes on relevant complications. We believe the findings are of both scientific and clinical value, and the proposed method may open a new avenue to analyzing GWAS data.

Submitted 29 December, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

arXiv:2002.03268 [pdf]

The Novel Coronavirus, 2019-nCoV, is Highly Contagious and More Infectious Than Initially Estimated

Authors: Steven Sanche, Yen Ting Lin, Chonggang Xu, Ethan Romero-Severson, Nicolas W. Hengartner, Ruian Ke

Abstract: The novel coronavirus (2019-nCoV) is a recently emerged human pathogen that has spread widely since January 2020. Initially, the basic reproductive number, R0, was estimated to be 2.2 to 2.7. Here we provide a new estimate of this quantity. We collected extensive individual case reports and estimated key epidemiology parameters, including the incubation period. Integrating these estimates and high-resolution real-time human travel and infection data with mathematical models, we estimated that the number of infected individuals during the early epidemic doubles every 2.4 days, and the R0 value is likely to be between 4.7 and 6.6. We further show that quarantine and contact tracing of symptomatic individuals alone may not be effective and early, strong control measures are needed to stop transmission of the virus.

Submitted 8 February, 2020; originally announced February 2020.

Comments: 8 pages, 3 figures, 1 Supplementary Text, 6 Supplementary figures, 2 Supplementary tables
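
As a rough illustration of the doubling-time figure quoted in the abstract above, the following Python sketch converts a doubling time into an exponential growth rate. It assumes pure exponential growth; the function names are hypothetical, and this is not the authors' estimation procedure, which also relies on travel data and mathematical models not described in the abstract.

```python
import math


def growth_rate_from_doubling_time(doubling_time_days: float) -> float:
    """Return the exponential growth rate r (per day) with exp(r * T_d) == 2."""
    if doubling_time_days <= 0:
        raise ValueError("doubling time must be positive")
    return math.log(2) / doubling_time_days


def fold_increase(doubling_time_days: float, elapsed_days: float) -> float:
    """Fold increase in case count after elapsed_days of pure exponential growth."""
    if doubling_time_days <= 0:
        raise ValueError("doubling time must be positive")
    return 2.0 ** (elapsed_days / doubling_time_days)


if __name__ == "__main__":
    # 2.4 days is the doubling time reported in the abstract.
    r = growth_rate_from_doubling_time(2.4)
    print(f"growth rate: {r:.3f} per day")
    print(f"fold increase over 7 days: {fold_increase(2.4, 7.0):.2f}")
```

Translating a growth rate into R0 additionally requires assumptions about the generation interval, which the abstract does not state, so the sketch stops at the growth rate.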

arXiv:2001.07841 [pdf, other]

Simultaneous Localization and Parameter Estimation for Single Particle Tracking via Sigma Points based EM

Authors: Ye Lin, Sean B. Andersson

Abstract: Single Particle Tracking (SPT) is a powerful class of tools for analyzing the dynamics of individual biological macromolecules moving inside living cells. The acquired data is typically in the form of a sequence of camera images that are then post-processed to reveal details about the motion. In this work, we develop an algorithm for jointly estimating both particle trajectory and motion model parameters from the data. Our approach uses Expectation Maximization (EM) combined with an Unscented Kalman filter (UKF) and an Unscented Rauch-Tung-Striebel smoother (URTSS), allowing us to use an accurate, nonlinear model of the observations acquired by the camera. Due to the shot noise characteristics of the photon generation process, this model uses a Poisson distribution to capture the measurement noise inherent in imaging. In order to apply a UKF, we first must transform the measurements into a model with additive Gaussian noise. We consider two approaches, one based on variance stabilizing transformations (where we compare the Anscombe and Freeman-Tukey transforms) and one on a Gaussian approximation to the Poisson distribution. Through simulations, we demonstrate the efficacy of the approach and explore the differences among these measurement transformations.

Submitted 21 January, 2020; originally announced January 2020.

Comments: Accepted by 58th Conference on Decision and Control (CDC)
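
The variance-stabilizing transformations named in the abstract above can be sketched in a few lines of Python. The snippet uses the standard textbook forms of the Anscombe and Freeman-Tukey transforms and computes the variance of the transformed Poisson variable by direct summation over the probability mass function. Function names, the truncation point `k_max`, and the example rate are illustrative assumptions; the paper's actual measurement model and filter are not reproduced here.

```python
import math


def anscombe(x: float) -> float:
    """Anscombe transform: 2 * sqrt(x + 3/8)."""
    return 2.0 * math.sqrt(x + 0.375)


def freeman_tukey(x: float) -> float:
    """Freeman-Tukey transform: sqrt(x) + sqrt(x + 1)."""
    return math.sqrt(x) + math.sqrt(x + 1.0)


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability mass, evaluated in log space for numerical stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def transformed_variance(transform, lam: float, k_max: int = 400) -> float:
    """Variance of transform(X) for X ~ Poisson(lam), summed over k = 0..k_max."""
    mean = 0.0
    second_moment = 0.0
    for k in range(k_max + 1):
        p = poisson_pmf(k, lam)
        y = transform(k)
        mean += p * y
        second_moment += p * y * y
    return second_moment - mean * mean


if __name__ == "__main__":
    lam = 20.0  # hypothetical photon rate, chosen only for illustration
    print("raw variance:          ", transformed_variance(float, lam))
    print("Anscombe variance:     ", transformed_variance(anscombe, lam))
    print("Freeman-Tukey variance:", transformed_variance(freeman_tukey, lam))
```

For moderately large rates, both transforms bring the variance close to 1 regardless of the rate, which is what makes an additive-Gaussian noise model a reasonable approximation after transformation.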

arXiv:1910.11194 [pdf, other]

doi 10.1021/acs.jpcb.0c04575

Analytical Theory for Sequence-Specific Binary Fuzzy Complexes of Charged Intrinsically Disordered Proteins

Authors: Alan N. Amin, Yi-Hsuan Lin, Suman Das, Hue Sun Chan

Abstract: Intrinsically disordered proteins (IDPs) are important for biological functions. In contrast to folded proteins, molecular recognition among certain IDPs is "fuzzy" in that their binding and/or phase separation are stochastically governed by the interacting IDPs' amino acid sequences while their assembled conformations remain largely disordered. To help elucidate a basic aspect of this fascinating yet poorly understood phenomenon, the binding of a homo- or hetero-dimeric pair of polyampholytic IDPs is modeled statistical mechanically using cluster expansion. We find that the binding affinities of binary fuzzy complexes in the model correlate strongly with a newly derived simple "jSCD" parameter readily calculable from the pair of IDPs' sequence charge patterns. Predictions by our analytical theory are in essential agreement with coarse-grained explicit-chain simulations. This computationally efficient theoretical framework is expected to be broadly applicable to rationalizing and predicting sequence-specific IDP-IDP polyelectrostatic interactions.

Submitted 7 July, 2020; v1 submitted 24 October, 2019; originally announced October 2019.

Comments: 51 pages, 11 figures. Accepted for Publication in J. Phys. Chem. B

Journal ref: J. Phys. Chem. B 124, 6709-6720 (2020)

arXiv:1909.04206 [pdf]

Novel imaging revealing inner dynamics for cardiovascular waveform analysis via unsupervised manifold learning

Authors: Shen-Chih Wang, Hau-Tieng Wu, Po-Hsun Huang, Cheng-Hsi Chang, Chien-Kun Ting, Yu-Ting Lin

Abstract: Cardiovascular waveforms contain information for clinical diagnosis. By "learning" and organizing the subtle change of waveform morphology from large amounts of raw waveform data, unsupervised manifold learning helps delineate a high-dimensional structure and display it as a novel three-dimensional (3D) image. We investigate the electrocardiography (ECG) waveform for ischemic heart disease and arterial blood pressure (ABP) waveform in dynamic vasoactive episodes. We model each beat or pulse to be a point lying on a manifold, like a surface, and use the diffusion map (DMap) to establish the relationship among those pulses. For ECG datasets, first we analyzed the non-ST-elevation ECG waveform distribution from unstable angina to healthy control, and we investigated intraoperative ST-elevation ECG waveforms to show the dynamic ECG waveform changes. For ABP datasets, we analyzed waveforms collected under endotracheal intubation and administration of vasodilator. To quantify the dynamic separation, we applied the support vector machine (SVM) analysis and the trajectory analysis. For the non-ST-elevation ECG, a hierarchical tree structure comprising consecutive ECG waveforms spanning from unstable angina to healthy control is presented in the 3D image (accuracy=97.6%, macro-F1=96.1%). The DMap helps quantify and visualize the evolving direction of intraoperative ST-elevation myocardial episode in a 1-hour period (accuracy=97.58%, macro-F1=96.06%).
The ABP waveform analysis of Nicardipine administration shows inter-individual difference (accuracy=95.01%, macro-F1=96.9%) and their common directions from intra-individual moving trajectories. The dynamic change of the ABP waveform during endotracheal intubation shows a loop-like trajectory structure, which can be further divided using the knowledge obtained from Nicardipine. The 3D images provide clues to the underlying physiological mechanisms.

Submitted 2 December, 2019; v1 submitted 9 September, 2019; originally announced September 2019.

Comments: 36 pages, 6 figures

arXiv:1908.09726 [pdf, ps, other]

doi 10.1063/1.5139661

A unified analytical theory of heteropolymers for sequence-specific phase behaviors of polyelectrolytes and polyampholytes

Authors: Yi-Hsuan Lin, Jacob P. Brady, Hue Sun Chan, Kingshuk Ghosh

Abstract: The physical chemistry of liquid-liquid phase separation (LLPS) of polymer solutions bears directly on the assembly of biologically functional droplet-like bodies from proteins and nucleic acids. These biomolecular condensates include certain extracellular materials, and intracellular compartments that are characterized as "membraneless organelles". Analytical theories are a valuable, computationally efficient tool for addressing general principles. LLPS of neutral homopolymers is quite well described by theory, but it has been a challenge to develop general theories for the LLPS of heteropolymers involving charge-charge interactions. Here we present a novel theory that combines a random-phase-approximation treatment of polymer density fluctuations and an account of intrachain conformational heterogeneity based upon renormalized Kuhn lengths to provide predictions of LLPS properties as a function of pH, salt, and charge patterning along the chain sequence. Advancing beyond more limited analytical approaches, our LLPS theory is applicable to a wide variety of charged sequences ranging from highly charged polyelectrolytes to neutral or nearly neutral polyampholytes. The new theory should be useful in high-throughput screening of protein and other sequences for their LLPS propensities and can serve as a basis for more comprehensive theories that incorporate non-electrostatic interactions. Experimental ramifications of our theory are discussed.

Submitted 9 January, 2020; v1 submitted 26 August, 2019; originally announced August 2019.

Comments: 48 pages, 11 figures. Accepted for publication in The Journal of Chemical Physics

Journal ref: J. Chem. Phys. 152, 045102 (2020)

arXiv:1907.04765 [pdf, other]

Evaluating bird collision risk of a high-speed railway crossing the habitat of the crested ibis (Nipponia nippon) in Qinling Mountains, China

Authors: Han Hu, Junqing Tang, Yi Wang, Hongfeng Zhang, Dong Wu, Yingchun Lin, Lina Su, Yan Liu, Wei Zhang, Chao Wang, Xiaomin Wu

Abstract: Bird collisions with high-speed transport modes are a vital topic in vehicle safety and wildlife protection, especially when high-speed trains, with an average speed of 250 km/h, have to run across the habitat of an endangered bird species. This paper evaluates the bird-train collision risk associated with a recent high-speed railway project in Qinling Mountains, China, for the crested ibis (Nipponia nippon) and other local bird species. Using line transect surveys and walking monitoring techniques, we surveyed the population abundance, spatial-temporal distributions, and bridge-crossing behaviors of the birds in the study area. The results show that: (1) The crested ibis and the egret were the two most abundant waterfowl species in the study area. The RAI of these two species were about 43.69% and 42.91%, respectively; (2) Crested ibises overall inhabit areas closer to the railway bridge. 91.63% of them were first detected within the range of 0 m to 25 m of the vicinity of the bridge; (3) the ratio between crossing over and under the railway bridge was about 7:3. Crested ibises were found to prefer to fly over the railway bridge (89.29% of the total crossing activities observed for this species). Egrets were more likely to cross the railway below the bridge, and they accounted for 60.27% of the total observations of crossing under the bridge. We recommend that, while the collision risk of crested ibises could be low, barrier-like structures, such as fences, should still be considered to promote the conservation of multiple bird species in the area.
This paper provides a practical case for railway ecology studies in China. To the best of our knowledge, this is the first high-speed railway project that takes protecting crested ibises as one of the top priorities, and exemplifies the recent nationwide initiative towards the construction of "eco-civilization" in the country.

Submitted 10 July, 2019; originally announced July 2019.

Comments: 25 pages, 6 figures, preprint for submission to Transportation Research Part D: Transport and Environment

arXiv:1905.04221 [pdf, ps, other]

Fingerprinting Orientation Distribution Functions in Diffusion MRI detects smaller crossing angles

Authors: Steven H. Baete, Martijn A. Cloos, Ying-Chia Lin, Dimitris G. Placantonakis, Timothy Shepherd, Fernando E. Boada

Abstract: Diffusion tractography is routinely used to study white matter architecture and brain connectivity in vivo. A key step for successful tractography of neuronal tracts is the correct identification of tract directions in each voxel. Here we propose a fingerprinting-based methodology to identify these fiber directions in Orientation Distribution Functions, dubbed ODF-Fingerprinting (ODF-FP). In ODF-FP, fiber configurations are selected based on the similarity between measured ODFs and elements in a pre-computed library. In noisy ODFs, the library matching algorithm penalizes the more complex fiber configurations. ODF simulations and analysis of bootstrapped partial and whole-brain in vivo datasets show that the ODF-FP approach improves the detection of fiber pairs with small crossing angles while maintaining fiber direction precision, which leads to better tractography results. Rather than focusing on the ODF maxima, the ODF-FP approach uses the whole ODF shape to infer fiber directions to improve the detection of fiber bundles with small crossing angle. The resulting fiber directions aid tractography algorithms in accurately displaying neuronal tracts and calculating brain connectivity.

Submitted 10 May, 2019; originally announced May 2019.

Comments: 30 pages, 11 figures

arXiv:1903.08615 [pdf, other]

doi 10.1063/1.5096774

Scaling methods for accelerating kinetic Monte Carlo simulations of chemical reaction networks

Authors: Yen Ting Lin, Song Feng, William S. Hlavacek

Abstract: Various kinetic Monte Carlo algorithms become inefficient when some of the population sizes in a system are large, which gives rise to a large number of reaction events per unit time. Here, we present a new acceleration algorithm based on adaptive and heterogeneous scaling of reaction rates and stoichiometric coefficients. The algorithm is conceptually related to the commonly used idea of accelerating a stochastic simulation by considering a sub-volume $λΩ$ ($0<λ<1$) within a system of interest, which reduces the number of reaction events per unit time occurring in a simulation by a factor $1/λ$ at the cost of greater error in unbiased estimates of first moments and biased overestimates of second moments. Our new approach offers two unique benefits. First, scaling is adaptive and heterogeneous, which eliminates the pitfall of overaggressive scaling. Second, there is no need for an a priori classification of populations as discrete or continuous (as in a hybrid method), which is problematic when discreteness of a chemical species changes during a simulation. The method requires specification of only a single algorithmic parameter, $N_c$, a global critical population size above which populations are effectively scaled down to increase simulation efficiency. The method, which we term partial scaling, is implemented in the open-source BioNetGen software package. We demonstrate that partial scaling can significantly accelerate simulations without significant loss of accuracy for several published models of biological systems.
These models characterize activation of the mitogen-activated protein kinase ERK, prion protein aggregation, and T-cell receptor signaling.

Submitted 10 May, 2019; v1 submitted 20 March, 2019; originally announced March 2019.

Comments: 18 pages, 7 figures, 1 table

arXiv:1903.07231 [pdf]

doi 10.1038/s41586-019-1629-x

Mapping the Human Body at Cellular Resolution -- The NIH Common Fund Human BioMolecular Atlas Program

Authors: Michael P Snyder, Shin Lin, Amanda Posgai, Mark Atkinson, Aviv Regev, Jennifer Rood, Orit Rosen, Leslie Gaffney, Anna Hupalowska, Rahul Satija, Nils Gehlenborg, Jay Shendure, Julia Laskin, Pehr Harbury, Nicholas A Nystrom, Ziv Bar-Joseph, Kun Zhang, Katy Börner, Yiing Lin, Richard Conroy, Dena Procaccini, Ananda L Roy, Ajay Pillai, Marishka Brown, Zorina S Galis

Abstract: Transformative technologies are enabling the construction of three dimensional (3D) maps of tissues with unprecedented spatial and molecular resolution. Over the next seven years, the NIH Common Fund Human Biomolecular Atlas Program (HuBMAP) intends to develop a widely accessible framework for comprehensively mapping the human body at single-cell resolution by supporting technology development, data acquisition, and detailed spatial mapping. HuBMAP will integrate its efforts with other funding agencies, programs, consortia, and the biomedical research community at large towards the shared vision of a comprehensive, accessible 3D molecular and cellular atlas of the human body, in health and various disease settings.

Submitted 7 June, 2019; v1 submitted 17 March, 2019; originally announced March 2019.

Comments: 20 pages, 3 figures

arXiv:1903.05947 [pdf, other]

Designing wildlife crossing structures for ungulates in a desert landscape: A case study in China

Authors: Bin Zhang, Junqing Tang, Yi Wang, Hongfeng Zhang, Gang Xu, Yu Lin, Xiaomin Wu

Abstract: This paper reports on the design of wildlife crossing structures (WCSs) along a new expressway in China, which exemplifies the country's increasing efforts on wildlife protection in infrastructure projects. Expert knowledge and field surveys were used to determine the target species in the study area and the quantity, locations, size, and type of the WCSs. The results on relative abundance index and encounter rate showed that the ibex (Capra ibex), argali sheep (Ovis ammon), and goitered gazelle (Gazella subgutturosa) are the main ungulates in the study area. Among them, the goitered gazelle is the most widely distributed species. WCSs were proposed based on the estimated crossing hotspots. The mean deviation distance between those hotspots and their nearest proposed WCSs is around 341 m. In addition, those 16 proposed underpass WCSs have a width of no less than 12 m and height of no lower than 3.5 m, which is believed to be sufficient for ungulates in the area. Given the limited availability of high-resolution movement data and wildlife-vehicle collision data during a road's early design stage, the approach demonstrated in this paper facilitates practical spatial planning and provides insights into designing WCSs in a desert landscape.

Submitted 14 March, 2019; originally announced March 2019.

Comments: 20 pages, 7 figures, submitted to Transportation Research Part D

arXiv:1902.01540 [pdf, other]

Vaccination dilemma on an evolving social network

Authors: Yuting Wei, Yaosen Lin, Bin Wu

Abstract: Vaccination is crucial for the control of epidemics. Yet it is a social dilemma since non-vaccinators can benefit from the herd immunity created by the vaccinators. Thus the optimum vaccination level is not reached via voluntary vaccination at times. Intensive studies incorporate social networks to study vaccination behavior, and it is shown that vaccination can be promoted on some networks. The underlying network, however, is often assumed to be static, neglecting the dynamical nature of social networks. We investigate the vaccination behavior on dynamical social networks using both simulations and mean-field approximations. We find that the more robust the vaccinator-infected-non-vaccinator links are or the more fragile the vaccinator-healthy-non-vaccinator links are, the higher the final vaccination level is. This result is true for arbitrary rationality. Furthermore, we show that, under strong selection, the vaccination level can be higher than that in the well-mixed population. In addition, we show that vaccination on an evolving social network is equivalent to vaccination in a well-mixed population with a rescaled basic reproductive ratio. Our results highlight the effect of the dynamical nature of social networks on vaccination behavior, and can be insightful for epidemic control.

Submitted 4 February, 2019; originally announced February 2019.

arXiv:1902.00486 [pdf]

Differentiation of skin incision and laparoscopic trocar insertion via quantifying transient bradycardia measured by electrocardiogram

Authors: Cheng-Hsi Chang, Yue-Lin Fang, Yu-Jung Wang, Hau-tieng Wu, Yu-Ting Lin

Abstract: Background. Most surgical procedures involve structures deeper than the skin. However, the difference in surgical noxious stimulation between skin incision and laparoscopic trocar insertion is unknown. By analyzing instantaneous heart rate (IHR) calculated from the electrocardiogram, in particular the transient bradycardia in response to surgical stimuli, this study investigates surgical noxious stimuli arising from skin incision and laparoscopic trocar insertion. Methods. Thirty-five patients undergoing laparoscopic cholecystectomy were enrolled in this prospective observational study. Sequential surgical steps including umbilical skin incision (11 mm), umbilical trocar insertion (11 mm), xiphoid skin incision (5 mm), xiphoid trocar insertion (5 mm), subcostal skin incision (3 mm), and subcostal trocar insertion (3 mm) were investigated. IHR was derived from electrocardiography and calculated by the modern time-varying power spectrum. Similar to the classical heart rate variability analysis, the time-varying low frequency power (tvLF), time-varying high frequency power (tvHF), and tvLF-to-tvHF ratio (tvLHR) were calculated. Prediction probability (PK) analysis and global pointwise F-test were used to compare the performance between indices and the heart rate readings from the patient monitor. Results. Analysis of IHR showed that surgical stimulus elicits a transient bradycardia, followed by an increase in heart rate. Transient bradycardia is more significant in trocar insertion than in skin incision.
The IHR change quantifies differential responses to different surgical intensity. Serial PK analysis demonstrates de-sensitization in skin incision, but not in laparoscopic trocar insertion. Conclusions. Quantitative indices present the transient bradycardia introduced by noxious stimulation. The results indicate different effects between skin incision and trocar insertion.

Submitted 1 February, 2019; originally announced February 2019.

Comments: One table and 4 figures

arXiv:1901.06790 [pdf, other]

doi 10.1103/PhysRevLett.122.148102

Spatial interactions and oscillatory tragedies of the commons

Authors: Yu-Hui Lin, Joshua S. Weitz

Abstract: A tragedy of the commons (TOC) occurs when individuals acting in their own self-interest deplete commonly-held resources, leading to a worse outcome than had they cooperated. Over time, the depletion of resources can change incentives for subsequent actions. Here, we investigate long-term feedback between game and environment across a continuum of incentives in an individual-based framework. We identify payoff-dependent transition rules that lead to oscillatory TOCs in stochastic simulations and the mean field limit. Further extending the stochastic model, we find that spatially explicit interactions can lead to emergent, localized dynamics, including the propagation of cooperative wave fronts and cluster formation of both social context and resources. These dynamics suggest new mechanisms underlying how TOCs arise and how they might be averted.

Submitted 20 January, 2019; originally announced January 2019.

Comments: 5 pages and 3 figures in main text, 9 pages and 4 figures in supplementary material

Journal ref: Phys. Rev. Lett. 122, 148102 (2019)

arXiv:1901.00877 [pdf, other]

A Network-based Multimodal Data Fusion Approach for Characterizing Dynamic Multimodal Physiological Patterns

Authors: Miaolin Fan, Chun-An Chou, Sheng-Che Yen, Yingzi Lin

Abstract: Characterizing the dynamic interactive patterns of complex systems helps gain in-depth understanding of how components interrelate with each other while performing certain functions as a whole. In this study, we present a novel multimodal data fusion approach to construct a complex network, which models the interactions of biological subsystems in the human body under emotional states through physiological responses. Joint recurrence plot and temporal network metrics are employed to integrate the multimodal information at the signal level. A benchmark public dataset is used for evaluating our model.

Submitted 3 January, 2019; originally announced January 2019.

arXiv:1812.02911 [pdf, other]

Accelerated Bayesian inference of gene expression models from snapshots of single-cell transcripts

Authors: Yen Ting Lin, Nicolas E. Buchler

Abstract: Understanding how stochastic gene expression is regulated in biological systems using snapshots of single-cell transcripts requires state-of-the-art methods of computational analysis and statistical inference. A Bayesian approach to statistical inference is the most complete method for model selection and uncertainty quantification of kinetic parameters from single-cell data. This approach is impractical because current numerical algorithms are too slow to handle typical models of gene expression. To solve this problem, we first show that time-dependent mRNA distributions of discrete-state models of gene expression are dynamic Poisson mixtures, whose mixing kernels are characterized by a piece-wise deterministic Markov process. We combined this analytical result with a kinetic Monte Carlo algorithm to create a hybrid numerical method that accelerates the calculation of time-dependent mRNA distributions by 1000-fold compared to current methods. We then integrated the hybrid algorithm into an existing Monte Carlo sampler to estimate the Bayesian posterior distribution of many different, competing models in a reasonable amount of time. We validated our method of accelerated Bayesian inference on several synthetic data sets. Our results show that kinetic parameters can be reasonably constrained for modestly sampled data sets, if the model is known a priori. If the model is unknown, the Bayesian evidence can be used to rigorously quantify the likelihood of a model relative to other models from the data. We demonstrate that Bayesian evidence selects the true model and outperforms approximate metrics, e.g., Bayesian Information Criterion (BIC) or Akaike Information Criterion (AIC), often used for model selection.

Submitted 7 December, 2018; originally announced December 2018.

Comments: 13 pages, 5 figures, 1 Algorithm

arXiv:1809.01722 [pdf]

doi 10.1371/journal.pone.0221319

Unexpected sawtooth artifact in beat-to-beat pulse transit time measured from patient monitor data

Authors: Yu-Ting Lin, Yu-Lun Lo, Chen-Yun Lin, Hau-Tieng Wu, Martin G. Frasch

Abstract: Object: It is increasingly popular to collect as much data as possible in the hospital setting from clinical monitors for research purposes. However, in this setup the data calibration issue is often not discussed and, rather, implicitly assumed, while the clinical monitors might not be designed for the data analysis purpose. We hypothesize that this calibration issue for a secondary analysis may become an important source of artifacts in patient monitor data. We test an off-the-shelf integrated photoplethysmography (PPG) and electrocardiogram (ECG) monitoring device for its ability to yield a reliable pulse transit time (PTT) signal. Approach: This is a retrospective clinical study using two databases: one containing 35 subjects who underwent laparoscopic cholecystectomy, another containing 22 subjects who underwent spontaneous breathing test in the intensive care unit. All data sets include recordings of PPG and ECG using a commonly deployed patient monitor. We calculated the PTT signal offline. Main Results: We report a novel constant oscillatory pattern in the PTT signal and identify this pattern as a sawtooth artifact. We apply an approach based on the de-shape method to visualize, quantify and validate this sawtooth artifact. Significance: The PPG and ECG signals not designed for the PTT evaluation may contain unwanted artifacts. The PTT signal should be calibrated before analysis to avoid erroneous interpretation of its physiological meaning.

Submitted 9 August, 2019; v1 submitted 27 August, 2018; originally announced September 2018.

arXiv:1808.10023 [pdf, other]

doi 10.1039/C8CP05095C

Coarse-Grained Residue-Based Models of Disordered Protein Condensates: Utility and Limitations of Simple Charge Pattern Parameters

Authors: Suman Das, Alan Amin, Yi-Hsuan Lin, Hue Sun Chan

Abstract: Biomolecular condensates undergirded by phase separations of proteins and nucleic acids serve crucial biological functions. To gain physical insights into their genetic basis, we study how liquid-liquid phase separation (LLPS) of intrinsically disordered proteins (IDPs) depends on their sequence charge patterns using a continuum Langevin chain model wherein each amino acid residue is represented by a single bead. Charge patterns are characterized by the `blockiness' measure $κ$ and the `sequence charge decoration' (SCD) parameter. Consistent with random phase approximation (RPA) theory and lattice simulations, LLPS propensity as characterized by critical temperature $T^*_{\rm cr}$ increases with increasingly negative SCD for a set of sequences showing a positive correlation between $κ$ and $-$SCD. Relative to RPA, the simulated sequence-dependent variation in $T^*_{\rm cr}$ is often, though not always, smaller, whereas the simulated critical volume fractions are higher. However, for a set of sequences exhibiting an anti-correlation between $κ$ and $-$SCD, the simulated $T^*_{\rm cr}$'s are quite insensitive to either parameter. Additionally, we find that blocky sequences that allow for strong electrostatic repulsion can lead to coexistence curves with upward concavity as stipulated by RPA, but the LLPS propensity of a strictly alternating charge sequence was likely overestimated by RPA and lattice models because interchain stabilization of this sequence requires spatial alignments that are difficult to achieve in real space. These results help delineate the utility and limitations of the charge pattern parameters and of RPA, pointing to further efforts necessary for rationalizing the newly observed subtleties.

Submitted 29 October, 2018; v1 submitted 29 August, 2018; originally announced August 2018.

Comments: 44 pages, 14 figures, 2 tables, accepted for publication in Physical Chemistry Chemical Physics (PCCP)

Journal ref: Physical Chemistry Chemical Physics (PCCP) Vol.20, pp.28558-28574 (2018)

arXiv:1803.05938 [pdf, other]

doi 10.1016/j.neuroimage.2018.03.014

Low Rank plus Sparse Decomposition of ODFs for Improved Detection of Group-level Differences and Variable Correlations in White Matter

Authors: Steven H. Baete, Jingyun Chen, Ying-Chia Lin, Xiuyuan Wang, Ricardo Otazo, Fernando E. Boada

Abstract: A novel approach is presented for group statistical analysis of diffusion weighted MRI datasets through voxelwise Orientation Distribution Functions (ODF). Recent advances in MRI acquisition make it possible to use high quality diffusion weighted protocols (multi-shell, large number of gradient directions) for routine in vivo study of white matter architecture. The dimensionality of these data sets is however often reduced to simplify statistical analysis. While these approaches may detect large group differences, they do not fully capitalize on all acquired image volumes. Incorporation of all available diffusion information in the analysis however risks biasing the outcome by outliers. Here we propose a statistical analysis method operating on the ODF, either the diffusion ODF or fiber ODF. To avoid outlier bias and reliably detect voxelwise group differences and correlations with demographic or behavioral variables, we apply the Low-Rank plus Sparse (L + S) matrix decomposition on the voxelwise ODFs which separates the sparse individual variability in the sparse matrix S whilst recovering the essential ODF features in the low-rank matrix L. We demonstrate the performance of this ODF L + S approach by replicating the established negative association between global white matter integrity and physical obesity in the Human Connectome dataset. The volume of positive findings agrees with and expands on the volume found by TBSS, Connectivity based fixel enhancement and Connectometry. In the same dataset we further localize the correlations of brain structure with neurocognitive measures such as fluid intelligence and episodic memory. The presented ODF L + S approach will aid in the full utilization of all acquired diffusion weightings leading to the detection of smaller group differences in clinically relevant settings as well as in neuroscience applications.

Submitted 15 March, 2018; originally announced March 2018.

Comments: 20 pages, 11 figures, 5 supplementary figures

Journal ref: NeuroImage, 174:138-152, 2018

arXiv:1803.02941 [pdf, other]

doi 10.1103/PhysRevE.99.032122

Model reduction methods for classical stochastic systems with fast-switching environments: reduced master equations, stochastic differential equations, and applications

Authors: Peter G. Hufton, Yen Ting Lin, Tobias Galla

Abstract: We study classical stochastic systems with discrete states, coupled to switching external environments. For fast environmental processes we derive reduced dynamics for the system itself, focusing on corrections to the adiabatic limit of infinite time scale separation. In some cases, this leads to master equations with negative transition `rates' or bursting events. We devise a simulation algorithm in discrete time to unravel these master equations into sample paths, and provide an interpretation of bursting events. Focusing on stochastic population dynamics coupled to external environments, we discuss a series of approximation schemes combining expansions in the inverse switching rate of the environment, and a Kramers--Moyal expansion in the inverse size of the population. This places the different approximations in relation to existing work on piecewise-deterministic and piecewise-diffusive Markov processes. We apply the model reduction methods to different examples including systems in biology and a model of crack propagation.

Submitted 7 March, 2018; originally announced March 2018.

Comments: 31 pages, 13 figures

Journal ref: Phys. Rev. E 99, 032122 (2019)

Showing 1–50 of 77 results for author: Lin, Y