Search | arXiv e-print repository

Functional Protein Design with Local Domain Alignment

Authors: Chaohao Yuan, Songyou Li, Geyan Ye, Yikun Zhang, Long-Kai Huang, Wenbing Huang, Wei Liu, Jianhua Yao, Yu Rong

Abstract: The core challenge of de novo protein design lies in creating proteins with specific functions or properties, guided by certain conditions. Current models generate proteins using structural and evolutionary guidance, which provides only indirect conditions concerning functions and properties. However, textual annotations of proteins, especially the annotations for protein domains, which directly describe the protein's high-level functionalities, properties, and their correlation with target amino acid sequences, remain unexplored in the context of protein design tasks. In this paper, we propose Protein-Annotation Alignment Generation (PAAG), a multi-modality protein design framework that integrates the textual annotations extracted from protein databases for controllable generation in sequence space. Specifically, within a multi-level alignment module, PAAG can explicitly generate proteins containing specific domains conditioned on the corresponding domain annotations, and can even design novel proteins with flexible combinations of different kinds of annotations. Our experimental results underscore the superiority of the aligned protein representations from PAAG over 7 prediction tasks. Furthermore, PAAG demonstrates a nearly sixfold increase in generation success rate (24.7% vs 4.7% in zinc finger, and 54.3% vs 8.7% in the immunoglobulin domain) in comparison to the existing model.

Submitted 27 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

arXiv:2402.13555 [pdf, other]

Full-Atom Peptide Design with Geometric Latent Diffusion

Authors: Xiangzhe Kong, Yinjun Jia, Wenbing Huang, Yang Liu

Abstract: Peptide design plays a pivotal role in therapeutics, opening new possibilities to leverage target binding sites that were previously undruggable. Most existing methods are either inefficient or only concerned with the target-agnostic design of 1D sequences. In this paper, we propose a generative model for full-atom Peptide design with Geometric LAtent Diffusion (PepGLAD). We first establish a benchmark consisting of both 1D sequences and 3D structures from the Protein Data Bank (PDB) and literature for systematic evaluation. We then identify two major challenges of leveraging current diffusion-based models for peptide design: the full-atom geometry and the variable binding geometry. To tackle the first challenge, PepGLAD derives a variational autoencoder that first encodes full-atom residues of variable size into fixed-dimensional latent representations, and then decodes back to the residue space after conducting the diffusion process in the latent space. For the second issue, PepGLAD explores a receptor-specific affine transformation to convert the 3D coordinates into a shared standard space, enabling better generalization ability across different binding shapes. Experimental results show that our method not only improves diversity and binding affinity significantly in the task of sequence-structure co-design, but also excels at recovering reference structures for binding conformation generation.

Submitted 21 May, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: 25 pages

arXiv:2401.17433 [pdf]

Coronary CTA and Quantitative Cardiac CT Perfusion (CCTP) in Coronary Artery Disease

Authors: Hao Wu, Yingnan Song, Ammar Hoori, Ananya Subramaniam, Juhwan Lee, Justin Kim, Tao Hu, Sadeer Al-Kindi, Wei-Ming Huang, Chun-Ho Yun, Chung-Lieh Hung, Sanjay Rajagopalan, David L. Wilson

Abstract: We assessed the benefit of combining stress cardiac CT perfusion (CCTP) myocardial blood flow (MBF) with coronary CT angiography (CCTA) using our innovative CCTP software. By combining CCTA and CCTP, one can uniquely identify a flow limiting stenosis (obstructive-lesion + low-MBF) versus MVD (no-obstructive-lesion + low-MBF). We retrospectively evaluated 104 patients with suspected CAD, including 18 with diabetes, who underwent CCTA+CCTP. Whole heart and territorial MBF were assessed using our automated pipeline for CCTP analysis that included beam hardening correction; temporal scan registration; automated segmentation; fast, accurate, robust MBF estimation; and visualization. Stenosis severity was scored using the CCTA coronary-artery-disease-reporting-and-data-system (CAD-RADS), with obstructive stenosis deemed as CAD-RADS>=3. We established a threshold MBF (MBF=199-mL/min-100g) for normal perfusion. In patients with CAD-RADS>=3, 28/37 (76%) patients showed ischemia in the corresponding territory. Two patients with obstructive disease had normal perfusion, suggesting collaterals and/or a hemodynamically insignificant stenosis. Among diabetics, 10 of 18 (56%) demonstrated diffuse ischemia consistent with MVD. Among non-diabetics, only 6% had MVD. Sex-specific prevalence of MVD was 21%/24% (M/F). On a per-vessel basis (n=256), MBF showed a significant difference between territories with and without obstructive stenosis (165 +/- 61 mL/min-100g vs. 274 +/- 62 mL/min-100g, p<0.05). A significant and negative rank correlation (rho=-0.53, p<0.05) between territory MBF and CAD-RADS was seen. CCTA in conjunction with a new automated quantitative CCTP approach can augment the interpretation of CAD, enabling the distinction of ischemia due to obstructive lesions and MVD.
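The combined CCTA + CCTP reading described in this abstract can be sketched as a small decision rule. This is only an illustration, not the authors' software: the function name and the output labels are hypothetical, and treating "low MBF" as strictly below the reported 199 mL/min-100g threshold is an assumption.

```python
# Hypothetical helper illustrating the CCTA + CCTP logic from the abstract.
MBF_THRESHOLD = 199.0      # mL/min-100g, normal-perfusion threshold reported in the abstract
OBSTRUCTIVE_CAD_RADS = 3   # CAD-RADS >= 3 is deemed obstructive in the abstract


def classify_territory(cad_rads: int, mbf: float) -> str:
    """Label one coronary territory from its CAD-RADS score and stress MBF."""
    obstructive = cad_rads >= OBSTRUCTIVE_CAD_RADS
    low_mbf = mbf < MBF_THRESHOLD  # assumption: strictly below the threshold counts as low
    if obstructive and low_mbf:
        return "flow-limiting stenosis"
    if not obstructive and low_mbf:
        return "possible MVD"
    if obstructive:
        # The abstract attributes this case to collaterals and/or an insignificant stenosis.
        return "obstructive lesion, normal perfusion"
    return "normal"
```

For example, `classify_territory(4, 165.0)` falls in the obstructive, low-MBF branch, while `classify_territory(1, 150.0)` falls in the non-obstructive, low-MBF branch.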

Submitted 30 January, 2024; originally announced January 2024.

arXiv:2401.08986 [pdf, other]

Rigid Protein-Protein Docking via Equivariant Elliptic-Paraboloid Interface Prediction

Authors: Ziyang Yu, Wenbing Huang, Yang Liu

Abstract: The study of rigid protein-protein docking plays an essential role in a variety of tasks such as drug design and protein engineering. Recently, several learning-based methods have been proposed for the task, exhibiting much faster docking speed than traditional computational methods. In this paper, we propose a novel learning-based method called ElliDock, which predicts an elliptic paraboloid to represent the protein-protein docking interface. To be specific, our model estimates elliptic paraboloid interfaces for the two input proteins respectively, and obtains the roto-translation transformation for docking by making the two interfaces coincide. By its design, ElliDock is independently equivariant with respect to arbitrary rotations/translations of the proteins, which is an indispensable property to ensure the generalization of the docking process. Experimental evaluations show that ElliDock achieves the fastest inference time among all compared methods and is strongly competitive with current state-of-the-art learning-based models such as DiffDock-PP and Multimer, particularly for antibody-antigen docking.

Submitted 17 January, 2024; originally announced January 2024.

Comments: ICLR 2024

arXiv:2311.10315 [pdf, other]

Interpretable Modeling of Single-cell perturbation Responses to Novel Drugs Using Cycle Consistence Learning

Authors: Wei Huang, Aichun Zhu, Hui Liu

Abstract: Phenotype-based screening has attracted much attention for identifying cell-active compounds. Transcriptional and proteomic profiles of cell populations or single cells are informative phenotypic measures of cellular responses to perturbations. In this paper, we proposed a deep learning framework based on encoder-decoder architecture that maps the initial cellular states to a latent space, in which we assume the effects of drug perturbation on cellular states follow linear additivity. Next, we introduced the cycle consistency constraints to enforce that an initial cellular state subjected to drug perturbations would produce the perturbed cellular responses, and, conversely, removal of drug perturbation from the perturbed cellular states would restore the initial cellular states. The cycle consistency constraints and linear modeling in latent space enable learning of interpretable and transferable drug perturbation representations, so that our model can predict cellular responses to unseen drugs. We validated our model on three different types of datasets, including bulk transcriptional responses, bulk proteomic responses, and single-cell transcriptional responses to drug perturbations. The experimental results show that our model achieves better performance than existing state-of-the-art methods.
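One way to write down the linear additivity and cycle consistency described in this abstract is sketched below. The symbols are assumptions introduced for illustration (encoder $E$, decoder $D$, initial state $x$, perturbed state $x'$, latent drug vector $\delta_d$), and the squared-error form of the loss is a guess; the abstract does not state the exact objective.

```latex
\begin{align}
  \hat{x}' &= D\bigl(E(x) + \delta_d\bigr)
    && \text{(apply the drug in latent space)} \\
  \hat{x}  &= D\bigl(E(x') - \delta_d\bigr)
    && \text{(remove the drug in latent space)} \\
  \mathcal{L}_{\mathrm{cyc}} &= \lVert \hat{x}' - x' \rVert_2^2 + \lVert \hat{x} - x \rVert_2^2
    && \text{(assumed squared-error penalty)}
\end{align}
```

Under this reading, predicting the response to an unseen drug amounts to supplying a new $\delta_d$ while reusing the same $E$ and $D$.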

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2309.06355 [pdf, other]

Single-cell mutational burden distributions in birth-death processes

Authors: Christo Morison, Dudley Stark, Weini Huang

Abstract: Genetic mutations are footprints of tumour growth. While mutation data in bulk samples has been used to infer evolutionary parameters hard to measure in vivo, the advent of single-cell data has led to strong interest in the mutational burden distribution (MBD) among tumour cells. We introduce dynamical matrices and recurrence relations to integrate this single-cell MBD with known statistics, and derive new analytical expressions. Surprisingly, we find that the shape of the MBD is driven by cell lineage-level stochasticity rather than by the distribution of mutations in each cell division.

Submitted 7 June, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

Comments: 28 pages, 6 figures

arXiv:2306.01474 [pdf, other]

Generalist Equivariant Transformer Towards 3D Molecular Interaction Learning

Authors: Xiangzhe Kong, Wenbing Huang, Yang Liu

Abstract: Many processes in biology and drug discovery involve various 3D interactions between molecules, such as protein and protein, protein and small molecule, etc. Given that different molecules are usually represented in different granularity, existing methods usually encode each type of molecule independently with different models, which hinders learning the various underlying interaction physics. In this paper, we first propose to universally represent an arbitrary 3D complex as a geometric graph of sets, shedding light on encoding all types of molecules with one model. We then propose a Generalist Equivariant Transformer (GET) to effectively capture both domain-specific hierarchies and domain-agnostic interaction physics. To be specific, GET consists of a bilevel attention module, a feed-forward module and a layer normalization module, where each module is E(3) equivariant and specialized for handling sets of variable sizes. Notably, in contrast to conventional pooling-based hierarchical models, our GET is able to retain fine-grained information of all levels. Extensive experiments on the interactions between proteins, small molecules and RNA/DNAs verify the effectiveness and generalization capability of our proposed method across different domains.
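The E(3) equivariance property mentioned in this abstract can be illustrated with a toy function that is far simpler than GET itself. The sketch below is not the paper's model: it only checks that the centroid of a point set transforms together with the points, and that pairwise distances stay unchanged. It tests a single rotation about the z-axis plus a translation (reflections, also part of E(3), are not covered), and all names and coordinates are hypothetical.

```python
import math


def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def transform(point, angle, shift):
    """Apply a rotation about z followed by a translation."""
    rx, ry, rz = rotate_z(point, angle)
    return (rx + shift[0], ry + shift[1], rz + shift[2])


def centroid(points):
    """Mean position of the points: an equivariant function of the coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def pairwise_distances(points):
    """All pairwise Euclidean distances: an invariant function of the coordinates."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


# Toy coordinates and an arbitrary rigid motion (illustrative values only).
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 1.0)]
angle, shift = 0.7, (3.0, -1.0, 2.0)
moved = [transform(p, angle, shift) for p in points]

# Equivariance: centroid(transformed points) == transform(centroid(points)).
lhs = centroid(moved)
rhs = transform(centroid(points), angle, shift)
equivariant = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(lhs, rhs))

# Invariance: distances are the same before and after the rigid motion.
invariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(pairwise_distances(points), pairwise_distances(moved))
)
```

A model built only from such equivariant and invariant operations gives consistent outputs regardless of how the input complex is posed in space.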

Submitted 29 May, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

Comments: Accepted to ICML 2024

arXiv:2303.10221 [pdf, other]

A statistical framework for GWAS of high dimensional phenotypes using summary statistics, with application to metabolite GWAS

Authors: Weiqiong Huang, Emily C. Hector, Joshua Cape, Chris McKennan

Abstract: The recent explosion of genetic and high dimensional biobank and 'omic' data has provided researchers with the opportunity to investigate the shared genetic origin (pleiotropy) of hundreds to thousands of related phenotypes. However, existing methods for multi-phenotype genome-wide association studies (GWAS) do not model pleiotropy, are only applicable to a small number of phenotypes, or provide no way to perform inference. To add further complication, raw genetic and phenotype data are rarely observed, meaning analyses must be performed on GWAS summary statistics whose statistical properties in high dimensions are poorly understood. We therefore developed a novel model, theoretical framework, and set of methods to perform Bayesian inference in GWAS of high dimensional phenotypes using summary statistics that explicitly model pleiotropy, beget fast computation, and facilitate the use of biologically informed priors. We demonstrate the utility of our procedure by applying it to metabolite GWAS, where we develop new nonparametric priors for genetic effects on metabolite levels that use known metabolic pathway information and foster interpretable inference at the pathway level.

Submitted 17 March, 2023; originally announced March 2023.

Comments: 24 pages of main text, 7 figures, 1 table

arXiv:2302.12177 [pdf, other]

EquiPocket: an E(3)-Equivariant Geometric Graph Neural Network for Ligand Binding Site Prediction

Authors: Yang Zhang, Zhewei Wei, Ye Yuan, Chongxuan Li, Wenbing Huang

Abstract: Predicting the binding sites of target proteins plays a fundamental role in drug discovery. Most existing deep-learning methods consider a protein as a 3D image by spatially clustering its atoms into voxels and then feed the voxelized protein into a 3D CNN for prediction. However, the CNN-based methods encounter several critical issues: 1) defective in representing irregular protein structures; 2) sensitive to rotations; 3) insufficient to characterize the protein surface; 4) unaware of protein size shift. To address the above issues, this work proposes EquiPocket, an E(3)-equivariant Graph Neural Network (GNN) for binding site prediction, which comprises three modules: the first one to extract local geometric information for each surface atom, the second one to model both the chemical and spatial structure of protein and the last one to capture the geometry of the surface via equivariant message passing over the surface atoms. We further propose a dense attention output layer to alleviate the effect incurred by variable protein size. Extensive experiments on several representative benchmarks demonstrate the superiority of our framework to the state-of-the-art methods.

Submitted 8 June, 2024; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: Accepted to ICML 2024 (Oral)

arXiv:2302.00203 [pdf, other]

End-to-End Full-Atom Antibody Design

Authors: Xiangzhe Kong, Wenbing Huang, Yang Liu

Abstract: Antibody design is an essential yet challenging task in various domains like therapeutics and biology. There are two major defects in current learning-based methods: 1) tackling only a certain subtask of the whole antibody design pipeline, making them suboptimal or resource-intensive; 2) omitting either the framework regions or side chains, thus incapable of capturing the full-atom geometry. To address these pitfalls, we propose dynamic Multi-channel Equivariant grAph Network (dyMEAN), an end-to-end full-atom model for E(3)-equivariant antibody design given the epitope and the incomplete sequence of the antibody. Specifically, we first explore structural initialization as a knowledgeable guess of the antibody structure and then propose shadow paratope to bridge the epitope-antibody connections. Both 1D sequences and 3D structures are updated via an adaptive multi-channel equivariant encoder that is able to process protein residues of variable sizes when considering full atoms. Finally, the updated antibody is docked to the epitope via the alignment of the shadow paratope. Experiments on epitope-binding CDR-H3 design, complex structure prediction, and affinity optimization demonstrate the superiority of our end-to-end framework and full-atom modeling.

Submitted 29 May, 2023; v1 submitted 31 January, 2023; originally announced February 2023.

Comments: Accepted to ICML 2023

arXiv:2212.13026 [pdf, other]

Network analysis on cortical morphometry in first-episode schizophrenia

Authors: Mowen Yin, Weikai Huang, Zhichao Liang, Quanying Liu, Xiaoying Tang

Abstract: First-episode schizophrenia (FES) results in abnormality of brain connectivity at different levels. Despite some successful findings on functional and structural connectivity of FES, relatively few studies have been focused on morphological connectivity, which may provide a potential biomarker for FES. In this study, we aim to investigate cortical morphological connectivity in FES. T1-weighted magnetic resonance image data from 92 FES patients and 106 healthy controls (HCs) are analyzed. We parcellate the brain into 68 cortical regions, calculate the averaged thickness and surface area of each region, construct undirected networks by correlating cortical thickness or surface area measures across 68 regions for each group, and finally compute a variety of network-related topology characteristics. Our experimental results show that both the cortical thickness network and the surface area network in two groups are small-world networks; that is, those networks have high clustering coefficients and low characteristic path lengths. At certain network sparsity levels, both the cortical thickness network and the surface area network of FES have significantly lower clustering coefficients and local efficiencies than those of HC, indicating FES-related abnormalities in local connectivity and small-worldness. These abnormalities mainly involve the frontal, parietal, and temporal lobes. Further regional analyses confirm significant group differences in the node betweenness of the posterior cingulate gyrus for both the cortical thickness network and the surface area network. Our work supports that cortical morphological connectivity, which is constructed based on correlations across subjects' cortical thickness, may serve as a tool to study topological abnormalities in neurological disorders.
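The two quantities this abstract uses to characterize small-world networks, the clustering coefficient and the characteristic path length, can be computed as in the sketch below. It is a generic pure-Python illustration on an undirected graph given as a dict of neighbour sets, which is an assumed input format. It does not reproduce the study's pipeline: building the 68-region network from thresholded correlations is not shown.

```python
from collections import deque


def average_clustering(adj):
    """Mean local clustering coefficient of an undirected graph."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        nbr_list = list(nbrs)
        # Count edges that connect pairs of this node's neighbours.
        links = sum(
            1
            for i, u in enumerate(nbr_list)
            for v in nbr_list[i + 1:]
            if v in adj[u]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Toy graphs (illustrative only): a complete graph on 4 nodes and a 3-node path.
complete4 = {i: {j for j in range(4) if j != i} for i in range(4)}
path3 = {0: {1}, 1: {0, 2}, 2: {1}}
```

A small-world network in the abstract's sense scores high on `average_clustering` and low on `characteristic_path_length` relative to a comparable random graph.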

Submitted 26 December, 2022; originally announced December 2022.

arXiv:2211.01978 [pdf, other]

doi 10.1145/3511808.3557142

PEMP: Leveraging Physics Properties to Enhance Molecular Property Prediction

Authors: Yuancheng Sun, Yimeng Chen, Weizhi Ma, Wenhao Huang, Kang Liu, Zhiming Ma, Wei-Ying Ma, Yanyan Lan

Abstract: Molecular property prediction is essential for drug discovery. In recent years, deep learning methods have been introduced to this area and achieved state-of-the-art performances. However, most existing methods ignore the intrinsic relations between molecular properties which can be utilized to improve the performances of corresponding prediction tasks. In this paper, we propose a new approach, namely Physics properties Enhanced Molecular Property prediction (PEMP), to utilize relations between molecular properties revealed by previous physics theory and physical chemistry studies. Specifically, we enhance the training of the chemical and physiological property predictors with related physics property prediction tasks. We design two different methods for PEMP, respectively based on multi-task learning and transfer learning. Both methods include a model-agnostic molecule representation module and a property prediction module. In our implementation, we adopt both the state-of-the-art molecule embedding models under the supervised learning paradigm and the pretraining paradigm as the molecule representation module of PEMP, respectively. Experimental results on public benchmark MoleculeNet show that the proposed methods have the ability to outperform corresponding state-of-the-art models.

Submitted 18 October, 2022; originally announced November 2022.

Comments: 9 pages. Published in CIKM 2022

arXiv:2208.06073 [pdf, other]

Conditional Antibody Design as 3D Equivariant Graph Translation

Authors: Xiangzhe Kong, Wenbing Huang, Yang Liu

Abstract: Antibody design is valuable for therapeutic usage and biological research. Existing deep-learning-based methods encounter several key issues: 1) incomplete context for Complementarity-Determining Regions (CDRs) generation; 2) incapability of capturing the entire 3D geometry of the input structure; 3) inefficient prediction of the CDR sequences in an autoregressive manner. In this paper, we propose Multi-channel Equivariant Attention Network (MEAN) to co-design 1D sequences and 3D structures of CDRs. To be specific, MEAN formulates antibody design as a conditional graph translation problem by importing extra components including the target antigen and the light chain of the antibody. Then, MEAN resorts to E(3)-equivariant message passing along with a proposed attention mechanism to better capture the geometrical correlation between different components. Finally, it outputs both the 1D sequences and 3D structure via a multi-round progressive full-shot scheme, which enjoys more efficiency and precision against previous autoregressive approaches. Our method significantly surpasses state-of-the-art models in sequence and structure modeling, antigen-binding CDR design, and binding affinity optimization. Specifically, the relative improvement to baselines is about 23% in antigen-binding CDR design and 34% for affinity optimization.

Submitted 30 March, 2023; v1 submitted 11 August, 2022; originally announced August 2022.

Comments: Accepted to ICLR 2023 as oral presentation. Outstanding paper honorable mentions

arXiv:2207.08824 [pdf, other]

Energy-Motivated Equivariant Pretraining for 3D Molecular Graphs

Authors: Rui Jiao, Jiaqi Han, Wenbing Huang, Yu Rong, Yang Liu

Abstract: Pretraining molecular representation models without labels is fundamental to various applications. Conventional methods mainly process 2D molecular graphs and focus solely on 2D tasks, making their pretrained models incapable of characterizing 3D geometry and thus defective for downstream 3D tasks. In this work, we tackle 3D molecular pretraining in a complete and novel sense. In particular, we first propose to adopt an equivariant energy-based model as the backbone for pretraining, which enjoys the merits of fulfilling the symmetry of 3D space. Then we develop a node-level pretraining loss for force prediction, where we further exploit the Riemann-Gaussian distribution to ensure the loss to be E(3)-invariant, enabling more robustness. Moreover, a graph-level noise scale prediction task is also leveraged to further promote the eventual performance. We evaluate our model pretrained from a large-scale 3D dataset GEOM-QM9 on two challenging 3D benchmarks: MD17 and QM9. Experimental results demonstrate the efficacy of our method against current state-of-the-art pretraining approaches, and verify the validity of our design for each proposed component.

Submitted 29 November, 2022; v1 submitted 18 July, 2022; originally announced July 2022.

Comments: AAAI 2023

arXiv:2110.08213 [pdf, other]

Towards Identity Preserving Normal to Dysarthric Voice Conversion

Authors: Wen-Chin Huang, Bence Mark Halpern, Lester Phillip Violeta, Odette Scharenborg, Tomoki Toda

Abstract: We present a voice conversion framework that converts normal speech into dysarthric speech while preserving the speaker identity. Such a framework is essential for (1) clinical decision making processes and alleviation of patient stress, (2) data augmentation for dysarthric speech recognition. This is an especially challenging task since the converted samples should capture the severity of dysarthric speech while being highly natural and possessing the speaker identity of the normal speaker. To this end, we adopted a two-stage framework, which consists of a sequence-to-sequence model and a nonparallel frame-wise model. Objective and subjective evaluations were conducted on the UASpeech dataset, and results showed that the method was able to yield reasonable naturalness and capture severity aspects of the pathological speech. On the other hand, the similarity to the normal source speaker's voice was limited and requires further improvements.

Submitted 15 October, 2021; originally announced October 2021.

Comments: Submitted to ICASSP 2022

arXiv:2109.12608 [pdf, other]

U(1) dynamics in neuronal activities

Authors: Chia-Ying Lin, Ping-Han Chen, Hsiu-Hau Lin, Wen-Min Huang

Abstract: Neurons convert the external stimuli into action potentials, or spikes, and encode the contained information into the biological nerve system. Despite the complexity of neurons and the synaptic interactions in between, the rate models are often adapted to describe neural encoding with modest success. However, it is not clear whether the firing rate, the reciprocal of the time interval between spikes, is sufficient to capture the essential feature for the neuronal dynamics. Going beyond the usual relaxation dynamics in Ginzburg-Landau theory for statistical systems, we propose the neural activities can be captured by the U(1) dynamics, integrating the action potential and the "phase" of the neuron together. The gain function of the Hodgkin-Huxley neuron and the corresponding dynamical phase transitions can be described within the U(1) neuron framework. In addition, the phase dependence of the synaptic interactions is illustrated and the mapping to the Kinouchi-Copelli neuron is established. It suggests that the U(1) neuron is the minimal model for single-neuron activities and serves as the building block of the neuronal network for information processing.

Submitted 26 September, 2021; originally announced September 2021.

Comments: 11 pages, 11 figures

arXiv:2107.10670 [pdf, other]

Structure-aware Interactive Graph Neural Networks for the Prediction of Protein-Ligand Binding Affinity

Authors: Shuangli Li, Jingbo Zhou, Tong Xu, Liang Huang, Fan Wang, Haoyi Xiong, Weili Huang, Dejing Dou, Hui Xiong

Abstract: Drug discovery often relies on the successful prediction of protein-ligand binding affinity. Recent advances have shown great promise in applying graph neural networks (GNNs) for better affinity prediction by learning the representations of protein-ligand complexes. However, existing solutions usually treat protein-ligand complexes as topological graph data, thus the biomolecular structural information is not fully utilized. The essential long-range interactions among atoms are also neglected in GNN models. To this end, we propose a structure-aware interactive graph neural network (SIGN) which consists of two components: polar-inspired graph attention layers (PGAL) and pairwise interactive pooling (PiPool). Specifically, PGAL iteratively performs the node-edge aggregation process to update embeddings of nodes and edges while preserving the distance and angle information among atoms. Then, PiPool is adopted to gather interactive edges with a subsequent reconstruction loss to reflect the global interactions. Exhaustive experimental study on two benchmarks verifies the superiority of SIGN.

Submitted 20 July, 2021; originally announced July 2021.

Comments: 11 pages, 8 figures, Accepted by KDD 2021 (Research Track)

arXiv:2106.15098 [pdf, other]

Molecule Generation by Principal Subgraph Mining and Assembling

Authors: Xiangzhe Kong, Wenbing Huang, Zhixing Tan, Yang Liu

Abstract: Molecule generation is central to a variety of applications. Current attention has been paid to approaching the generation task as subgraph prediction and assembling. Nevertheless, these methods usually rely on hand-crafted or external subgraph construction, and the subgraph assembling depends solely on local arrangement. In this paper, we define a novel notion, principal subgraph, that is closely related to the informative pattern within molecules. Interestingly, our proposed merge-and-update subgraph extraction method can automatically discover frequent principal subgraphs from the dataset, which previous methods are incapable of doing. Moreover, we develop a two-step subgraph assembling strategy, which first predicts a set of subgraphs in a sequence-wise manner and then assembles all generated subgraphs globally as the final output molecule. Built upon graph variational auto-encoder, our model is demonstrated to be effective in terms of several evaluation metrics and efficiency, compared with state-of-the-art methods on distribution learning and (constrained) property optimization tasks.

Submitted 17 December, 2022; v1 submitted 29 June, 2021; originally announced June 2021.

Comments: Accepted by NeurIPS 2022. Oral presentation

arXiv:2101.08757 [pdf, other]

Expectation-Maximization Regularized Deep Learning for Weakly Supervised Tumor Segmentation for Glioblastoma

Authors: Chao Li, Wenjian Huang, Xi Chen, Yiran Wei, Stephen J. Price, Carola-Bibiane Schönlieb

Abstract: We present an Expectation-Maximization (EM) Regularized Deep Learning (EMReDL) model for weakly supervised tumor segmentation. The proposed framework is tailored to glioblastoma, a type of malignant tumor characterized by its diffuse infiltration into the surrounding brain tissue, which poses a significant challenge to treatment target and tumor burden estimation using conventional structural MRI. Although physiological MRI provides more specific information regarding tumor infiltration, the relatively low resolution hinders a precise full annotation. This has motivated us to develop a weakly supervised deep learning solution that exploits the partially labelled tumor regions. EMReDL contains two components: a physiological prior prediction model and an EM-regularized segmentation model. The physiological prior prediction model exploits the physiological MRI by training a classifier to generate a physiological prior map. This map is passed to the segmentation model for regularization using the EM algorithm. We evaluated the model on a glioblastoma dataset with the pre-operative multiparametric and recurrence MRI available. EMReDL was shown to effectively segment the infiltrated tumor from the partially labelled region of potential infiltration. The segmented core tumor and infiltrated tumor demonstrated high consistency with the tumor burden labelled by experts. The performance comparisons showed that EMReDL achieved higher accuracy than published state-of-the-art models. On MR spectroscopy, the segmented region displayed more aggressive features than other partially labelled regions. The proposed model can be generalized to other segmentation tasks that rely on partial labels, with the CNN architecture flexible in the framework.

Submitted 15 July, 2021; v1 submitted 21 January, 2021; originally announced January 2021.

arXiv:2007.03245 [pdf]

doi 10.1016/j.coelec.2020.04.003

Nanoelectrodes for intracellular measurements of reactive oxygen and nitrogen species in single living cells

Authors: Keke Hu, Yan-Ling Liu, Alexander Oleinick, Michael Mirkin, Wei-Hua Huang, Christian Amatore

Abstract: Reactive oxygen and nitrogen species (ROS and RNS) play important roles in various physiological processes (e.g., phagocytosis) and pathological conditions (e.g., cancer). The primary ROS/RNS, viz., hydrogen peroxide, peroxynitrite ion, nitric oxide, and nitrite ion, can be oxidized at different electrode potentials and therefore detected and quantified by electroanalytical techniques. Nanometer-sized electrochemical probes are especially suitable for measuring ROS/RNS in single cells and cellular organelles. In this article, we survey recent advances in localized measurements of ROS/RNS inside single cells and discuss several methodological issues, including optimization of nanoelectrode geometry, precise positioning of an electrochemical probe inside a cell, and interpretation of electroanalytical data.

Submitted 7 July, 2020; originally announced July 2020.

Journal ref: Current Opinion in Electrochemistry, Elsevier, 2020, 22, pp.44-50

arXiv:2007.02835 [pdf, other]

Self-Supervised Graph Transformer on Large-Scale Molecular Data

Authors: Yu Rong, Yatao Bian, Tingyang Xu, Weiyang Xie, Ying Wei, Wenbing Huang, Junzhou Huang

Abstract: How to obtain informative representations of molecules is a crucial prerequisite in AI-driven drug design and discovery. Recent research abstracts molecules as graphs and employs Graph Neural Networks (GNNs) for molecular representation learning. Nevertheless, two issues impede the usage of GNNs in real scenarios: (1) insufficient labeled molecules for supervised training; (2) poor generalization capability to newly synthesized molecules. To address them both, we propose a novel framework, GROVER, which stands for Graph Representation frOm self-superVised mEssage passing tRansformer. With carefully designed self-supervised tasks at the node, edge and graph levels, GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data. Moreover, to encode such complex information, GROVER integrates Message Passing Networks into the Transformer-style architecture to deliver a class of more expressive encoders of molecules. The flexibility of GROVER allows it to be trained efficiently on large-scale molecular datasets without requiring any supervision, thus being immunized to the two issues mentioned above. We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning. We then leverage the pre-trained GROVER for molecular property prediction followed by task-specific fine-tuning, where we observe a huge improvement (more than 6% on average) from current state-of-the-art methods on 11 challenging benchmarks. The insights we gained are that well-designed self-supervision losses and largely-expressive pre-trained models enjoy significant potential for performance boosting.

Submitted 28 October, 2020; v1 submitted 18 June, 2020; originally announced July 2020.

Comments: 17 pages, 7 figures

ACM Class: I.2.0; J.3

arXiv:2005.13607 [pdf, other]

Multi-View Graph Neural Networks for Molecular Property Prediction

Authors: Hehuan Ma, Yatao Bian, Yu Rong, Wenbing Huang, Tingyang Xu, Weiyang Xie, Geyan Ye, Junzhou Huang

Abstract: The crux of molecular property prediction is to generate meaningful representations of the molecules. One promising route is to exploit the molecular graph structure through Graph Neural Networks (GNNs). It is well known that both atoms and bonds significantly affect the chemical properties of a molecule, so an expressive model shall be able to exploit both node (atom) and edge (bond) information simultaneously. Guided by this observation, we present Multi-View Graph Neural Network (MV-GNN), a multi-view message passing architecture to enable more accurate predictions of molecular properties. In MV-GNN, we introduce a shared self-attentive readout component and disagreement loss to stabilize the training process. This readout component also renders the whole architecture interpretable. We further boost the expressive power of MV-GNN by proposing a cross-dependent message passing scheme that enhances information communication of the two views, which results in the MV-GNN^cross variant. Lastly, we theoretically justify the expressiveness of the two proposed models in terms of distinguishing non-isomorphic graphs. Extensive experiments demonstrate that MV-GNN models achieve remarkably superior performance over the state-of-the-art models on a variety of challenging benchmarks. Meanwhile, visualization results of the node importance are consistent with prior knowledge, which confirms the interpretability power of MV-GNN models.

Submitted 12 June, 2020; v1 submitted 17 May, 2020; originally announced May 2020.

arXiv:2004.08556 [pdf]

A new method to calculate the Total Fertility Rate from the number of birth

Authors: Weidong Huang

Abstract: The standard methods to calculate the Total Fertility Rate (TFR) require reliable age-specific fertility rates, which in turn need birth data and the related age-specific population data for women. Historically, the number of births was often not counted according to the age of the mother, so it is difficult to estimate the historical total fertility rate with the standard methods. Many empirical methods have been proposed, but their application is limited to specific periods and places. This paper deduces a new method for calculating the total fertility rate from the number of births and the population of women of childbearing age, which can be applied to most cases. The relative error is usually less than 5%. The method makes it easier to calculate the TFR, and may be applied to obtain more historical TFRs.

Submitted 18 April, 2020; originally announced April 2020.

Comments: 15 pages, 4 figures
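For orientation, the standard calculation that the abstract above contrasts itself with can be sketched as follows. This is a minimal illustration of the textbook definition (the sum of age-specific fertility rates multiplied by the width of the age groups), not the new method proposed in the paper; the function name and the sample numbers are hypothetical.

```python
def total_fertility_rate(births_by_age, women_by_age, interval_years=5):
    """Standard TFR: sum of age-specific fertility rates times the age-group width.

    births_by_age and women_by_age are parallel lists, one entry per age group
    (for example the seven five-year groups from 15-19 to 45-49).
    """
    if len(births_by_age) != len(women_by_age):
        raise ValueError("births and women must cover the same age groups")
    # Age-specific fertility rate: births per woman per year in each age group.
    asfr = [births / women for births, women in zip(births_by_age, women_by_age)]
    return interval_years * sum(asfr)


# Hypothetical data: 7 age groups, 1000 women each, 100 births per group per year.
tfr = total_fertility_rate([100] * 7, [1000] * 7)
```

The sketch makes the data requirement visible: births must be tabulated by the mother's age group, which is exactly the information the abstract says is often missing from historical records.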

arXiv:2004.08553 [pdf]

A new method for life table and life expectancy calculation

Authors: Weidong Huang

Abstract: The existing life table method needs to calculate the age-specific mortality first, which not only involves many complicated calculation steps but also introduces multiple approximations that bring error. This paper redefines the probability of death for the life table as the average probability of death of a group of people born in a certain period at a later time. Based on this definition, a new method for the life table is proposed to obtain the life expectancy, which has the same meaning as that from the traditional life table. Japanese population data are used to verify the method: the results are consistent with the published life expectancy at birth, with a maximum relative difference of no more than 0.1% and an average relative difference of less than 0.03%. The theory and method of life table described in this paper are simple and easy to understand. The needed data are easy to obtain from statistics, the calculation is easy, and the results obtained are accurate and reliable. It should be a very valuable demographic method for application.

Submitted 18 April, 2020; originally announced April 2020.

Comments: 8 pages, in Chinese, 2 figures, 3 tables

arXiv:1910.00190 [pdf, ps, other]

The energy-spectrum of bicompatible sequences

Authors: Fenix W. Huang, Christopher L. Barrett, Christian M. Reidys

Abstract: Background: Genotype-phenotype maps provide a meaningful filtration of sequence space and RNA secondary structures are particular such phenotypes. Compatible sequences, i.e., sequences that satisfy the base pairing constraints of a given RNA structure, play an important role in the context of neutral networks and inverse folding. Sequences satisfying the constraints of two structures simultaneously are called bicompatible and phenotypic change, induced by erroneously replicating populations of RNA sequences, is closely connected to bicompatibility. Furthermore, bicompatible sequences are relevant for riboswitch sequences, beacons of evolution, realizing two distinct phenotypes. Results: We present a full loop energy model Boltzmann sampler of bicompatible sequences for pairs of structures. The novel dynamic programming algorithm is based on a topological framework encapsulating the relations between loops. We utilize our sequence sampler to study the energy spectra and density of bicompatible sequences, the rankings of the structures and key properties for evolutionary transitions. Conclusion: Our analysis of riboswitch sequences shows that key properties of bicompatible sequences depend on the particular pair of structures. While there always exist bicompatible sequences for random structure pairs, they are less suited to facilitate transitions. We show that native riboswitch sequences exhibit a distinct signature with regards to the ranking of their two phenotypes relative to the minimum free energy, suggesting a new criterion for identifying native sequences and sequences subjected to evolutionary pressure.

Submitted 30 September, 2019; originally announced October 2019.

Comments: 20 pages, 10 Figures

MSC Class: 92-08
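The notion of compatibility used in the abstract above can be illustrated with a short check. This is only a sketch of the definition, not the paper's Boltzmann sampler: it assumes structures are written in dot-bracket notation without pseudoknots, that the allowed pairs are the Watson-Crick pairs plus the G-U wobble pair, and it ignores energetic constraints such as minimum hairpin loop size. All function names are hypothetical.

```python
# Watson-Crick pairs plus the G-U wobble pair (an assumption of this sketch).
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def base_pairs(structure):
    """Return the list of paired positions (i, j) of a dot-bracket structure."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unexpected ')'")
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def is_compatible(sequence, structure):
    """True if every base pair of the structure can be formed by the sequence."""
    if len(sequence) != len(structure):
        return False
    return all((sequence[i], sequence[j]) in ALLOWED_PAIRS
               for i, j in base_pairs(structure))


def is_bicompatible(sequence, structure_a, structure_b):
    """True if the sequence satisfies the pairing constraints of both structures."""
    return is_compatible(sequence, structure_a) and is_compatible(sequence, structure_b)
```

For example, `is_bicompatible("GGAACC", "((..))", ".(..).")` holds because every required pair is G-C, whereas replacing the second structure with `"()...."` fails because positions 0 and 1 would have to pair G with G.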

arXiv:1906.03226 [pdf, other]

Predicting Onset of Dementia in Parkinson's Disease Patients

Authors: Dhruv Agarwal, Abhishek Srivastava, Edward W Huang

Abstract: Alzheimer's disease (AD) and Parkinson's disease (PD) are the two most common neurodegenerative disorders in humans. Because a significant percentage of patients have clinical and pathological features of both diseases, it has been hypothesized that the patho-cascades of the two diseases overlap. Despite this evidence, these two diseases are rarely studied in a joint manner. In this paper, we utilize clinical, imaging, genetic, and biospecimen features to cluster AD and PD patients into the same feature space. By training a machine learning classifier on the combined feature space, we predict the disease stage of patients two years after their baseline visits. We observed a considerable improvement in the prediction accuracy of Parkinson's dementia patients due to combined training on Alzheimer's and Parkinson's patients, thereby affirming the claim that these two diseases can be jointly studied.

Submitted 2 June, 2019; originally announced June 2019.

Comments: 7 pages

arXiv:1903.01500 [pdf, ps, other]

doi 10.3390/e21030243

Approximations of Shannon Mutual Information for Discrete Variables with Applications to Neural Population Coding

Authors: Wentao Huang, Kechen Zhang

Abstract: Although Shannon mutual information has been widely used, its effective calculation is often difficult for many practical problems, including those in neural population coding. Asymptotic formulas based on Fisher information sometimes provide accurate approximations to the mutual information but this approach is restricted to continuous variables because the calculation of Fisher information requires derivatives with respect to the encoded variables. In this paper, we consider information-theoretic bounds and approximations of the mutual information based on Kullback-Leibler divergence and Rényi divergence. We propose several information metrics to approximate Shannon mutual information in the context of neural population coding. While our asymptotic formulas all work for discrete variables, one of them has consistent performance and high accuracy regardless of whether the encoded variables are discrete or continuous. We performed numerical simulations and confirmed that our approximation formulas were highly accurate for approximating the mutual information between the stimuli and the responses of a large neural population. These approximation formulas may potentially bring convenience to the applications of information theory to many practical and theoretical problems.

Submitted 4 March, 2019; originally announced March 2019.

Comments: 31 pages, 6 figures

Journal ref: Entropy 2019, 21(3), 243
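As a point of reference for the abstract above, the exact Shannon mutual information of two discrete variables can be computed directly from a joint probability table when the table is small. The sketch below shows that textbook definition only; it is not one of the approximation formulas proposed in the paper, and the function name and example tables are hypothetical.

```python
import math


def mutual_information(joint):
    """Shannon mutual information I(X;Y) in bits from a joint probability table.

    joint[i][j] is P(X = i, Y = j); the entries are assumed to sum to 1.
    """
    p_x = [sum(row) for row in joint]          # marginal distribution of X
    p_y = [sum(col) for col in zip(*joint)]    # marginal distribution of Y
    info = 0.0
    for i, row in enumerate(joint):
        for j, p_xy in enumerate(row):
            if p_xy > 0:                       # terms with zero probability contribute 0
                info += p_xy * math.log2(p_xy / (p_x[i] * p_y[j]))
    return info


independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])
fully_dependent = mutual_information([[0.5, 0.0], [0.0, 0.5]])
```

The double loop runs over every entry of the joint table, so the cost grows with the number of joint outcomes; for the responses of a large neural population that number is enormous, which is the practical difficulty that motivates approximations.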

arXiv:1805.05001 [pdf]

Saikosaponins with similar structures but different mechanisms lead to combined hepatotoxicity

Authors: Qianqian Zhang, Wanqiu Huang, Yiqiao Gao, Yingtong Lv, Wei Zhang, Zunjian Zhang, Fengguo Xu

Abstract: Radix Bupleuri is a hepatoprotective traditional Chinese medicine (TCM) that has been used in clinical practice for thousands of years, yet it has also been reported to be linked with liver damage. Previous studies have revealed that saikosaponins are the major types of components that contribute to the hepatotoxicity of Radix Bupleuri. However, the underlying molecular mechanism is far from being understood. In order to clarify whether these structural analogues exert toxic effects through the same molecular targets, a systematic comparison study was done in this paper. The effects of SSa, b2, c, and d on isolated rat liver mitochondria and human hepatocyte L02 cells were explored, respectively. The collective results indicated that although saikosaponins share similar structures, they have quite different mechanisms. SSb2 and SSd showed the most serious damage to the function of mitochondria and the survival rate of cells, respectively. SSb2 could cause mitochondrial permeability transition pore (mPTP) opening and collapse of mitochondrial membrane potential (ΔΨm) by impairing the mitochondrial respiratory chain complex III, while SSd destroyed the plasma membrane and led to the release of lactate dehydrogenase (LDH) mainly through activating caspase-1. Furthermore, the combination index (CI) demonstrated that the combined hepatotoxicity of SSb2 and SSd could be additive. This finding might yield a more in-depth understanding of the hepatotoxicity of Radix Bupleuri, which contains many different saikosaponins.

Submitted 13 May, 2018; originally announced May 2018.

arXiv:1801.05056 [pdf, ps, other]

Genetic robustness of let-7 miRNA sequence-structure pairs

Authors: Qijun He, Fenix W. Huang, Christopher Barrett, Christian M. Reidys

Abstract: Genetic robustness, the preservation of evolved phenotypes against genotypic mutations, is one of the central concepts in evolution. In recent years a large body of work has focused on the origins, mechanisms, and consequences of robustness in a wide range of biological systems. In particular, research on ncRNAs studied the ability of sequences to maintain folded structures against single-point mutations. In these studies, the structure is merely a reference. However, recent work revealed evidence that structure itself contributes to the genetic robustness of ncRNAs. We follow this line of thought and consider sequence-structure pairs as the unit of evolution and introduce the spectrum of inverse folding rates (IFR-spectrum) as a measurement of genetic robustness. Our analysis of the miRNA let-7 family captures key features of structure-modulated evolution and facilitates the study of robustness against multiple-point mutations.

Submitted 11 January, 2018; originally announced January 2018.

Comments: 9 pages 7 figures

MSC Class: 05A99

arXiv:1711.10549 [pdf, other]

An efficient dual sampling algorithm with Hamming distance filtration

Authors: Fenix W. Huang, Qijun He, Christopher Barrett, Christian M. Reidys

Abstract: Recently, a framework considering RNA sequences and their RNA secondary structures as pairs led to some information-theoretic perspectives on how the semantics encoded in RNA sequences can be inferred. In this context, the pairing arises naturally from the energy model of RNA secondary structures. Fixing the sequence in the pairing produces the RNA energy landscape, whose partition function was discovered by McCaskill. Dually, fixing the structure induces the energy landscape of sequences. The latter has been considered for designing more efficient inverse folding algorithms. We present here the Hamming distance filtered, dual partition function, together with a Boltzmann sampler using novel dynamic programming routines for the loop-based energy model. The time complexity of the algorithm is $O(h^2n)$, where $h,n$ are Hamming distance and sequence length, respectively, reducing the time complexity of samplers reported in the literature by $O(n^2)$. We then present two applications, the first being in the context of the evolution of natural sequence-structure pairs of microRNAs and the second constructing neutral paths. The former studies the inverse fold rate (IFR) of sequence-structure pairs, filtered by Hamming distance, observing that such pairs evolve towards higher levels of robustness, i.e., increasing IFR. The latter is an algorithm that constructs neutral paths: given two sequences in a neutral network, we employ the sampler in order to construct short paths connecting them, consisting of sequences all contained in the neutral network.

Submitted 31 October, 2017; originally announced November 2017.

Comments: 8 pages 6 figures

MSC Class: 05-04

arXiv:1709.06563 [pdf]

The Impact of Speed and Bias on the Cognitive Processes of Experts and Novices in Medical Image Decision-making

Authors: Jennifer S. Trueblood, William R. Holmes, Adam C. Seegmiller, Jonathan Douds, Margaret Compton, Megan Woodruff, Wenrui Huang, Charles Stratton, Quentin Eichbaum

Abstract: Training individuals to make accurate decisions from medical images is a critical component of education in diagnostic pathology. We describe a joint experimental and computational modeling approach to examine the similarities and differences in the cognitive processes of novice participants and experienced participants (pathology residents and pathology faculty) in cancer cell image identification. For this study we collected a bank of hundreds of digital images that were identified by cell type and classified by difficulty by a panel of expert hematopathologists. The key manipulations in our study included examining the speed-accuracy tradeoff as well as the impact of prior expectations on decisions. In addition, our study examined individual differences in decision-making by comparing task performance to domain general visual ability (as measured using the Novel Object Memory Test (NOMT); Richler et al., 2017). Using Signal Detection Theory (SDT) and the Diffusion Decision Model (DDM), we found many similarities between experts and novices in our task. While experts tended to have better discriminability, the two groups responded similarly to time pressure (i.e., reduced caution under speed instructions in the DDM) and to the introduction of a probabilistic cue (i.e., increased response bias in the DDM). These results have important implications for training in this area as well as using novice participants in research on medical image perception and decision-making.

Submitted 18 May, 2018; v1 submitted 19 September, 2017; originally announced September 2017.

arXiv:1702.00493 [pdf]

Information-theoretic interpretation of tuning curves for multiple motion directions

Authors: Wentao Huang, Xin Huang, Kechen Zhang

Abstract: We have developed an efficient information-maximization method for computing the optimal shapes of tuning curves of sensory neurons by optimizing the parameters of the underlying feedforward network model. When applied to the problem of population coding of visual motion with multiple directions, our method yields several types of tuning curves with both symmetric and asymmetric shapes that resemble those found in the visual cortex. Our result suggests that the diversity or heterogeneity of tuning curve shapes as observed in neurophysiological experiments might actually constitute an optimal population representation of visual motions with multiple components.

Submitted 1 February, 2017; originally announced February 2017.

Comments: The 51st Annual Conference on Information Sciences and Systems (CISS), 2017

arXiv:1611.08751 [pdf, other]

Functional Alignment with Anatomical Networks is Associated with Cognitive Flexibility

Authors: John D. Medaglia, Weiyu Huang, Elisabeth A. Karuza, Sharon L. Thompson-Schill, Alejandro Ribeiro, Danielle S. Bassett

Abstract: Cognitive flexibility describes the human ability to switch between modes of mental function to achieve goals. Mental switching is accompanied by transient changes in brain activity, which must occur atop an anatomical architecture that bridges disparate cortical and subcortical regions by underlying white matter tracts. However, an integrated perspective regarding how white matter networks might constrain brain dynamics during cognitive processes requiring flexibility has remained elusive. To address this challenge, we applied emerging tools from graph signal processing to decompose BOLD signals based on diffusion imaging tractography in 28 individuals performing a perceptual task that probed cognitive flexibility. We found that the alignment between functional signals and the architecture of the underlying white matter network was associated with greater cognitive flexibility across subjects. Signals with behaviorally-relevant alignment were concentrated in the basal ganglia and anterior cingulate cortex, consistent with cortico-striatal mechanisms of cognitive flexibility. Importantly, these findings are not accessible to unimodal analyses of functional or anatomical neuroimaging alone. Instead, by taking a generalizable and concise reduction of multimodal neuroimaging data, we uncover an integrated structure-function driver of human behavior.

Submitted 26 November, 2016; originally announced November 2016.

arXiv:1611.01886 [pdf, other]

An Information-Theoretic Framework for Fast and Robust Unsupervised Learning via Neural Population Infomax

Authors: Wentao Huang, Kechen Zhang

Abstract: A framework is presented for unsupervised learning of representations based on infomax principle for large-scale neural populations. We use an asymptotic approximation to the Shannon's mutual information for a large neural population to demonstrate that a good initial approximation to the global information-theoretic optimum can be obtained by a hierarchical infomax method. Starting from the initial solution, an efficient algorithm based on gradient descent of the final objective function is proposed to learn representations from the input datasets, and the method works for complete, overcomplete, and undercomplete bases. As confirmed by numerical experiments, our method is robust and highly efficient for extracting salient features from input datasets. Compared with the main existing methods, our algorithm has a distinct advantage in both the training speed and the robustness of unsupervised representation learning. Furthermore, the proposed method is easily extended to the supervised or unsupervised model for training deep structure networks.

Submitted 10 March, 2017; v1 submitted 6 November, 2016; originally announced November 2016.

Comments: 25 pages, 7 figures, 5th International Conference on Learning Representations (ICLR 2017)

arXiv:1605.02628 [pdf, ps, other]

Topological language for RNA

Authors: Fenix W. D. Huang, Christian M. Reidys

Abstract: In this paper we introduce a novel, context-free grammar, {\it RNAFeatures$^*$}, capable of generating any RNA structure including pseudoknot structures (pk-structure). We represent pk-structures as orientable fatgraphs, which naturally leads to a filtration by their topological genus. Within this framework, RNA secondary structures correspond to pk-structures of genus zero. {\it RNAFeatures$^*$} acts on formal, arc-labeled RNA secondary structures, called $λ$-structures. $λ$-structures correspond one-to-one to pk-structures together with some additional information. This information consists of the specific rearrangement of the backbone, by which a pk-structure can be made cross-free. {\it RNAFeatures$^*$} is an extension of the grammar for secondary structures and employs an enhancement by labelings of the symbols as well as the production rules. We discuss how to use {\it RNAFeatures$^*$} to obtain a stochastic context-free grammar for pk-structures, using data of RNA sequences and structures. The induced grammar facilitates fast Boltzmann sampling and statistical analysis. As a first application, we present an $O(n \log(n))$ runtime algorithm which samples pk-structures based on ninety tRNA sequences and structures from the Nucleic Acid Database (NDB).

Submitted 9 May, 2016; originally announced May 2016.

Comments: 29 pages, 13 figures, 1 table

arXiv:1602.04773 [pdf, other]

doi 10.1098/rsos.160544

Competing metabolic strategies in a multilevel selection model

Authors: André Amado, Lenin Fernández, Weini Huang, Fernando F. Ferreira, Paulo R. A. Campos

Abstract: The interplay between energy efficiency and evolutionary mechanisms is addressed. One important question is how evolutionary mechanisms can select for the optimised usage of energy in situations where it does not lead to immediate advantage. For example, this problem is of great importance to improve our understanding about the major transition from unicellular to multicellular form of life. The immediate advantage of gathering efficient individuals in an energetic context is not clear. Although this process increases relatedness among individuals, it also increases local competition. To address this question, we propose a model of two competing metabolic strategies that makes explicit reference to the resource usage. We assume the existence of an efficient strain, which converts resource into energy at high efficiency but displays a low rate of resource consumption, and an inefficient strain, which consumes resource at a high rate with a low efficiency in converting it to energy. We explore the dynamics in both well-mixed and structured populations. The selection for optimised energy usage is measured by the likelihood that an efficient strain can invade a population only comprised by inefficient strains. It is found that the region of the parameter space at which the efficient strain can thrive in structured populations is always larger than observed in well-mixed populations. In fact, in well-mixed populations the efficient strain is only evolutionarily stable in the domain whereupon there is no evolutionary dilemma. We also observe that small group sizes enhance the chance of invasion by the efficient strain in spite of increasing the competition among relatives. This outcome corroborates the key role played by kin selection and shows that the group dynamics relied on group expansion, overlapping generations and group split can balance the negative effects of local competition.

Submitted 15 February, 2016; originally announced February 2016.

Comments: 32 pages, 7 figures

arXiv:1601.07867 [pdf]

Brain network efficiency is influenced by pathological source of corticobasal syndrome

Authors: John D. Medaglia, Weiyu Huang, Santiago Segarra, Christopher Olm, James Gee, Murray Grossman, Alejandro Ribeiro, Corey T. McMillan, Danielle S. Bassett

Abstract: Multimodal neuroimaging studies of corticobasal syndrome using volumetric MRI and DTI successfully discriminate between Alzheimer's disease and frontotemporal lobar degeneration but this evidence has typically included clinically heterogeneous patient cohorts and has rarely assessed the network structure of these distinct sources of pathology. Using structural MRI data, we identify areas in fronto-temporo-parietal cortex with reduced gray matter density in corticobasal syndrome relative to age matched controls. A support vector machine procedure demonstrates that gray matter density poorly discriminates between frontotemporal lobar degeneration and Alzheimer's disease pathology subgroups with low sensitivity and specificity. In contrast, a statistic of local network efficiency demonstrates excellent discriminatory power, with high sensitivity and specificity. Our results indicate that the underlying pathological sources of corticobasal syndrome can be classified more accurately using graph theoretical statistics of white matter microstructure in association cortex than by regional gray matter density alone. These results highlight the importance of a multimodal neuroimaging approach to diagnostic analyses of corticobasal syndrome and suggest that distinct sources of pathology mediate the circuitry of brain regions affected by corticobasal syndrome.

Submitted 28 January, 2016; originally announced January 2016.

arXiv:1512.00037 [pdf, other]

doi 10.1109/JSTSP.2016.2600859

Graph Frequency Analysis of Brain Signals

Authors: Weiyu Huang, Leah Goldsberry, Nicholas F. Wymbs, Scott T. Grafton, Danielle S. Bassett, Alejandro Ribeiro

Abstract: This paper presents methods to analyze functional brain networks and signals from graph spectral perspectives. The notion of frequency and filters traditionally defined for signals supported on regular domains such as discrete time and image grids has been recently generalized to irregular graph domains, and defines brain graph frequencies associated with different levels of spatial smoothness across the brain regions. Brain network frequency also enables the decomposition of brain signals into pieces corresponding to smooth or rapid variations. We relate graph frequency with principal component analysis when the networks of interest denote functional connectivity. The methods are utilized to analyze brain networks and signals as subjects master a simple motor skill. We observe that brain signals corresponding to different graph frequencies exhibit different levels of adaptability throughout learning. Further, we notice a strong association between graph spectral properties of brain networks and the level of exposure to tasks performed, and recognize the most contributing and important frequency signatures at different task familiarity.

Submitted 3 May, 2016; v1 submitted 2 November, 2015; originally announced December 2015.
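
The link the abstract above draws between graph frequency and spatial smoothness can be illustrated with a minimal sketch. It assumes one common convention (the combinatorial Laplacian L = D - W of an undirected weighted graph), which is not necessarily the convention used in the paper; the 4-node path graph and the function names are hypothetical. The quadratic form x^T L x equals the weighted sum of squared differences across edges, so it is small for signals that vary smoothly over the network and large for rapidly varying ones.

```python
def laplacian(weights):
    """Combinatorial Laplacian L = D - W of a symmetric weight matrix
    (assumed to have a zero diagonal)."""
    n = len(weights)
    return [
        [(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]


def quadratic_form(lap, x):
    """Compute x^T L x, a measure of how rapidly x varies over the graph."""
    n = len(x)
    return sum(x[i] * lap[i][j] * x[j] for i in range(n) for j in range(n))


def edge_variation(weights, x):
    """Weighted sum of squared differences over edges; equals x^T L x."""
    n = len(x)
    return sum(
        weights[i][j] * (x[i] - x[j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


# Hypothetical 4-node path graph 0 - 1 - 2 - 3 with unit weights.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
L = laplacian(W)

smooth = [1.0, 1.0, 1.0, 1.0]   # constant signal: no variation across edges
rough = [1.0, -1.0, 1.0, -1.0]  # alternating signal: maximal variation

smooth_score = quadratic_form(L, smooth)
rough_score = quadratic_form(L, rough)
```

In this toy case the constant signal scores zero and the alternating signal scores highest, which is the sense in which low graph frequencies correspond to smooth variation and high graph frequencies to rapid variation.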

arXiv:1511.03141 [pdf, ps, other]

Sequence-structure relations of biopolymers

Authors: Christopher Barrett, Fenix W. Huang, Christian M. Reidys

Abstract: Motivation: DNA data is transcribed into single-stranded RNA, which folds into specific molecular structures. In this paper we pose the question to what extent sequence- and structure-information correlate. We view this correlation as structural semantics of sequence data that allows for a different interpretation than conventional sequence alignment. Structural semantics could enable us to identify more general embedded "patterns" in DNA and RNA sequences. Results: We compute the partition function of sequences with respect to a fixed structure and connect this computation to the mutual information of a sequence-structure pair for RNA secondary structures. We present a Boltzmann sampler and obtain the a priori probability of specific sequence patterns. We present a detailed analysis for the three PDB-structures, 2JXV (hairpin), 2N3R (3-branch multi-loop) and 1EHZ (tRNA). We localize specific sequence patterns, contrast the energy spectrum of the Boltzmann sampled sequences versus those sequences that refold into the same structure and derive a criterion to identify native structures. We illustrate that there are multiple sequences in the partition function of a fixed structure, each having nearly the same mutual information, that are nevertheless poorly aligned. This indicates the possibility of the existence of relevant patterns embedded in the sequences that are not discoverable using alignments.

Submitted 22 August, 2016; v1 submitted 10 November, 2015; originally announced November 2015.

Comments: 8 pages, 13 figures

arXiv:1506.06572 [pdf, other]

Stochastic evolutionary games in dynamic populations

Authors: Weini Huang, Christoph Hauert, Arne Traulsen

Abstract: Frequency dependent selection and demographic fluctuations play important roles in evolutionary and ecological processes. Under frequency dependent selection, the average fitness of the population may increase or decrease based on interactions between individuals within the population. This should be reflected in fluctuations of the population size even in constant environments. Here, we propose a stochastic model, which naturally combines these two evolutionary ingredients by assuming frequency dependent competition between different types in an individual-based model. In contrast to previous game theoretic models, the carrying capacity of the population and thus the population size is determined by pairwise competition of individuals mediated by evolutionary games and demographic stochasticity. In the limit of infinite population size, the averaged stochastic dynamics is captured by the deterministic competitive Lotka-Volterra equations. In small populations, demographic stochasticity may instead lead to the extinction of the entire population. As the population size is driven by the fitness in evolutionary games, a population of cooperators is less prone to go extinct than a population of defectors, whereas in the usual systems of fixed size, the population would thrive regardless of its average payoff.

Submitted 22 June, 2015; originally announced June 2015.
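
The deterministic limit named in the abstract above can be sketched with the generic competitive Lotka-Volterra form dx_i/dt = x_i (r_i - sum_j a_ij x_j), integrated here by forward Euler. This is a textbook form offered under stated assumptions, not the paper's parameterization: the growth rates `r`, the competition matrix `a`, the step size, and the function names are all hypothetical choices for illustration.

```python
def competitive_lv_step(x, r, a, dt):
    """One forward-Euler step of dx_i/dt = x_i * (r_i - sum_j a[i][j] * x_j)."""
    n = len(x)
    return [
        x[i] + dt * x[i] * (r[i] - sum(a[i][j] * x[j] for j in range(n)))
        for i in range(n)
    ]


def simulate(x0, r, a, dt=0.01, steps=20000):
    """Integrate the competitive Lotka-Volterra equations from x0."""
    x = list(x0)
    for _ in range(steps):
        x = competitive_lv_step(x, r, a, dt)
    return x


# Hypothetical parameters: two types with equal growth rates and
# within-type competition stronger than between-type competition.
r = [1.0, 1.0]
a = [
    [1.0, 0.5],
    [0.5, 1.0],
]

final = simulate([0.1, 0.2], r, a)
```

With these illustrative parameters the interior fixed point solves 1 - x_1 - 0.5 x_2 = 0 and 1 - 0.5 x_1 - x_2 = 0, giving x_1 = x_2 = 2/3, and the trajectory settles there. The point of the sketch is that the total population size is set by the competition coefficients rather than fixed in advance, which is the feature the abstract contrasts with fixed-size models.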

arXiv:1403.2908 [pdf, ps, other]

Shapes of topological RNA structures

Authors: Fenix W. D. Huang, Christian M. Reidys

Abstract: A topological RNA structure is derived from a diagram and its shape is obtained by collapsing the stacks of the structure into single arcs and by removing any arcs of length one. Shapes contain key topological information, and for fixed topological genus there exist only finitely many such shapes. We shall express topological RNA structures as unicellular maps, i.e. graphs together with a cyclic ordering of their half-edges. In this paper we prove a bijection of shapes of topological RNA structures. We furthermore derive a linear time algorithm generating shapes of fixed topological genus. We derive explicit expressions for the coefficients of the generating polynomial of these shapes and the generating function of RNA structures of genus $g$. Furthermore we outline how shapes can be used in order to extract essential information of RNA structure databases.

Submitted 11 March, 2014; originally announced March 2014.

Comments: 27 pages, 11 figures, 2 tables. arXiv admin note: text overlap with arXiv:1304.7397

MSC Class: 05C85

arXiv:1305.6760 [pdf]

SOAPdenovo-Trans: De novo transcriptome assembly with short RNA-Seq reads

Authors: Yinlong Xie, Gengxiong Wu, Jingbo Tang, Ruibang Luo, Jordan Patterson, Shanlin Liu, Weihua Huang, Guangzhu He, Shengchang Gu, Shengkang Li, Xin Zhou, Tak-Wah Lam, Yingrui Li, Xun Xu, Gane Ka-Shu Wong, Jun Wang

Abstract: Motivation: Transcriptome sequencing has long been the favored method for quickly and inexpensively obtaining the sequences for a large number of genes from an organism with no reference genome. With the rapidly increasing throughputs and decreasing costs of next generation sequencing, RNA-Seq has gained in popularity; but given the typically short reads (e.g. 2 x 90 bp paired ends) of this technology, de novo assembly to recover complete or full-length transcript sequences remains an algorithmic challenge. Results: We present SOAPdenovo-Trans, a de novo transcriptome assembler designed specifically for RNA-Seq. Its performance was evaluated on transcriptome datasets from rice and mouse. Using the known transcripts from these well-annotated genomes (sequenced a decade ago) as our benchmark, we assessed how SOAPdenovo-Trans and two other popular software handle the practical issues of alternative splicing and variable expression levels. Our conclusion is that SOAPdenovo-Trans provides higher contiguity, lower redundancy, and faster execution. Availability and Implementation: Source code and user manual are at http://sourceforge.net/projects/soapdenovotrans/ Contact: [email protected] or [email protected]

Submitted 9 August, 2013; v1 submitted 29 May, 2013; originally announced May 2013.

Comments: 7 pages, 4 figures, 3 tables

arXiv:1304.7397 [pdf, ps, other]

Uniform generation of RNA pseudoknot structures with genus filtration

Authors: Fenix W. D. Huang, Markus E. Nebel, Christian M. Reidys

Abstract: In this paper we present a sampling framework for RNA structures of fixed topological genus. We introduce a novel, linear time, uniform sampling algorithm for RNA structures of fixed topological genus $g$, for arbitrary $g>0$. Furthermore we develop a linear time sampling algorithm for RNA structures of fixed topological genus $g$ that are weighted by a simplified, loop-based energy functional. For this process the partition function of the energy functional has to be computed once, which has $O(n^2)$ time complexity.

Submitted 27 April, 2013; originally announced April 2013.

Comments: 11 figures, 25 pages

MSC Class: 05C85

arXiv:1009.0310 [pdf]

Synthesis of Sugar and fixation of CO2 through Artificial Photosynthesis driving by Hydrogen or Electricity

Authors: Weidong Huang

Abstract: The overall process of photosynthesis consists of two main phases, the so-called light and dark reactions: light energy is absorbed by chlorophyll molecules and transferred to regenerate NADH and ATP, which then drive the Calvin-Benson cycle to synthesize sugar. In order to synthesize sugar through artificial photosynthesis, one key is to regenerate ATP economically and improve the efficiency of the dark reactions. Here 9 kinds of dark reaction pathways are proposed, in which only NADH is regenerated from hydrogen or electricity for driving, so the efficiency of the dark reactions is improved; combined with solar photovoltaic or solar hydrogen technology, the total efficiency of artificial photosynthesis can reach 30%, several tens of times more than natural photosynthesis. One of them uses chemical synthesis of formaldehyde from CO2 and H2, needs no NADH or ATP, and synthesizes sugar efficiently through 9 enzymes, so it will be easier to produce on a large scale, and the sugar will be a good energy carrier as it can be efficiently converted to the energy carrier hydrogen through enzymes.

Submitted 1 September, 2010; originally announced September 2010.

Comments: 15 pages, 7 figures

arXiv:0908.0597 [pdf, ps, other]

Target prediction and a statistical sampling algorithm for RNA-RNA interaction

Authors: F. W. D. Huang, J. Qin, C. M. Reidys, P. F. Stadler

Abstract: It has been proven that the accessibility of the target sites has a critical influence for miRNA and siRNA. In this paper, we present a program, rip2.0, which outputs not only the energetically most favorable target site based on the hybrid-probability, but also a statistical sampling structure to illustrate the statistical characterization and representation of the Boltzmann ensemble of RNA-RNA interaction structures. The outputs are retrieved via backtracing an improved dynamic programming solution for the partition function based on the approach of Huang et al. (Bioinformatics). The $O(N^6)$ time and $O(N^4)$ space algorithm is implemented in C (available from http://www.combinatorics.cn/cbpc/rip2.html).

Submitted 5 August, 2009; originally announced August 2009.

Comments: 7 pages, 10 figures

MSC Class: 05A16

arXiv:q-bio/0508006 [pdf]

Computational Fluid Dynamic Approach for Biological System Modeling

Authors: Weidong Huang, Chundu Wu, Bingjia Xiao, Weidong Xia

Abstract: Various biological system models have been proposed in systems biology, which are based on the complex biological reaction kinetics of various components. These models are not practical because we lack kinetic information. In this paper, it is found that the enzymatic reaction and multi-order reaction rates are often controlled by the transport of the reactants in biological systems. A Computational Fluid Dynamic (CFD) approach, which is based on transport of the components and kinetics of biological reactions, is introduced for biological system modeling. We apply this approach to a biological wastewater treatment system for the study of the metabolism of organic carbon substrates and the microbial population. The results show that a CFD model coupled with reaction kinetics is more accurate and more feasible than kinetic models for biological system modeling.

Submitted 2 August, 2005; originally announced August 2005.

Comments: 13 pages; 5 figures

Showing 1–46 of 46 results for author: Huang, W