Search | arXiv e-print repository

arXiv:2406.19755 [pdf, other]

Protein Representation Learning with Sequence Information Embedding: Does it Always Lead to a Better Performance?

Authors: Yang Tan, Lirong Zheng, Bozitao Zhong, Liang Hong, Bingxin Zhou

Abstract: Deep learning has become a crucial tool in studying proteins. While the significance of modeling protein structure has been discussed extensively in the literature, amino acid types are typically included in the input as a default operation for many inference tasks. This study demonstrates with a structure alignment task that embedding amino acid types in some cases may not help a deep learning model learn a better representation. To this end, we propose ProtLOCA, a local geometry alignment method based solely on amino acid structure representation. The effectiveness of ProtLOCA is examined by a global structure-matching task on protein pairs with an independent test dataset based on CATH labels. Our method outperforms existing sequence- and structure-based representation learning methods by more quickly and accurately matching structurally consistent protein domains. Furthermore, in local structure pairing tasks, ProtLOCA for the first time provides a valid solution to highlight common local structures among proteins with different overall structures but the same function. This suggests a new possibility for using deep learning methods to analyze protein structure to infer function.

Submitted 28 June, 2024; originally announced June 2024.

Comments: 8 pages, 4 figures

arXiv:2406.19744 [pdf, other]

ProtSolM: Protein Solubility Prediction with Multi-modal Features

Authors: Yang Tan, Jia Zheng, Liang Hong, Bingxin Zhou

Abstract: Understanding protein solubility is essential for their functional applications. Computational methods for predicting protein solubility are crucial for reducing experimental costs and enhancing the efficiency and success rates of protein engineering. Existing methods either construct a supervised learning scheme on small-scale datasets with manually processed physicochemical properties, or blindly apply pre-trained protein language models to extract amino acid interaction information. The scale and quality of available training datasets leave significant room for improvement in terms of accuracy and generalization. To address these research gaps, we propose ProtSolM, a novel deep learning method that combines pre-training and fine-tuning schemes for protein solubility prediction. ProtSolM integrates information from multiple dimensions, including physicochemical properties, amino acid sequences, and protein backbone structures. Our model is trained using PDBSol, the largest solubility dataset that we have constructed. PDBSol includes over 60,000 protein sequences and structures. We provide a comprehensive leaderboard of existing statistical learning and deep learning methods on independent datasets with computational and experimental labels. ProtSolM achieved state-of-the-art performance across various evaluation metrics, demonstrating its potential to significantly advance the accuracy of protein solubility prediction.

Submitted 28 June, 2024; originally announced June 2024.

Comments: 10 pages, 7 figures, 9 tables

arXiv:2404.14850 [pdf, other]

Simple, Efficient and Scalable Structure-aware Adapter Boosts Protein Language Models

Authors: Yang Tan, Mingchen Li, Bingxin Zhou, Bozitao Zhong, Lirong Zheng, Pan Tan, Ziyi Zhou, Huiqun Yu, Guisheng Fan, Liang Hong

Abstract: Fine-tuning pre-trained protein language models (PLMs) has emerged as a prominent strategy for enhancing downstream prediction tasks, often outperforming traditional supervised learning approaches. As a widely applied powerful technique in natural language processing, employing Parameter-Efficient Fine-Tuning techniques could potentially enhance the performance of PLMs. However, the direct transfer to life science tasks is non-trivial due to the different training strategies and data forms. To address this gap, we introduce SES-Adapter, a simple, efficient, and scalable adapter method for enhancing the representation learning of PLMs. SES-Adapter incorporates PLM embeddings with structural sequence embeddings to create structure-aware representations. We show that the proposed method is compatible with different PLM architectures and across diverse tasks. Extensive evaluations are conducted on 2 types of folding structures with notable quality differences, 9 state-of-the-art baselines, and 9 benchmark datasets across distinct downstream tasks. Results show that compared to vanilla PLMs, SES-Adapter improves downstream task performance by a maximum of 11% and an average of 3%, with significantly accelerated training speed by a maximum of 1034% and an average of 362%; the convergence rate is also improved by approximately 2 times. Moreover, positive optimization is observed even with low-quality predicted structures. The source code for SES-Adapter is available at https://github.com/tyang816/SES-Adapter.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 30 pages, 4 figures, 8 tables

arXiv:2402.00548 [pdf]

Plant sesquiterpene lactones

Authors: Olivia Agatha, Daniela Mutwil-Anderwald, Jhing Yein Tan, Marek Mutwil

Abstract: Sesquiterpene lactones (STLs) are a prominent group of plant secondary metabolites predominantly found in the Asteraceae family and have multiple ecological roles and medicinal applications. This review describes the ecological significance of STLs, highlighting their roles in plant defense mechanisms against herbivory and as phytotoxins, alongside their function as environmental signaling molecules. We also cover the substantial role of STLs in medicine and their mode of action in health and disease. We discuss the biosynthetic pathways and the various modifications that make STLs one of the most diverse groups of metabolites. Finally, we discuss methods in identifying and predicting STL biosynthesis pathways.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2311.09754 [pdf]

doi 10.1016/S2666-5247(23)00397-X

Efficacy of Wolbachia-mediated sterility to suppress dengue: a synthetic control study

Authors: Jue Tao Lim, Somya Bansal, Chee Seng Chong, Borame Dickens, Youming Ng, Lu Deng, Caleb Lee, Li Yun Tan, Grace Chain, Pei Ma, Shuzhen Sim, Cheong Huat Tan, Alex R Cook, Lee Ching Ng

Abstract: In a study conducted in Singapore, a country prone to dengue outbreaks due to its climate and urban population, researchers examined the effectiveness of releasing male Aedes aegypti mosquitoes infected with Wolbachia (wAlbB strain) to reduce dengue transmission. These infected males, when mating with wild-type females, produced non-viable eggs, leading to vector suppression. Extensive field trials involving over 600,000 residents in four townships were conducted from 2018 to 2022. The results showed a 57% decline in total dengue incidence and a 64% decline in clustered dengue incidence. This approach offers promise for large-scale dengue control in regions facing rising dengue cases, providing a critical solution in combating the disease.

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2310.17415 [pdf, other]

PETA: Evaluating the Impact of Protein Transfer Learning with Sub-word Tokenization on Downstream Applications

Authors: Yang Tan, Mingchen Li, Pan Tan, Ziyi Zhou, Huiqun Yu, Guisheng Fan, Liang Hong

Abstract: Large protein language models are adept at capturing the underlying evolutionary information in primary structures, offering significant practical value for protein engineering. Compared to natural language models, protein amino acid sequences have a smaller data volume and a limited combinatorial space. Choosing an appropriate vocabulary size to optimize the pre-trained model is a pivotal issue. Moreover, despite the wealth of benchmarks and studies in the natural language community, there remains a lack of a comprehensive benchmark for systematically evaluating protein language model quality. Given these challenges, PETA trained language models with 14 different vocabulary sizes under three tokenization methods. It conducted thousands of tests on 33 diverse downstream datasets to assess the models' transfer learning capabilities, incorporating two classification heads and three random seeds to mitigate potential biases. Extensive experiments indicate that vocabulary sizes between 50 and 200 optimize the model, whereas sizes exceeding 800 detrimentally affect the model's representational performance. Our code, model weights and datasets are available at https://github.com/ginnm/ProteinPretraining.

Submitted 26 October, 2023; originally announced October 2023.

Comments: 46 pages, 4 figures, 9 tables

arXiv:2306.04899 [pdf, other]

Multi-level Protein Representation Learning for Blind Mutational Effect Prediction

Authors: Yang Tan, Bingxin Zhou, Yuanhong Jiang, Yu Guang Wang, Liang Hong

Abstract: Directed evolution plays an indispensable role in protein engineering that revises existing protein sequences to attain new or enhanced functions. Accurately predicting the effects of protein variants necessitates an in-depth understanding of protein structure and function. Although large self-supervised language models have demonstrated remarkable performance in zero-shot inference using only protein sequences, these models inherently do not interpret the spatial characteristics of protein structures, which are crucial for comprehending protein folding stability and internal molecular interactions. This paper introduces a novel pre-training framework that cascades sequential and geometric analyzers for protein primary and tertiary structures. It guides mutational directions toward desired traits by simulating natural selection on wild-type proteins and evaluates the effects of variants based on their fitness to perform the function. We assess the proposed approach using a public database and two new databases for a variety of variant effect prediction tasks, which encompass a diverse set of proteins and assays from different taxa. The prediction results achieve state-of-the-art performance over other zero-shot learning methods for both single-site mutations and deep mutations.

Submitted 7 June, 2023; originally announced June 2023.

arXiv:2305.10699 [pdf, other]

Dirichlet Diffusion Score Model for Biological Sequence Generation

Authors: Pavel Avdeyev, Chenlai Shi, Yuhao Tan, Kseniia Dudnyk, Jian Zhou

Abstract: Designing biological sequences is an important challenge that requires satisfying complex constraints and thus is a natural problem to address with deep generative modeling. Diffusion generative models have achieved considerable success in many applications. Score-based generative stochastic differential equations (SDE) model is a continuous-time diffusion model framework that enjoys many benefits, but the originally proposed SDEs are not naturally designed for modeling discrete data. To develop generative SDE models for discrete data such as biological sequences, here we introduce a diffusion process defined in the probability simplex space with stationary distribution being the Dirichlet distribution. This makes diffusion in continuous space natural for modeling discrete data. We refer to this approach as Dirichlet diffusion score model. We demonstrate that this technique can generate samples that satisfy hard constraints using a Sudoku generation task. This generative model can also solve Sudoku, including hard puzzles, without additional training. Finally, we applied this approach to develop the first human promoter DNA sequence design model and showed that designed sequences share similar properties with natural promoter sequences.

Submitted 16 June, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

Comments: ICML 2023

arXiv:2304.01347 [pdf]

Temporal Dynamic Synchronous Functional Brain Network for Schizophrenia Diagnosis and Lateralization Analysis

Authors: Cheng Zhu, Ying Tan, Shuqi Yang, Jiaqing Miao, Jiayi Zhu, Huan Huang, Dezhong Yao, Cheng Luo

Abstract: The available evidence suggests that dynamic functional connectivity (dFC) can capture time-varying abnormalities in brain activity in resting-state cerebral functional magnetic resonance imaging (rs-fMRI) data and has a natural advantage in uncovering mechanisms of abnormal brain activity in schizophrenia (SZ) patients. Hence, an advanced dynamic brain network analysis model called the temporal brain category graph convolutional network (Temporal-BCGCN) was employed. Firstly, a unique dynamic brain network analysis module, DSF-BrainNet, was designed to construct dynamic synchronization features. Subsequently, a revolutionary graph convolution method, TemporalConv, was proposed, based on the synchronous temporal properties of features. Finally, the first modular abnormal hemispherical lateralization test tool in deep learning based on rs-fMRI data, named CategoryPool, was proposed. This study was validated on COBRE and UCLA datasets and achieved 83.62% and 89.71% average accuracies, respectively, outperforming the baseline model and other state-of-the-art methods. The ablation results also demonstrate the advantages of TemporalConv over the traditional edge feature graph convolution approach and the improvement of CategoryPool over the classical graph pooling approach. Interestingly, this study showed that the lower order perceptual system and higher order network regions in the left hemisphere are more severely dysfunctional than in the right hemisphere in SZ and reaffirms the importance of the left medial superior frontal gyrus in SZ. Our core code is available at: https://github.com/swfen/Temporal-BCGCN.

Submitted 11 September, 2023; v1 submitted 30 March, 2023; originally announced April 2023.

arXiv:2212.05316 [pdf, other]

Graph-Regularized Manifold-Aware Conditional Wasserstein GAN for Brain Functional Connectivity Generation

Authors: Yee-Fan Tan, Chee-Ming Ting, Fuad Noman, Raphaël C. -W. Phan, Hernando Ombao

Abstract: Common measures of brain functional connectivity (FC) including covariance and correlation matrices are semi-positive definite (SPD) matrices residing on a cone-shape Riemannian manifold. Despite its remarkable success for Euclidean-valued data generation, use of standard generative adversarial networks (GANs) to generate manifold-valued FC data neglects its inherent SPD structure and hence the inter-relatedness of edges in real FC. We propose a novel graph-regularized manifold-aware conditional Wasserstein GAN (GR-SPD-GAN) for FC data generation on the SPD manifold that can preserve the global FC structure. Specifically, we optimize a generalized Wasserstein distance between the real and generated SPD data under an adversarial training, conditioned on the class labels. The resulting generator can synthesize new SPD-valued FC matrices associated with different classes of brain networks, e.g., brain disorder or healthy control. Furthermore, we introduce additional population graph-based regularization terms on both the SPD manifold and its tangent space to encourage the generator to respect the inter-subject similarity of FC patterns in the real data. This also helps in avoiding mode collapse and produces more stable GAN training. Evaluated on resting-state functional magnetic resonance imaging (fMRI) data of major depressive disorder (MDD), qualitative and quantitative results show that the proposed GR-SPD-GAN clearly outperforms several state-of-the-art GANs in generating more realistic fMRI-based FC samples. When applied to FC data augmentation for MDD identification, classification models trained on augmented data generated by our approach achieved the largest margin of improvement in classification accuracy among the competing GANs over baselines without data augmentation.

Submitted 10 December, 2022; originally announced December 2022.

Comments: 10 pages, 4 figures

arXiv:2208.11517 [pdf, other]

EpiGNN: Exploring Spatial Transmission with Graph Neural Network for Regional Epidemic Forecasting

Authors: Feng Xie, Zhong Zhang, Liang Li, Bin Zhou, Yusong Tan

Abstract: Epidemic forecasting is the key to effective control of epidemic transmission and helps the world mitigate the crisis that threatens public health. To better understand the transmission and evolution of epidemics, we propose EpiGNN, a graph neural network-based model for epidemic forecasting. Specifically, we design a transmission risk encoding module to characterize local and global spatial effects of regions in epidemic processes and incorporate them into the model. Meanwhile, we develop a Region-Aware Graph Learner (RAGL) that takes transmission risk, geographical dependencies, and temporal information into account to better explore spatial-temporal dependencies and makes regions aware of related regions' epidemic situations. The RAGL can also combine with external resources, such as human mobility, to further improve prediction performance. Comprehensive experiments on five real-world epidemic-related datasets (including influenza and COVID-19) demonstrate the effectiveness of our proposed method and show that EpiGNN outperforms state-of-the-art baselines by 9.48% in RMSE.

Submitted 23 August, 2022; originally announced August 2022.

Comments: 16 pages, 6 figures, ECML-PKDD2022

arXiv:2208.11515 [pdf, other]

Inter- and Intra-Series Embeddings Fusion Network for Epidemiological Forecasting

Authors: Feng Xie, Zhong Zhang, Xuechen Zhao, Bin Zhou, Yusong Tan

Abstract: The accurate forecasting of infectious epidemic diseases is the key to effective control of the epidemic situation in a region. Most existing methods ignore potential dynamic dependencies between regions or the importance of temporal dependencies and inter-dependencies between regions for prediction. In this paper, we propose an Inter- and Intra-Series Embeddings Fusion Network (SEFNet) to improve epidemic prediction performance. SEFNet consists of two parallel modules, named Inter-Series Embedding Module and Intra-Series Embedding Module. In Inter-Series Embedding Module, a multi-scale unified convolution component called Region-Aware Convolution is proposed, which cooperates with self-attention to capture dynamic dependencies between time series obtained from multiple regions. The Intra-Series Embedding Module uses Long Short-Term Memory to capture temporal relationships within each time series. Subsequently, we learn the influence degree of two embeddings and fuse them with the parametric-matrix fusion method. To further improve the robustness, SEFNet also integrates a traditional autoregressive component in parallel with nonlinear neural networks. Experiments on four real-world epidemic-related datasets show SEFNet is effective and outperforms state-of-the-art baselines.

Submitted 23 August, 2022; originally announced August 2022.

Comments: 6 pages, 5 figures, SEKE2022

arXiv:2106.13541 [pdf]

doi 10.1093/cercor/bhab456

A nonlinear hidden layer enables actor-critic agents to learn multiple paired association navigation

Authors: M Ganesh Kumar, Cheston Tan, Camilo Libedinsky, Shih-Cheng Yen, Andrew Yong-Yi Tan

Abstract: Navigation to multiple cued reward locations has been increasingly used to study rodent learning. Though deep reinforcement learning agents have been shown to be able to learn the task, they are not biologically plausible. Biologically plausible classic actor-critic agents have been shown to learn to navigate to single reward locations, but which biologically plausible agents are able to learn multiple cue-reward location tasks has remained unclear. In this computational study, we show versions of classic agents that learn to navigate to a single reward location, and adapt to reward location displacement, but are not able to learn multiple paired association navigation. The limitation is overcome by an agent in which place cell and cue information are first processed by a feedforward nonlinear hidden layer with synapses to the actor and critic subject to temporal difference error-modulated plasticity. Faster learning is obtained when the feedforward layer is replaced by a recurrent reservoir network.

Submitted 15 July, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

Comments: 31 pages, 8 figures. Acknowledgements revised

Journal ref: Cerebral Cortex, 2022, bhab456
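
The temporal difference (TD) error that the abstract above says modulates plasticity can be illustrated with a minimal sketch. This is the textbook TD(0) error with a linear critic, not the agent from the paper: the function names, the discount factor, the learning rate, and the linear value function are all assumptions made for illustration.

```python
def td_error(reward, value_current, value_next, discount=0.9):
    """Textbook TD(0) error: delta = r + gamma * V(s') - V(s)."""
    return reward + discount * value_next - value_current


def update_critic_weights(weights, features, delta, learning_rate=0.1):
    """TD-error-modulated update for a hypothetical linear critic.

    Each weight moves in proportion to the TD error and to the
    activity of the input feature it connects to.
    """
    return [w + learning_rate * delta * x for w, x in zip(weights, features)]


# Example: a reward of 1.0 arrives when the critic predicted 0.5.
delta = td_error(reward=1.0, value_current=0.5, value_next=0.0)
new_weights = update_critic_weights([0.0, 0.0], [1.0, 0.0], delta)
```

A positive error strengthens the weights of the currently active features, and a negative error weakens them.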

arXiv:2106.10109 [pdf, other]

A Weak Monotonicity Based Muscle Fatigue Detection Algorithm for a Short-Duration Poor Posture Using sEMG Measurements

Authors: Xinliang Guo, Lei Lu, Mark Robinson, Ying Tan, Kusal Goonewardena, Denny Oetomo

Abstract: Muscle fatigue is usually defined as a decrease in the ability to produce force. The surface electromyography (sEMG) signals have been widely used to provide information about muscle activities including detecting muscle fatigue by various data-driven techniques such as machine learning and statistical approaches. However, it is well-known that sEMG signals are weak signals (low amplitude of the signals) with a low signal-to-noise ratio, and data-driven techniques cannot work well when the quality of the data is poor. In particular, the existing methods are unable to detect muscle fatigue coming from static poses. This work exploits the concept of weak monotonicity, which has been observed in the process of fatigue, to robustly detect muscle fatigue in the presence of measurement noises and human variations. Such a population trend methodology has shown its potential in muscle fatigue detection as demonstrated by the experiment of a static pose.

Submitted 18 June, 2021; originally announced June 2021.

Comments: 4 pages, 4 figures. This work has been submitted to the IEEE EMBC for possible publication

arXiv:2106.07919 [pdf, other]

A stochastic metapopulation state-space approach to modeling and estimating Covid-19 spread

Authors: Yukun Tan, Durward Cator III, Martial Ndeffo-Mbah, Ulisses Braga-Neto

Abstract: Mathematical models are widely recognized as an important tool for analyzing and understanding the dynamics of infectious disease outbreaks, predicting their future trends, and evaluating public health intervention measures for disease control and elimination. We propose a novel stochastic metapopulation state-space model for COVID-19 transmission, based on a discrete-time spatio-temporal susceptible/exposed/infected/recovered/deceased (SEIRD) model. The proposed framework allows the hidden SEIRD states and unknown transmission parameters to be estimated from noisy, incomplete time series of reported epidemiological data, by application of unscented Kalman filtering (UKF), maximum-likelihood adaptive filtering, and metaheuristic optimization. Experiments using both synthetic data and real data from the Fall 2020 Covid-19 wave in the state of Texas demonstrate the effectiveness of the proposed model.

Submitted 15 June, 2021; originally announced June 2021.

Comments: 17 pages, 5 figures
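
For readers unfamiliar with the SEIRD compartments named in the abstract above, the following is a minimal sketch of one discrete-time update. It is a deterministic, single-population simplification, not the stochastic metapopulation state-space model of the paper; the parameter names (`beta`, `sigma`, `gamma`, `mu`) and the example values are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SEIRDState:
    s: float  # susceptible
    e: float  # exposed
    i: float  # infected
    r: float  # recovered
    d: float  # deceased


def seird_step(state, beta, sigma, gamma, mu):
    """Advance a deterministic SEIRD model by one time step.

    beta:  transmission rate per step (hypothetical value in the example)
    sigma: rate at which exposed individuals become infected
    gamma: recovery rate of infected individuals
    mu:    death rate of infected individuals
    """
    n_alive = state.s + state.e + state.i + state.r
    new_exposed = beta * state.s * state.i / n_alive
    new_infected = sigma * state.e
    new_recovered = gamma * state.i
    new_deceased = mu * state.i
    return SEIRDState(
        s=state.s - new_exposed,
        e=state.e + new_exposed - new_infected,
        i=state.i + new_infected - new_recovered - new_deceased,
        r=state.r + new_recovered,
        d=state.d + new_deceased,
    )


# Example: 1000 people, 10 of them initially infected.
initial = SEIRDState(s=990.0, e=0.0, i=10.0, r=0.0, d=0.0)
after_one_step = seird_step(initial, beta=0.3, sigma=0.2, gamma=0.1, mu=0.01)
```

Every flow leaves one compartment and enters another, so the total across the five compartments is conserved at each step.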

arXiv:2106.03580 [pdf]

One-shot learning of paired association navigation with biologically plausible schemas

Authors: M Ganesh Kumar, Cheston Tan, Camilo Libedinsky, Shih-Cheng Yen, Andrew Yong-Yi Tan

Abstract: Schemas are knowledge structures that can enable rapid learning. Rodent one-shot learning in a multiple paired association navigation task has been postulated to be schema-dependent. But how schemas, conceptualized at Marr's computational level, correspond with neural implementations remains poorly understood, and a biologically plausible computational model of the rodent learning has not been demonstrated. Here, we compose such an agent from schemas with biologically plausible neural implementations. The agent contains an associative memory that can form one-shot associations between sensory cues and goal coordinates, implemented with a feedforward layer or a reservoir of recurrently connected neurons whose plastic output weights are governed by a novel 4-factor reward-modulated Exploratory Hebbian (EH) rule. Adding an actor-critic allows the agent to succeed even if an obstacle prevents direct heading. With the addition of working memory, the rodent behavior is replicated. Temporal-difference learning of a working memory gating mechanism enables one-shot learning despite distractors.

Submitted 27 August, 2023; v1 submitted 7 June, 2021; originally announced June 2021.

Comments: Minor revisions from version 2 preprint

arXiv:2104.12187 [pdf, ps, other]

Frequency Superposition -- A Multi-Frequency Stimulation Method in SSVEP-based BCIs

Authors: Jing Mu, David B. Grayden, Ying Tan, Denny Oetomo

Abstract: The steady-state visual evoked potential (SSVEP) is one of the most widely used modalities in brain-computer interfaces (BCIs) due to its many advantages. However, the existence of harmonics and the limited range of responsive frequencies in SSVEP make it challenging to further expand the number of targets without sacrificing other aspects of the interface or putting additional constraints on the system. This paper introduces a novel multi-frequency stimulation method for SSVEP and investigates its potential to effectively and efficiently increase the number of targets presented. The proposed stimulation method, obtained by the superposition of the stimulation signals at different frequencies, is size-efficient, allows single-step target identification, puts no strict constraints on the usable frequency range, can be suited to self-paced BCIs, and does not require specific light sources. In addition to the stimulus frequencies and their harmonics, the evoked SSVEP waveforms include frequencies that are integer linear combinations of the stimulus frequencies. Results of decoding SSVEPs collected from nine subjects using canonical correlation analysis (CCA) with only the frequencies and harmonics as reference, also demonstrate the potential of using such a stimulation paradigm in SSVEP-based BCIs.

Submitted 11 August, 2021; v1 submitted 25 April, 2021; originally announced April 2021.

Comments: 4 pages, 5 figures. This work has been accepted for publication in the 2021 IEEE EMBC

arXiv:2104.02825 [pdf]

Fluorescence-Enhanced Mid-Infrared Photothermal Microscopy

Authors: Yi Zhang, Haonan Zong, Cheng Zong, Yuying Tan, Meng Zhang, Yuewei Zhan, Ji-Xin Cheng

Abstract: Mid-infrared photothermal microscopy is a new chemical imaging technology in which a visible beam senses the photothermal effect induced by a pulsed infrared laser. This technology provides infrared spectroscopic information at sub-micron spatial resolution and enables infrared spectroscopy and imaging of living cells and organisms. Yet, current mid-infrared photothermal imaging sensitivity suffers from a weak dependence of scattering on temperature, and the image quality is vulnerable to the speckles caused by scattering. Here, we present a novel version of mid-infrared photothermal microscopy in which thermo-sensitive fluorescent probes are harnessed to sense the mid-infrared photothermal effect. The fluorescence intensity can be modulated at the level of 1% per Kelvin, which is 100 times larger than the modulation of scattering intensity. In addition, fluorescence emission is free of speckles, thus greatly improving the image quality. Moreover, fluorophores can target specific organelles or biomolecules, thus augmenting the specificity of photothermal imaging. Spectral fidelity is confirmed through fingerprinting a single bacterium. Finally, the photobleaching issue is successfully addressed through the development of a wide-field fluorescence-enhanced mid-infrared photothermal microscope which allows video rate bond-selective imaging of biological specimens.

Submitted 6 April, 2021; originally announced April 2021.

arXiv:2011.05861 [pdf, ps, other]

Multi-Frequency Canonical Correlation Analysis (MFCCA): A Generalised Decoding Algorithm for Multi-Frequency SSVEP

Authors: Jing Mu, Ying Tan, David B. Grayden, Denny Oetomo

Abstract: Stimulation methods that utilise more than one stimulation frequency have been developed for steady-state visual evoked potential (SSVEP) brain-computer interfaces (BCIs) with the purpose of increasing the number of targets that can be presented simultaneously. However, there is no unified decoding algorithm that can be used without training for each individual user or case, and applied to a large class of multi-frequency stimulated SSVEP settings. This paper extends the widely used canonical correlation analysis (CCA) decoder to explicitly accommodate multi-frequency SSVEP by exploiting the interactions between the multiple stimulation frequencies. A concept of order, defined as the sum of the absolute values of the coefficients in the linear combination of the input frequencies, was introduced to assist the design of Multi-Frequency CCA (MFCCA). The probability distribution of the order in the resulting SSVEP response was then used to improve decoding accuracy. Results show that, compared to the standard CCA formulation, the proposed MFCCA has a 20% improvement in decoding accuracy on average at order 2, while keeping its generality and training-free characteristics.

Submitted 11 August, 2021; v1 submitted 27 October, 2020; originally announced November 2020.

Comments: 4 pages, 6 figures. This work has been accepted for publication in the 2021 IEEE EMBC

arXiv:1912.08340 [pdf]

Multiplex stimulated Raman scattering imaging cytometry reveals cancer metabolic signatures in a spatially, temporally, and spectrally resolved manner

Authors: Kai-Chih Huang, Junjie Li, Chi Zhang, Yuying Tan, Ji-Xin Cheng

Abstract: In situ measurement of cellular metabolites is still a challenge in biology. Conventional methods, such as mass spectrometry or fluorescence microscopy, would either destroy the sample or introduce strong perturbations to the functions of target molecules. Here, we present multiplex stimulated Raman scattering (SRS) imaging cytometry as a label-free single-cell analysis platform with chemical specificity and high-throughput capabilities. Cellular compartments such as lipid droplets, endoplasmic reticulum, and nuclei are separated from the cytoplasm. Based on these chemical segmentations, 260 features from both morphology and molecular composition were generated and analyzed for each cell. Using SRS imaging cytometry, we studied the metabolic responses of human pancreatic cancer cells under stress by starvation and chemotherapy drug treatments. We unveiled lipid-facilitated protrusion as a metabolic marker for stress-resistant cancer cells through statistical analysis of thousands of cells. Our findings also demonstrate the potential of targeting lipid metabolism for selective treatment of starvation-resistant and chemotherapy-resistant cancers. These results highlight our SRS imaging cytometry as a powerful label-free tool for biological discoveries with a high-throughput, high-content capacity.

Submitted 17 December, 2019; originally announced December 2019.

Comments: 42 pages, 21 figures

arXiv:1905.07680 [pdf]

doi 10.1371/journal.pcbi.1006222

Predicting 3D structure and stability of RNA pseudoknots in monovalent and divalent ion solutions

Authors: Ya-Zhou Shi, Lei Jin, Chen-Jie Feng, Ya-Lan Tan, Zhi-Jie Tan

Abstract: RNA pseudoknots are a kind of minimal RNA tertiary structural motif, and their three-dimensional (3D) structures and stability play essential roles in a variety of biological functions. Therefore, predicting the 3D structures and stability of RNA pseudoknots is essential for understanding their functions. In this work, we employed our previously developed coarse-grained model with implicit salt to make extensive predictions and comprehensive analyses of the 3D structures and stability of RNA pseudoknots in monovalent/divalent ion solutions. The comparisons with available experimental data show that our model can successfully predict the 3D structures of RNA pseudoknots from their sequences, and can also make reliable predictions for the stability of RNA pseudoknots with different lengths and sequences over a wide range of monovalent/divalent ion concentrations. Furthermore, we made comprehensive analyses of the unfolding pathway for various RNA pseudoknots in ion solutions. Our analyses for extensive pseudoknots and the wide range of monovalent/divalent ion concentrations verify that the unfolding pathway of RNA pseudoknots is mainly dependent on the relative stability of unfolded intermediate states, and show that the unfolding pathway of RNA pseudoknots can be significantly modulated by their sequences and solution ion conditions.

Submitted 18 May, 2019; originally announced May 2019.

Comments: 23 pages, 8 figures

Journal ref: PLOS Computational Biology, 14(6): e1006222, 2018

arXiv:1902.03429 [pdf]

Clustering Bioactive Molecules in 3D Chemical Space with Unsupervised Deep Learning

Authors: Chu Qin, Ying Tan, Shang Ying Chen, Xian Zeng, Xingxing Qi, Tian Jin, Huan Shi, Yiwei Wan, Yu Chen, Jingfeng Li, Weidong He, Yali Wang, Peng Zhang, Feng Zhu, Hongping Zhao, Yuyang Jiang, Yuzong Chen

Abstract: Unsupervised clustering has broad applications in data stratification, pattern investigation and new discovery beyond existing knowledge. In particular, clustering of bioactive molecules facilitates chemical space mapping, structure-activity studies, and drug discovery. These tasks, conventionally conducted by similarity-based methods, are complicated by data complexity and diversity. We explored the superior learning capability of deep autoencoders for unsupervised clustering of 1.39 million bioactive molecules into band-clusters in a 3-dimensional latent chemical space. These band-clusters, displayed by a space-navigation simulation software, band molecules of selected bioactivity classes into individual band-clusters possessing unique sets of common sub-structural features beyond structural similarity. These sub-structural features form the frameworks of the literature-reported pharmacophores and privileged fragments. Within each band-cluster, molecules are further banded into selected sub-regions with respect to their bioactivity target, sub-structural features and molecular scaffolds. Our method is potentially applicable to big-data clustering tasks in different fields.

Submitted 9 February, 2019; originally announced February 2019.

arXiv:1710.02346 [pdf, other]

doi 10.1103/PhysRevE.96.052401

Particle transport across a channel via an oscillating potential

Authors: Yizhou Tan, Leonardo Dagdug, Jannes Gladrow, Ulrich F. Keyser, Stefano Pagliara

Abstract: Membrane protein transporters alternate their substrate-binding sites between the extracellular and cytosolic side of the membrane according to the alternating access mechanism. Inspired by this intriguing mechanism devised by nature, we study particle transport through a channel coupled with an energy well that oscillates its position between the two entrances of the channel. We optimize particle transport across the channel by adjusting the oscillation frequency. At the optimal oscillation frequency, the translocation rate through the channel is a hundred times higher than that of free diffusion across the channel. Our findings reveal the effect of time-dependent potentials on particle transport across a channel and will be relevant for membrane transport and microfluidics applications.

Submitted 6 October, 2017; originally announced October 2017.

Journal ref: Phys. Rev. E 96, 052401 (2017)

arXiv:q-bio/0607036 [pdf]

doi 10.1016/j.neuroscience.2007.01.019

Unbalanced synaptic inhibition can create intensity-tuned auditory cortex neurons

Authors: Andrew Y. Y. Tan, Craig A. Atencio, Daniel B. Polley, Michael M. Merzenich, Christoph E. Schreiner

Abstract: Intensity-tuned auditory cortex neurons may be formed by intensity-tuned synaptic excitation. Synaptic inhibition has also been shown to enhance, and possibly even create, intensity-tuned neurons. Here we show, using in vivo whole cell recordings in pentobarbital-anesthetized rats, that some intensity-tuned neurons are indeed created solely through disproportionally large inhibition at high intensities, without any intensity-tuned excitation. Since inhibition is essentially cortical in origin, these neurons provide examples of auditory feature-selectivity arising de novo at the cortex.

Submitted 21 July, 2006; originally announced July 2006.

Comments: 22 pages, 5 figures

Journal ref: Neuroscience 146: 449-462 (2007)

Showing 1–24 of 24 results for author: Tan, Y