-
Transmission IR Microscopy for the Quantitation of Biomolecular Mass In Live Cells
Authors:
Yow-Ren Chang,
Seong-Min Kim,
Young Jong Lee
Abstract:
Absolute quantity imaging of biomolecules on a single cell level is critical for measurement assurance in biosciences and bioindustries. While infrared (IR) transmission microscopy is a powerful label-free imaging modality capable of chemical quantification, its applicability to hydrated biological samples remains challenging due to the strong water absorption. We overcome this challenge by applying a solvent absorption compensation (SAC) technique to a home-built quantum cascade laser IR microscope. SAC-IR microscopy improves the chemical sensitivity considerably by adjusting the incident light intensity to pre-compensate the IR absorption by water while retaining the full dynamic range. We demonstrate the label-free chemical imaging of key biomolecules of a cell, such as protein, fatty acid, and nucleic acid, with sub-cellular spatial resolution. By imaging live fibroblast cells over twelve hours, we monitor the mass change of the three molecular species of single cells at various phases, including cell division. While the current live-cell imaging demonstration involved three wavenumbers, more wavenumber images could measure more biomolecules in live cells with higher accuracy. As a label-free method to measure absolute quantities of various molecules in a cell, SAC-IR microscopy can potentially become a standard chemical characterization tool for live cells in biology, medicine, and biotechnology.
Submitted 27 March, 2024;
originally announced March 2024.
-
Solvent: A Framework for Protein Folding
Authors:
Jaemyung Lee,
Kyeongtak Han,
Jaehoon Kim,
Hasun Yu,
Youhan Lee
Abstract:
Consistency and reliability are crucial for conducting AI research. Many famous research fields, such as object detection, have been compared and validated with solid benchmark frameworks. After AlphaFold2, the protein folding task has entered a new phase, and many methods are proposed based on the component of AlphaFold2. The importance of a unified research framework in protein folding contains implementations and benchmarks to consistently and fairly compare various approaches. To achieve this, we present Solvent, a protein folding framework that supports significant components of state-of-the-art models in the manner of an off-the-shelf interface. Solvent contains different models implemented in a unified codebase and supports training and evaluation for defined models on the same dataset. We benchmark well-known algorithms and their components and provide experiments that give helpful insights into the protein structure modeling field. We hope that Solvent will increase the reliability and consistency of proposed models and give efficiency in both speed and costs, resulting in acceleration on protein folding modeling research. The code is available at https://github.com/kakaobrain/solvent, and the project will continue to be developed.
Submitted 31 July, 2023; v1 submitted 7 July, 2023;
originally announced July 2023.
-
ProtFIM: Fill-in-Middle Protein Sequence Design via Protein Language Models
Authors:
Youhan Lee,
Hasun Yu
Abstract:
Protein language models (pLMs), pre-trained via causal language modeling on protein sequences, have been a promising tool for protein sequence design. In real-world protein engineering, there are many cases where the amino acids in the middle of a protein sequence are optimized while maintaining other residues. Unfortunately, because of the left-to-right nature of pLMs, existing pLMs modify suffix residues by prompting prefix residues, which are insufficient for the infilling task that considers the whole surrounding context. To find the more effective pLMs for protein engineering, we design a new benchmark, Secondary structureE InFilling rEcoveRy, SEIFER, which approximates infilling sequence design scenarios. With the evaluation of existing models on the benchmark, we reveal the weakness of existing language models and show that language models trained via fill-in-middle transformation, called ProtFIM, are more appropriate for protein engineering. Also, we prove that ProtFIM generates protein sequences with decent protein representations through exhaustive experiments and visualizations.
Submitted 29 March, 2023;
originally announced March 2023.
-
Deep learning models for predicting RNA degradation via dual crowdsourcing
Authors:
Hannah K. Wayment-Steele,
Wipapat Kladwang,
Andrew M. Watkins,
Do Soon Kim,
Bojan Tunguz,
Walter Reade,
Maggie Demkin,
Jonathan Romano,
Roger Wellington-Oguri,
John J. Nicol,
Jiayang Gao,
Kazuki Onodera,
Kazuki Fujikawa,
Hanfei Mao,
Gilles Vandewiele,
Michele Tinti,
Bram Steenwinckel,
Takuya Ito,
Taiga Noumi,
Shujun He,
Keiichiro Ishi,
Youhan Lee,
Fatih Öztürk,
Anthony Chiu,
Emin Öztürk
, et al. (4 additional authors not shown)
Abstract:
Messenger RNA-based medicines hold immense potential, as evidenced by their rapid deployment as COVID-19 vaccines. However, worldwide distribution of mRNA molecules has been limited by their thermostability, which is fundamentally limited by the intrinsic instability of RNA molecules to a chemical degradation reaction called in-line hydrolysis. Predicting the degradation of an RNA molecule is a key task in designing more stable RNA-based therapeutics. Here, we describe a crowdsourced machine learning competition ("Stanford OpenVaccine") on Kaggle, involving single-nucleotide resolution measurements on 6043 102-130-nucleotide diverse RNA constructs that were themselves solicited through crowdsourcing on the RNA design platform Eterna. The entire experiment was completed in less than 6 months, and 41% of nucleotide-level predictions from the winning model were within experimental error of the ground truth measurement. Furthermore, these models generalized to blindly predicting orthogonal degradation data on much longer mRNA molecules (504-1588 nucleotides) with improved accuracy compared to previously published models. Top teams integrated natural language processing architectures and data augmentation techniques with predictions from previous dynamic programming models for RNA secondary structure. These results indicate that such models are capable of representing in-line hydrolysis with excellent accuracy, supporting their use for designing stabilized messenger RNAs. The integration of two crowdsourcing platforms, one for data set creation and another for machine learning, may be fruitful for other urgent problems that demand scientific discovery on rapid timescales.
Submitted 22 April, 2022; v1 submitted 14 October, 2021;
originally announced October 2021.
-
Directionally Dependent Multi-View Clustering Using Copula Model
Authors:
Kahkashan Afrin,
Ashif S. Iquebal,
Mostafa Karimi,
Allyson Souris,
Se Yoon Lee,
Bani K. Mallick
Abstract:
In recent biomedical scientific problems, it is a fundamental issue to integratively cluster a set of objects from multiple sources of datasets. Such problems are mostly encountered in genomics, where data is collected from various sources, and typically represent distinct yet complementary information. Integrating these data sources for multi-source clustering is challenging due to their complex dependence structure including directional dependency. Particularly in genomics studies, it is known that there is certain directional dependence between DNA expression, DNA methylation, and RNA expression, widely called The Central Dogma.
Most of the existing multi-view clustering methods either assume an independent structure or pair-wise (non-directional) dependency, thereby ignoring the directional relationship. Motivated by this, we propose a copula-based multi-view clustering model where a copula enables the model to accommodate the directional dependence existing in the datasets. We conduct a simulation experiment where the simulated datasets exhibit inherent directional dependence: it turns out that ignoring the directional dependence negatively affects the clustering performance. As a real application, we applied our model to the breast cancer tumor samples collected from The Cancer Genome Atlas (TCGA).
Submitted 22 August, 2020; v1 submitted 16 March, 2020;
originally announced March 2020.
-
Quantifying myelin in brain tissue using color spatial light interference microscopy (cSLIM)
Authors:
Michael Fanous,
Megan P. Caputo,
Young Jae Lee,
Laurie A. Rund,
Catherine Best-Popescu,
Mikhail E. Kandel,
Rodney W. Johnson,
Tapas Das,
Matthew J. Kuchan,
Gabriel Popescu
Abstract:
Deficient myelination of the brain is associated with neurodevelopmental delays, particularly in high-risk infants, such as those born small in relation to their gestational age (SGA). New methods are needed to further study this condition. Here, we employ Color Spatial Light Interference Microscopy (cSLIM), which uses a brightfield objective and RGB camera to generate pathlength-maps with nanoscale sensitivity in conjunction with a regular brightfield image. Using tissue sections stained with Luxol Fast Blue, the myelin structures were segmented from a brightfield image. Using a binary mask, those portions were quantitatively analyzed in the corresponding phase maps. We first used the CLARITY method to remove tissue lipids and validate the sensitivity of cSLIM to lipid content. We then applied cSLIM to brain histology slices. These specimens are from a previous MRI study, which demonstrated that appropriate for gestational age (AGA) piglets have increased internal capsule myelination (ICM) compared to small for gestational age (SGA) piglets and that a hydrolyzed fat diet improved ICM in both. The identity of samples was blinded until after statistical analyses.
Submitted 2 March, 2020;
originally announced March 2020.
-
Real-time forecasts of the 2019-nCoV epidemic in China from February 5th to February 24th, 2020
Authors:
K. Roosa,
Y. Lee,
R. Luo,
A. Kirpich,
R. Rothenberg,
J. M. Hyman,
P. Yan,
G. Chowell
Abstract:
The initial cluster of severe pneumonia cases that triggered the 2019-nCoV epidemic was identified in Wuhan, China in December 2019. While early cases of the disease were linked to a wet market, human-to-human transmission has driven the rapid spread of the virus throughout China. The ongoing outbreak presents a challenge for modelers, as limited data are available on the early growth trajectory, and the epidemiological characteristics of the novel coronavirus are yet to be fully elucidated. We provide timely short-term forecasts of the cumulative number of confirmed reported cases in Hubei province, the epicenter of the epidemic, and for the overall trajectory in China, excluding the province of Hubei. We collect daily reported cumulative case data for the 2019-nCoV outbreak for each Chinese province from the National Health Commission of China. Here, we provide 5, 10, and 15 day forecasts for five consecutive days, February 5th through February 9th, with quantified uncertainty based on a generalized logistic growth model, the Richards growth model, and a sub-epidemic wave model. Our most recent forecasts reported here based on data up until February 9, 2020, largely agree across the three models presented and suggest an average range of 7,409-7,496 additional cases in Hubei and 1,128-1,929 additional cases in other provinces within the next five days. Models also predict an average total cumulative case count between 37,415 - 38,028 in Hubei and 11,588 - 13,499 in other provinces by February 24, 2020. Mean estimates and uncertainty bounds for both Hubei and other provinces have remained relatively stable in the last three reporting dates (February 7th - 9th). Our forecasts suggest that the containment strategies implemented in China are successfully reducing transmission and that the epidemic growth has slowed in recent days.
Submitted 12 February, 2020;
originally announced February 2020.
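The generalized logistic growth model named in the abstract above is commonly written as dC/dt = r C^p (1 - C/K), where C is the cumulative case count, r the growth rate, p a "deceleration of growth" exponent between 0 and 1, and K the final epidemic size. The following is a minimal sketch of that equation integrated with a forward Euler step. The function name and all parameter values are hypothetical and chosen only for illustration; they are not the fitted values or the fitting procedure from the paper.

```python
def generalized_logistic(c0, r, p, K, days, steps_per_day=100):
    """Integrate dC/dt = r * C**p * (1 - C/K) with forward Euler.

    Returns the cumulative count at the end of each day, starting with c0.
    All arguments are illustrative assumptions, not values from the paper.
    """
    dt = 1.0 / steps_per_day
    c = float(c0)
    trajectory = [c]
    for _ in range(days):
        for _ in range(steps_per_day):
            c += dt * r * c ** p * (1.0 - c / K)
        trajectory.append(c)
    return trajectory


# Hypothetical parameters: 10 initial cases, final size 10,000.
curve = generalized_logistic(c0=10, r=0.5, p=0.9, K=10_000, days=60)
print(round(curve[5]), round(curve[-1]))
```

Setting p = 1 recovers the classical logistic model; values below 1 give the sub-exponential early growth that this family of models is designed to capture.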
-
RNASeqR: an R package for automated two-group RNA-Seq analysis workflow
Authors:
Kuan-Hao Chao,
Yi-Wen Hsiao,
Yi-Fang Lee,
Chien-Yueh Lee,
Liang-Chuan Lai,
Mong-Hsun Tsai,
Tzu-Pin Lu,
Eric Y. Chuang
Abstract:
RNA-Seq analysis has revolutionized researchers' understanding of the transcriptome in biological research. Assessing the differences in transcriptomic profiles between tissue samples or patient groups enables researchers to explore the underlying biological impact of transcription. RNA-Seq analysis requires multiple processing steps and huge computational capabilities. There are many well-developed R packages for individual steps; however, there are few R/Bioconductor packages that integrate existing software tools into a comprehensive RNA-Seq analysis and provide fundamental end-to-end results in pure R environment so that researchers can quickly and easily get fundamental information in big sequencing data. To address this need, we have developed the open source R/Bioconductor package, RNASeqR. It allows users to run an automated RNA-Seq analysis with only six steps, producing essential tabular and graphical results for further biological interpretation. The features of RNASeqR include: six-step analysis, comprehensive visualization, background execution version, and the integration of both R and command-line software. RNASeqR provides fast, light-weight, and easy-to-run RNA-Seq analysis pipeline in pure R environment. It allows users to efficiently utilize popular software tools, including both R/Bioconductor and command-line tools, without predefining the resources or environments. RNASeqR is freely available for Linux and macOS operating systems from Bioconductor (https://bioconductor.org/packages/release/bioc/html/RNASeqR.html).
Submitted 9 May, 2019;
originally announced May 2019.
-
Pro-arrhythmogenic effects of heterogeneous tissue curvature: A suggestion for role of left atrial appendage in atrial fibrillation
Authors:
Jun-Seop Song,
Jaehyeok Kim,
Byounghyun Lim,
Young-Seon Lee,
Minki Hwang,
Boyoung Joung,
Eun Bo Shim,
Hui-Nam Pak
Abstract:
Background: The arrhythmogenic role of atrial complex morphology has not yet been clearly elucidated. We hypothesized that bumpy tissue geometry can induce action potential duration (APD) dispersion and wavebreak in atrial fibrillation (AF).
Methods and Results: We simulated 2D-bumpy atrial model by varying the degree of bumpiness, and 3D-left atrial (LA) models integrated by LA computed tomographic (CT) images taken from 14 patients with persistent AF. We also analyzed wave-dynamic parameters with bipolar electrograms during AF and compared them with LA-CT geometry in 30 patients with persistent AF. In 2D-bumpy model, APD dispersion increased (p<0.001) and wavebreak occurred spontaneously when the surface bumpiness was higher, showing phase transition-like behavior (p<0.001). Bumpiness gradient 2D-model showed that spiral wave drifted in the direction of higher bumpiness, and phase singularity (PS) points were mostly located in areas with higher bumpiness. In 3D-LA model, PS density was higher in LA appendage (LAA) compared to other LA parts (p<0.05). In 30 persistent AF patients, the surface bumpiness of LAA was 5.8-times that of other LA parts (p<0.001), and exceeded critical bumpiness to induce wavebreak. Wave dynamics complexity parameters were consistently dominant in LAA (p<0.001).
Conclusion: The bumpy tissue geometry promotes APD dispersion, wavebreak, and spiral wave drift in in silico human atrial tissue, and corresponds to clinical electro-anatomical maps.
Submitted 14 September, 2018; v1 submitted 5 March, 2018;
originally announced March 2018.
-
Volume entropy and information flow in a brain graph
Authors:
Hyekyoung Lee,
Eunkyung Kim,
Hyejin Kang,
Youngmin Huh,
Youngjo Lee,
Seonhee Lim,
Dong Soo Lee
Abstract:
Entropy is a classical measure to quantify the amount of information or complexity of a system. Various entropy-based measures such as functional and spectral entropies have been proposed in brain network analysis. However, they are less widely used than traditional graph theoretic measures such as global and local efficiencies because either they are not well-defined on a graph or difficult to interpret its biological meaning. In this paper, we propose a new entropy-based graph invariant, called volume entropy. It measures the exponential growth rate of the number of paths in a graph, which is a relevant measure if information flows through the graph forever. We model the information propagation on a graph by the generalized Markov system associated to the weighted edge-transition matrix. We estimate the volume entropy using the stationary equation of the generalized Markov system. A prominent advantage of using the stationary equation is that it assigns certain distribution of weights on the edges of the brain graph, which we call the stationary distribution. The stationary distribution shows the information capacity of edges and the direction of information flow on a brain graph. The simulation results show that the volume entropy distinguishes the underlying graph topology and geometry better than the existing graph measures. In brain imaging data application, the volume entropy of brain graphs was significantly related to healthy normal aging from 20s to 60s. In addition, the stationary distribution of information propagation gives a new insight into the information flow of functional brain graph.
Submitted 7 March, 2018; v1 submitted 28 January, 2018;
originally announced January 2018.
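The abstract above describes volume entropy as the exponential growth rate of the number of paths in a graph. A minimal sketch of that idea follows, under a strong simplifying assumption: the graph is unweighted with unit edge lengths, and paths are counted without backtracking, so the growth rate is the logarithm of the leading eigenvalue of the (unweighted) edge-transition matrix. The paper's weighted formulation and its stationary equation are not reproduced here, and the function name is hypothetical.

```python
import math


def volume_entropy_unit_lengths(edges, iterations=500):
    """Estimate log(growth rate) of non-backtracking paths by power iteration.

    `edges` is a list of undirected edges (u, v). Unit edge lengths are an
    assumption of this sketch, not the general setting of the paper.
    """
    # Each undirected edge contributes two directed edges.
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    leaving = {}
    for u, v in directed:
        leaving.setdefault(u, []).append((u, v))
    # Edge (u, v) can be followed by (v, w) for any w != u (no backtracking).
    successors = {
        (u, v): [f for f in leaving[v] if f[1] != u] for u, v in directed
    }
    weight = {e: 1.0 / len(directed) for e in directed}
    growth = 0.0
    for _ in range(iterations):
        new = {e: sum(weight[f] for f in successors[e]) for e in directed}
        growth = sum(new.values())
        if growth == 0.0:
            raise ValueError("no non-backtracking paths survive (tree-like graph)")
        weight = {e: w / growth for e, w in new.items()}
    return math.log(growth)


# Complete graph on 4 vertices: 3-regular, so the estimate is close to log(2).
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(volume_entropy_unit_lengths(k4))
```

The power iteration converges only when the edge-transition matrix is primitive (roughly, a connected graph that is not a single cycle and not bipartite); a production implementation would use a proper eigenvalue solver instead.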
-
Ultraslow water-mediated transmembrane interactions regulate the activation of A$_{\text{2A}}$ adenosine receptor
Authors:
Yoonji Lee,
Songmi Kim,
Sun Choi,
Changbong Hyeon
Abstract:
Water molecules inside G-protein coupled receptor have recently been spotlighted in a series of crystal structures. To decipher the dynamics and functional roles of internal waters in GPCR activity, we studied A$_{\text{2A}}$ adenosine receptor using $μ$sec-molecular dynamics simulations. Our study finds that the amount of water flux across the transmembrane (TM) domain varies depending on the receptor state, and that the water molecules of the TM channel in the active state flow three times slower than those in the inactive state. Depending on the location in solvent-protein interface as well as the receptor state, the average residence time of water in each residue varies from $\sim\mathcal{O}(10^2)$ psec to $\sim\mathcal{O}(10^2)$ nsec. Especially, water molecules, exhibiting ultraslow relaxation ($\sim\mathcal{O}(10^2)$ nsec) in the active state, are found around the microswitch residues that are considered activity hotspots for GPCR function. A continuous allosteric network spanning the TM domain, arising from water-mediated contacts, is unique in the active state, underscoring the importance of slow waters in the GPCR activation.
Submitted 4 August, 2016;
originally announced August 2016.
-
Communication over the network of binary switches regulates the activation of A$_{2A}$ adenosine receptor
Authors:
Yoonji Lee,
Sun Choi,
Changbong Hyeon
Abstract:
Dynamics and functions of G-protein coupled receptors (GPCRs) are accurately regulated by the type of ligands that bind to the orthosteric or allosteric binding sites. To glean the structural and dynamical origin of ligand-dependent modulation of GPCR activity, we performed total $\sim$ 5 $μ$sec molecular dynamics simulations of A$_{2A}$ adenosine receptor (A$_{2A}$AR) in its apo, antagonist-bound, and agonist-bound forms in an explicit water and membrane environment, and examined the corresponding dynamics and correlation between the 10 key structural motifs that serve as the allosteric hotspots in intramolecular signaling network. We dubbed these 10 structural motifs "binary switches" as they display molecular interactions that switch between two distinct states. By projecting the receptor dynamics on these binary switches that yield $2^{10}$ microstates, we show that (i) the receptors in apo, antagonist-bound, and agonist-bound states explore vastly different conformational space; (ii) among the three receptor states the apo state explores the broadest range of microstates; (iii) in the presence of the agonist, the active conformation is maintained through coherent couplings among the binary switches; and (iv) to be most specific, our analysis shows that W246, located deep inside the binding cleft, can serve as both an agonist sensor and actuator of ensuing intramolecular signaling for the receptor activation. Finally, our analysis of multiple trajectories generated by inserting an agonist to the apo state underscores that the transition of the receptor from inactive to active form requires the disruption of ionic-lock in the DRY motif.
Submitted 19 November, 2014;
originally announced November 2014.
-
Genetic Studies of Physiological Traits with Their Application to Sleep Apnea
Authors:
D. Y. Lee,
C. Hanis,
G. I. Bell,
D. A. Aguilar,
S. Redline,
J. Below,
M. M. Xiong
Abstract:
Advances of modern sensing and sequencing technologies generate a deluge of high dimensional space-temporal physiological and next-generation sequencing (NGS) data. Physiological traits are observed either as continuous random functions, or on a dense grid and referred to as function-valued traits. Both physiological and NGS data are highly correlated data with their inherent order, spacing, and functional nature which are ignored by traditional summary-based univariate and multivariate regression methods designed for quantitative genetic analysis of scalar trait and common variants. To capture morphological and dynamic features of the data and utilize their dependent structure, we propose a functional linear model (FLM) in which a trait curve is modeled as a response function, the genetic variation in a genomic region or gene is modeled as a functional predictor, and the genetic effects are modeled as a function of both time and genomic position (FLMF) for genetic analysis of function-valued trait with both GWAS and NGS data. By extensive simulations, we demonstrate that the FLMF has the correct type 1 error rates and much higher power to detect association than the existing methods. The FLMF is applied to sleep data from Starr County health studies where oxygen saturation was measured in 22,670 seconds on average for 833 individuals. We found 65 genes that were significantly associated with oxygen saturation functional trait with P-values ranging from 2.40E-06 to 2.53E-21. The results clearly demonstrate that the FLMF substantially outperforms the traditional genetic models with scalar trait.
Submitted 27 October, 2014;
originally announced October 2014.
-
Mapping the intramolecular signal transduction of G-protein coupled receptors
Authors:
Yoonji Lee,
Sun Choi,
Changbong Hyeon
Abstract:
G-protein coupled receptors (GPCRs), a major gatekeeper of extracellular signals on plasma membrane, are unarguably one of the most important therapeutic targets. Given the recent discoveries of allosteric modulations, an allosteric wiring diagram of intramolecular signal transductions would be of great use to glean the mechanism of receptor regulation. Here, by evaluating betweenness centrality ($C_B$) of each residue, we calculate maps of information flow in GPCRs and identify key residues for signal transductions and their pathways. Compared with preexisting approaches, the allosteric hotspots that our $C_B$-based analysis detects for A$_{2A}$ adenosine receptor (A$_{2A}$AR) and bovine rhodopsin are better correlated with biochemical data. In particular, our analysis outperforms other methods in locating the rotameric microswitches, which are generally deemed critical for mediating orthosteric signaling in class A GPCRs. For A$_{2A}$AR, the inter-residue cross-correlation map, calculated using equilibrium structural ensemble from molecular dynamics simulations, reveals that strong signals of long-range transmembrane communications exist only in the agonist-bound state. A seemingly subtle variation in structure, found in different GPCR subtypes or imparted by agonist bindings or a point mutation at an allosteric site, can lead to a drastic difference in the map of signaling pathways and protein activity. The signaling map of GPCRs provides valuable insights into allosteric modulations as well as reliable identifications of orthosteric signaling pathways.
Submitted 15 October, 2013;
originally announced October 2013.
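The abstract above ranks residues by betweenness centrality ($C_B$), which counts how many shortest paths between other node pairs pass through a given node. A minimal sketch of that measure on an unweighted, undirected graph follows, using Brandes' algorithm. Treating residues as nodes and contacts as unweighted edges is an assumption of this sketch; the paper's actual network construction and any edge weighting are not given in the abstract, and the toy graph below is hypothetical.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    `adj` maps each node to a list of its neighbours. Returns unnormalised
    betweenness, counting each unordered pair of endpoints once.
    """
    cb = {v: 0.0 for v in adj}
    for s in adj:
        order = []                        # nodes in order of BFS discovery
        preds = {v: [] for v in adj}      # shortest-path predecessors
        sigma = {v: 0 for v in adj}       # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in cb.items()}


# Hypothetical 3-node chain: the middle node lies on the only 0-2 path.
chain = {0: [1], 1: [0, 2], 2: [1]}
print(betweenness_centrality(chain))  # middle node scores 1.0, the ends 0.0
```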
-
Stochastic simulation of biochemical systems with randomly fluctuating rate constants
Authors:
Chia Ying Lee
Abstract:
In an experimental study of single enzyme reactions, it has been proposed that the rate constants of the enzymatic reactions fluctuate randomly, according to a given distribution. To quantify the uncertainty arising from random rate constants, it is necessary to investigate how one can simulate such a biochemical system. To do this, we will take the Gillespie's stochastic simulation algorithm for simulating the evolution of the state of a chemical system, and study a modification of the algorithm that incorporates the random rate constants. In addition to simulating the waiting time of each reaction step, the modified algorithm also involves simulating the random fluctuation of the rate constant at each reaction time. We consider the modified algorithm in a general framework, then specialize it to two contrasting physical models, one in which the fluctuations occur on a much faster time scale than the reaction step, and the other in which the fluctuations occur much more slowly. The latter case was applied to the single enzyme reaction system, using in part the Metropolis-Hastings algorithm to enact the given distribution on the random rate constants. The modified algorithm is shown to produce simulation outputs that are corroborated by the experimental results. It is hoped that this modified algorithm can subsequently be used as a tool for the estimation or calibration of parameters in the system using experimental data.
Submitted 6 February, 2012;
originally announced February 2012.
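The idea in the abstract above, a Gillespie-style simulation in which the rate constant is itself redrawn at each reaction time, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single irreversible reaction S -> P, and it assumes the rate constant is drawn independently at every step from a user-supplied sampler (the paper instead uses the Metropolis-Hastings algorithm for the slowly fluctuating case). The names `gillespie_random_rate` and `sample_rate` are hypothetical.

```python
import random


def gillespie_random_rate(n_substrate, sample_rate, t_max, rng):
    """Simulate S -> P, redrawing the rate constant at every reaction step.

    sample_rate(rng) must return a strictly positive rate constant
    (hypothetical sampler; the distribution is an assumption of this sketch).
    """
    t = 0.0
    times = [0.0]
    counts = [n_substrate]
    while n_substrate > 0:
        k = sample_rate(rng)               # random rate constant for this step
        propensity = k * n_substrate       # first-order propensity k * [S]
        t += rng.expovariate(propensity)   # exponential waiting time
        if t > t_max:
            break
        n_substrate -= 1                   # fire the reaction S -> P
        times.append(t)
        counts.append(n_substrate)
    return times, counts


if __name__ == "__main__":
    rng = random.Random(0)
    # Gamma-distributed rate constants are an illustrative choice only.
    times, counts = gillespie_random_rate(
        20, lambda r: r.gammavariate(2.0, 0.5), t_max=100.0, rng=rng
    )
    print(len(times), counts[-1])
```

Passing the sampler as a function keeps the waiting-time logic separate from the model of rate fluctuation, so a correlated sampler (for example one driven by a Markov chain) could be substituted without touching the loop.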
-
Link between allosteric signal transduction and functional dynamics in a multi-subunit enzyme: S-adenosylhomocysteine hydrolase
Authors:
Yoonji Lee,
Lak Shin Jeong,
Sun Choi,
Changbong Hyeon
Abstract:
S-adenosylhomocysteine hydrolase (SAHH), a cellular enzyme that plays a key role in methylation reactions including those required for maturation of viral mRNA, is an important drug target in the discovery of antiviral agents. While targeting the active site is a straightforward strategy of enzyme inhibition, evidence of allosteric modulation of the active site in many enzymes underscores the molecular origin of signal transduction. Information on co-evolving sequences in the SAHH family and the key residues for functional dynamics that can be identified using the native topology of the enzyme provide glimpses into how the allosteric signaling network, dispersed over the molecular structure, coordinates intra- and inter-subunit conformational dynamics. To study the link between the allosteric communication and functional dynamics of SAHHs, we performed Brownian dynamics simulations by building a coarse-grained model based on the holo and ligand-bound structures. The simulations of ligand-induced transition revealed that the signal of intra-subunit closure dynamics is transmitted to form inter-subunit contacts, which in turn invoke a precise alignment of the active site, followed by the dimer-dimer rotation that compacts the whole tetrameric structure. Further analyses of SAHH dynamics associated with ligand binding provided evidence of both induced fit and population shift mechanisms, and also showed that the transition state ensemble is akin to the ligand-bound state. Besides the formation of enzyme-ligand contacts at the active site, the allosteric couplings from the residues distal to the active site are vital to the enzymatic function.
Submitted 27 October, 2011;
originally announced October 2011.
-
Removing System Noise from Comparative Genomic Hybridization Data by Self-Self Analysis
Authors:
Yoon-ha Lee,
Michael Ronemus,
Jude Kendall,
B. Lakshmi,
Anthony Leotta,
Dan Levy,
Diane Esposito,
Vladimir Grubor,
Kenny Ye,
Michael Wigler,
Boris Yamrom
Abstract:
Genomic copy number variation (CNV) is a large source of variation between organisms, and its consequences include phenotypic differences and genetic disorders. CNVs are commonly detected by hybridizing genomic DNA to microarrays of nucleic acid probes. System noise caused by operational and probe performance variability complicates the interpretation of these data. To minimize the distortion of genetic signal by system noise, we have explored the latter in an archive of hybridizations in which no genetic signal is expected. This archive is obtained by comparative genomic hybridization (CGH) of a sample in one channel to the same sample in the other channel, or 'self-self' data. These self-self hybridizations trap a variety of system noise inherent in sample-reference (test) data. Through singular value decomposition (SVD) of self-self data, we have determined the principal components of system noise. Assuming simple linear models of noise generation, the linear correction of test data with self-self data, or 'system normalization', reduces local and long-range correlations and improves signal-to-noise metrics, yet does not introduce detectable spurious signal. Using this method, 90% of hybridizations displayed improved signal-to-noise ratios with an average increase of 7.0%, due mainly to a reduced median average deviation (MAD). In addition, we have found that principal component loadings correlate with specific probe variables including array coordinates, base composition, and proximity to the 5' ends of genes. The correlation of the principal component loadings with the test data depends on operational variables, such as the temporal order of processing and the localization of individual samples within 96-well plates.
Submitted 4 May, 2011;
originally announced May 2011.
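The linear correction described in the abstract above can be sketched as follows: take the leading right-singular vectors of a self-self matrix as noise components, then subtract the projection of a test profile onto them. This is one simple reading of "linear correction", not the authors' published procedure. The matrix layout (rows are hybridizations, columns are probes), the number of components, and the function names `noise_components` and `system_normalize` are all assumptions of this sketch.

```python
import numpy as np


def noise_components(self_self, n_components):
    """Return the leading right-singular vectors of the self-self matrix.

    self_self: array of shape (n_hybridizations, n_probes), assumed layout.
    The returned rows are orthonormal probe-space noise patterns.
    """
    _, _, vt = np.linalg.svd(self_self, full_matrices=False)
    return vt[:n_components]


def system_normalize(test, components):
    """Subtract from test data its projection onto the noise components."""
    return test - (test @ components.T) @ components


if __name__ == "__main__":
    # Toy noise pattern across six probes (illustrative values only).
    pattern = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0]) / np.sqrt(6.0)
    self_self = np.outer([1.0, 2.0, 3.0], pattern)
    signal = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])
    test = signal + 5.0 * pattern
    print(system_normalize(test, noise_components(self_self, 1)))
```

A projection of this kind also removes any part of the genetic signal that happens to lie along the noise components, which is why the choice of `n_components` matters in practice.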
-
Metabolite essentiality elucidates robustness of Escherichia coli metabolism
Authors:
Pan-Jun Kim,
Dong-Yup Lee,
Tae Yong Kim,
Kwang Ho Lee,
Hawoong Jeong,
Sang Yup Lee,
Sunwon Park
Abstract:
Complex biological systems are very robust to genetic and environmental changes at all levels of organization. Many biological functions of Escherichia coli metabolism can be sustained against single-gene or even multiple-gene mutations by using redundant or alternative pathways. Thus, only a limited number of genes have been identified to be lethal to the cell. In this regard, the reaction-centric gene deletion study has a limitation in understanding the metabolic robustness. Here, we report the use of flux-sum, which is the summation of all incoming or outgoing fluxes around a particular metabolite under pseudo-steady state conditions, as a good conserved property for elucidating such robustness of E. coli from the metabolite point of view. The functional behavior, as well as the structural and evolutionary properties of metabolites essential to the cell survival, was investigated by means of a constraints-based flux analysis under perturbed conditions. The essential metabolites are capable of maintaining a steady flux-sum even against severe perturbation by actively redistributing the relevant fluxes. Disrupting the flux-sum maintenance was found to suppress cell growth. This approach of analyzing metabolite essentiality provides insight into cellular robustness and concomitant fragility, which can be used for several applications, including the development of new drugs for treating pathogens.
Submitted 14 August, 2007;
originally announced August 2007.
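The flux-sum defined in the abstract above (the total incoming or outgoing flux around one metabolite at pseudo-steady state) can be made concrete with a small sketch. At steady state the incoming and outgoing totals are equal, so the value can be computed as half the sum of the absolute flux contributions. The stoichiometric-coefficient and flux-vector inputs and the function name `flux_sum` are assumptions chosen for illustration.

```python
def flux_sum(stoich_row, fluxes):
    """Half the sum of |S_ij * v_j| over all reactions j for one metabolite i.

    stoich_row: stoichiometric coefficients of the metabolite in each reaction
                (positive when produced, negative when consumed).
    fluxes:     steady-state flux of each reaction, in the same order.
    """
    return 0.5 * sum(abs(s * v) for s, v in zip(stoich_row, fluxes))


if __name__ == "__main__":
    # One producing reaction (flux 2.0) balanced by two consuming reactions.
    print(flux_sum([1, -1, -1], [2.0, 1.5, 0.5]))
```

In this toy case the metabolite is produced at 2.0 and consumed at 1.5 + 0.5, so the incoming total, the outgoing total, and the flux-sum all coincide.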
-
Symmetry-Breaking Motility
Authors:
Allen Lee,
Ha Youn Lee,
Mehran Kardar
Abstract:
Locomotion of bacteria by actin polymerization, and in vitro motion of spherical beads coated with a protein catalyzing polymerization, are examples of active motility. Starting from a simple model of forces locally normal to the surface of a bead, we construct a phenomenological equation for its motion. The singularities at a continuous transition between moving and stationary beads are shown to be related to the symmetries of its shape. Universal features of the phase behavior are calculated analytically and confirmed by simulations. Fluctuations in velocity are shown to be generically non-Maxwellian and correlated to the shape of the bead.
Submitted 29 July, 2004; v1 submitted 28 July, 2004;
originally announced July 2004.
-
Statistics of lines of natural images and implications for visual detection
Authors:
Ha Youn Lee,
Mehran Kardar
Abstract:
As borders between different regions, lines are an important element of natural images. Already at the level of the mammalian primary visual cortex (V1), neurons respond best to lines of a given orientation. We reduce a set of images to linear segments and analyze their statistical properties. In particular, appropriately defined Fourier spectra show more power in their transverse component than in the longitudinal one. We then characterize filters that are best suited for extracting information from such images, and find some qualitative consistency with neural connections in V1. We also demonstrate that such filters are efficient in reconstructing missing lines in an image.
Submitted 29 June, 2004;
originally announced June 2004.
-
Symmetry considerations and development of pinwheels in visual maps
Authors:
Ha Youn Lee,
Mehdi Yahyanejad,
Mehran Kardar
Abstract:
Neurons in the visual cortex respond best to rod-like stimuli of given orientation. While the preferred orientation varies continuously across most of the cortex, there are prominent pinwheel centers around which all orientations are present. Oriented segments abound in natural images, and tend to be collinear; neurons are also more likely to be connected if their preferred orientations are aligned to their topographic separation. These are indications of a reduced symmetry requiring joint rotations of both orientation preference and the underlying topography. We verify that this requirement extends to cortical maps of monkey and cat by direct statistical analysis. Furthermore, analytical arguments and numerical studies indicate that pinwheels are generically stable in evolving field models which couple orientation and topography.
Submitted 19 December, 2003;
originally announced December 2003.
-
Sequence Space Localization in the Immune System Response to Vaccination and Disease
Authors:
Michael W. Deem,
Ha Youn Lee
Abstract:
We introduce a model of protein evolution to explain limitations in the immune system response to vaccination and disease. The phenomenon of original antigenic sin, wherein vaccination creates memory sequences that can increase susceptibility to future exposures to the same disease, is explained as stemming from localization of the immune system response in antibody sequence space. This localization is a result of the roughness in sequence space of the evolved antibody affinity constant for antigen and is observed for diseases with high year-to-year mutation rates, such as influenza.
Submitted 29 August, 2003;
originally announced August 2003.
-
Macroscopic equations for pattern formation in mixtures of microtubules and motors
Authors:
Ha Youn Lee,
Mehran Kardar
Abstract:
Inspired by patterns observed in mixtures of microtubules and molecular motors, we propose continuum equations for the evolution of motor density, and microtubule orientation. The chief ingredients are the transport of motors along tubules, and the alignment of tubules in the process. The macroscopic equations lead to aster and vortex patterns in qualitative agreement with experiments. While the early stages of evolution of tubules are similar to coarsening of spins following a quench, the rearrangement of motors leads to arrested coarsening at low densities. Even in one dimension, the equations exhibit a variety of interesting behaviors, such as symmetry breaking, moving fronts, and motor localization.
Submitted 16 February, 2001;
originally announced February 2001.
-
Genetic Polymorphism in Evolving Population
Authors:
H. Y. Lee,
D. Kim,
M. Y. Choi
Abstract:
We present a model for an evolving population which maintains genetic polymorphism. By introducing random mutation in the model population at a constant rate, we observe that the population does not become extinct but survives, keeping diversity in the gene pool under abrupt environmental changes. The model provides reasonable estimates for the proportions of polymorphic and heterozygous loci and for the mutation rate, as observed in nature.
Submitted 8 January, 1998;
originally announced January 1998.
-
Entropic Sampling and Natural Selection in Biological Evolution
Authors:
M. Y. Choi,
H. Y. Lee,
D. Kim,
S. H. Park
Abstract:
With a view to connecting random mutation on the molecular level to punctuated equilibrium behavior on the phenotype level, we propose a new model for biological evolution, which incorporates random mutation and natural selection. In this scheme the system evolves continuously into new configurations, yielding non-stationary behavior of the total fitness. Further, both the waiting time distribution of species and the avalanche size distribution display power-law behaviors with exponents close to two, which are consistent with the fossil data. These features are rather robust, indicating the key role of entropy.
Submitted 9 January, 1998; v1 submitted 12 July, 1996;
originally announced July 1996.