
A frequentist test of proportional colocalization after selecting relevant genetic variants

Authors: Ashish Patel, John C. Whittaker, Stephen Burgess

Abstract: Colocalization analyses assess whether two traits are affected by the same or distinct causal genetic variants in a single gene region. A class of Bayesian colocalization tests are now routinely used in practice; for example, for genetic analyses in drug development pipelines. In this work, we consider an alternative frequentist approach to colocalization testing that examines the proportionality of genetic associations with each trait. The proportional colocalization approach uses markedly different assumptions from Bayesian colocalization tests, and therefore can provide valuable complementary evidence in cases where Bayesian colocalization results are inconclusive or sensitive to priors. We propose a novel conditional test of proportional colocalization, prop-coloc-cond, that aims to account for the uncertainty in variant selection, in order to recover accurate type I error control. The test can be implemented straightforwardly, requiring only summary data on genetic associations. Simulation evidence and an empirical investigation into GLP1R gene expression demonstrate how tests of proportional colocalization can offer important insights in conjunction with Bayesian colocalization tests.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.02656 [pdf, other]

RACER: An LLM-powered Methodology for Scalable Analysis of Semi-structured Mental Health Interviews

Authors: Satpreet Harcharan Singh, Kevin Jiang, Kanchan Bhasin, Ashutosh Sabharwal, Nidal Moukaddam, Ankit B Patel

Abstract: Semi-structured interviews (SSIs) are a commonly employed data-collection method in healthcare research, offering in-depth qualitative insights into subject experiences. Despite their value, the manual analysis of SSIs is notoriously time-consuming and labor-intensive, in part due to the difficulty of extracting and categorizing emotional responses, and challenges in scaling human evaluation for large populations. In this study, we develop RACER, a Large Language Model (LLM) based expert-guided automated pipeline that efficiently converts raw interview transcripts into insightful domain-relevant themes and sub-themes. We used RACER to analyze SSIs conducted with 93 healthcare professionals and trainees to assess the broad personal and professional mental health impacts of the COVID-19 crisis. RACER achieves moderately high agreement with two human evaluators (72%), which approaches the human inter-rater agreement (77%). Interestingly, LLMs and humans struggle with similar content involving nuanced emotional, ambivalent/dialectical, and psychological statements. Our study highlights the opportunities and challenges in using LLMs to improve research efficiency and opens new avenues for scalable analysis of SSIs in healthcare research.

Submitted 4 February, 2024; originally announced February 2024.

arXiv:2310.18278 [pdf, other]

Navigating protein landscapes with a machine-learned transferable coarse-grained model

Authors: Nicholas E. Charron, Felix Musil, Andrea Guljas, Yaoyi Chen, Klara Bonneau, Aldo S. Pasos-Trejo, Jacopo Venturin, Daria Gusew, Iryna Zaporozhets, Andreas Krämer, Clark Templeton, Atharva Kelkar, Aleksander E. P. Durumeric, Simon Olsson, Adrià Pérez, Maciej Majewski, Brooke E. Husic, Ankit Patel, Gianni De Fabritiis, Frank Noé, Cecilia Clementi

Abstract: The most popular and universally predictive protein simulation models employ all-atom molecular dynamics (MD), but they come at extreme computational cost. The development of a universal, computationally efficient coarse-grained (CG) model with similar prediction performance has been a long-standing challenge. By combining recent deep learning methods with a large and diverse training set of all-atom protein simulations, we here develop a bottom-up CG force field with chemical transferability, which can be used for extrapolative molecular dynamics on new sequences not used during model parametrization. We demonstrate that the model successfully predicts folded structures, intermediates, metastable folded and unfolded basins, and the fluctuations of intrinsically disordered proteins while it is several orders of magnitude faster than an all-atom model. This showcases the feasibility of a universal and computationally efficient machine-learned CG model for proteins.

Submitted 27 October, 2023; originally announced October 2023.

arXiv:2306.15794 [pdf, other]

HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution

Authors: Eric Nguyen, Michael Poli, Marjan Faizi, Armin Thomas, Callum Birch-Sykes, Michael Wornow, Aman Patel, Clayton Rabideau, Stefano Massaroli, Yoshua Bengio, Stefano Ermon, Stephen A. Baccus, Chris Ré

Abstract: Genomic (DNA) sequences encode an enormous amount of information for gene regulation and protein synthesis. Similar to natural language models, researchers have proposed foundation models in genomics to learn generalizable features from unlabeled genome data that can then be fine-tuned for downstream tasks such as identifying regulatory elements. Due to the quadratic scaling of attention, previous Transformer-based genomic models have used 512 to 4k tokens as context (<0.001% of the human genome), significantly limiting the modeling of long-range interactions in DNA. In addition, these methods rely on tokenizers or fixed k-mers to aggregate meaningful DNA units, losing single nucleotide resolution where subtle genetic variations can completely alter protein function via single nucleotide polymorphisms (SNPs). Recently, Hyena, a large language model based on implicit convolutions, was shown to match attention in quality while allowing longer context lengths and lower time complexity. Leveraging Hyena's new long-range capabilities, we present HyenaDNA, a genomic foundation model pretrained on the human reference genome with context lengths of up to 1 million tokens at the single nucleotide level, an up to 500x increase over previous dense attention-based models. HyenaDNA scales sub-quadratically in sequence length (training up to 160x faster than Transformer), uses single nucleotide tokens, and has full global context at each layer. We explore what longer context enables, including the first use of in-context learning in genomics.
On fine-tuned benchmarks from the Nucleotide Transformer, HyenaDNA reaches state-of-the-art (SotA) on 12 of 18 datasets using a model with orders of magnitude fewer parameters and less pretraining data. On the GenomicBenchmarks, HyenaDNA surpasses SotA on 7 of 8 datasets on average by +10 accuracy points. Code at https://github.com/HazyResearch/hyena-dna.

Submitted 14 November, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023 (Spotlight)

arXiv:2203.09371 [pdf, other]

Transforming Gait: Video-Based Spatiotemporal Gait Analysis

Authors: R. James Cotton, Emoonah McClerklin, Anthony Cimorelli, Ankit Patel, Tasos Karakostas

Abstract: Human pose estimation from monocular video is a rapidly advancing field that offers great promise to human movement science and rehabilitation. This potential is tempered by the smaller body of work ensuring the outputs are clinically meaningful and properly calibrated. Gait analysis, typically performed in a dedicated lab, produces precise measurements including kinematics and step timing. Using over 7000 monocular videos from an instrumented gait analysis lab, we trained a neural network to map 3D joint trajectories and the height of individuals onto interpretable biomechanical outputs including gait cycle timing and sagittal plane joint kinematics and spatiotemporal trajectories. This task-specific layer produces accurate estimates of the timing of foot contact and foot off events. After parsing the kinematic outputs into individual gait cycles, it also enables accurate cycle-by-cycle estimates of cadence, step time, double and single support time, walking speed and step length.

Submitted 17 March, 2022; originally announced March 2022.

arXiv:2103.13282 [pdf, other]

doi 10.1109/ICRA48506.2021.9561338

AcinoSet: A 3D Pose Estimation Dataset and Baseline Models for Cheetahs in the Wild

Authors: Daniel Joska, Liam Clark, Naoya Muramatsu, Ricardo Jericevich, Fred Nicolls, Alexander Mathis, Mackenzie W. Mathis, Amir Patel

Abstract: Animals are capable of extreme agility, yet understanding their complex dynamics, which have ecological, biomechanical and evolutionary implications, remains challenging. Being able to study this incredible agility will be critical for the development of next-generation autonomous legged robots. In particular, the cheetah (Acinonyx jubatus) is supremely fast and maneuverable, yet quantifying its whole-body 3D kinematic data during locomotion in the wild remains a challenge, even with new deep learning-based methods. In this work we present an extensive dataset of free-running cheetahs in the wild, called AcinoSet, that contains 119,490 frames of multi-view synchronized high-speed video footage, camera calibration files and 7,588 human-annotated frames. We utilize markerless animal pose estimation to provide 2D keypoints. Then, we use three methods that serve as strong baselines for 3D pose estimation tool development: traditional sparse bundle adjustment, an Extended Kalman Filter, and a trajectory optimization-based method we call Full Trajectory Estimation. The resulting 3D trajectories, human-checked 3D ground truth, and an interactive tool to inspect the data are also provided. We believe this dataset will be useful for a diverse range of fields such as ecology, neuroscience, robotics, biomechanics as well as computer vision.

Submitted 24 March, 2021; originally announced March 2021.

Comments: Code and data can be found at: https://github.com/African-Robotics-Unit/AcinoSet

Journal ref: 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 13901-13908

arXiv:2004.12028 [pdf, other]

doi 10.1111/biom.13424

Two-Stage Penalized Regression Screening to Detect Biomarker-Treatment Interactions in Randomized Clinical Trials

Authors: Jixiong Wang, Ashish Patel, James M. S. Wason, Paul J. Newcombe

Abstract: High-dimensional biomarkers such as genomics are increasingly being measured in randomized clinical trials. Consequently, there is a growing interest in developing methods that improve the power to detect biomarker-treatment interactions. We adapt recently proposed two-stage interaction detecting procedures in the setting of randomized clinical trials. We also propose a new stage 1 multivariate screening strategy using ridge regression to account for correlations among biomarkers. For this multivariate screening, we prove the asymptotic between-stage independence, required for family-wise error rate control, under biomarker-treatment independence. Simulation results show that in various scenarios, the ridge regression screening procedure can provide substantially greater power than the traditional one-biomarker-at-a-time screening procedure in highly correlated data. We also exemplify our approach in two real clinical trial data applications.

Submitted 28 April, 2021; v1 submitted 24 April, 2020; originally announced April 2020.

Comments: Accepted version, to be published in Biometrics

arXiv:1904.07032 [pdf]

doi 10.1038/s41591-020-0870-z

Deep neural networks can predict mortality from 12-lead electrocardiogram voltage data

Authors: Sushravya Raghunath, Alvaro E. Ulloa Cerna, Linyuan Jing, David P. vanMaanen, Joshua Stough, Dustin N. Hartzel, Joseph B. Leader, H. Lester Kirchner, Christopher W. Good, Aalpen A. Patel, Brian P. Delisle, Amro Alsaid, Dominik Beer, Christopher M. Haggerty, Brandon K. Fornwalt

Abstract: The electrocardiogram (ECG) is a widely-used medical test, typically consisting of 12 voltage versus time traces collected from surface recordings over the heart. Here we hypothesize that a deep neural network can predict an important future clinical event (one-year all-cause mortality) from ECG voltage-time traces. We show good performance for predicting one-year mortality with an average AUC of 0.85 from a model cross-validated on 1,775,926 12-lead resting ECGs, that were collected over a 34-year period in a large regional health system. Even within the large subset of ECGs interpreted as 'normal' by a physician (n=297,548), the model performance to predict one-year mortality remained high (AUC=0.84), and Cox Proportional Hazard model revealed a hazard ratio of 6.6 (p<0.005) for the two predicted groups (dead vs alive one year after ECG) over a 30-year follow-up period. A blinded survey of three cardiologists suggested that the patterns captured by the model were generally not visually apparent to cardiologists even after being shown 240 paired examples of labeled true positives (dead) and true negatives (alive). In summary, deep learning can add significant prognostic information to the interpretation of 12-lead resting ECGs, even in cases that are interpreted as 'normal' by physicians.

Submitted 11 May, 2020; v1 submitted 15 April, 2019; originally announced April 2019.

Comments: An updated version of this paper is now published with Nature Medicine (2020)

arXiv:1811.10553 [pdf]

A deep neural network to enhance prediction of 1-year mortality using echocardiographic videos of the heart

Authors: Alvaro Ulloa, Linyuan Jing, Christopher W Good, David P vanMaanen, Sushravya Raghunath, Jonathan D Suever, Christopher D Nevius, Gregory J Wehner, Dustin Hartzel, Joseph B Leader, Amro Alsaid, Aalpen A Patel, H Lester Kirchner, Marios S Pattichis, Christopher M Haggerty, Brandon K Fornwalt

Abstract: Predicting future clinical events helps physicians guide appropriate intervention. Machine learning has tremendous promise to assist physicians with predictions based on the discovery of complex patterns from historical data, such as large, longitudinal electronic health records (EHR). This study is a first attempt to demonstrate such capabilities using raw echocardiographic videos of the heart. We show that a large dataset of 723,754 clinically-acquired echocardiographic videos (~45 million images) linked to longitudinal follow-up data in 27,028 patients can be used to train a deep neural network to predict 1-year mortality with good accuracy (area under the curve (AUC) in an independent test set = 0.839). Prediction accuracy was further improved by adding EHR data (AUC = 0.858). Finally, we demonstrate that the trained neural network was more accurate in mortality prediction than two expert cardiologists. These results highlight the potential of neural networks to add new power to clinical predictions.

Submitted 14 May, 2019; v1 submitted 26 November, 2018; originally announced November 2018.

Comments: We updated results with improved performance after addressing a dropout bug in TensorFlow v1.12. We also added learning curves showing promise for the video model with more samples

arXiv:1710.07841 [pdf, other]

doi 10.1049/iet-syb.2018.0008

Non-normality Can Facilitate Pulsing in Biomolecular Circuits

Authors: Abhilash Patel, Shaunak Sen

Abstract: Non-normality can underlie pulse dynamics in many engineering contexts. However, its role in pulses generated in biomolecular contexts is generally unclear. Here, we address this issue using the mathematical tools of linear algebra and systems theory on simple computational models of biomolecular circuits. We find that non-normality is present in standard models of feedforward loops. We used a generalized framework and pseudospectrum analysis to identify non-normality in larger biomolecular circuit models, finding that it correlates well with pulsing dynamics. Finally, we illustrate how these methods can be used to provide analytical support to numerical screens for pulsing dynamics as well as provide guidelines for design.

Submitted 2 June, 2018; v1 submitted 21 October, 2017; originally announced October 2017.
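The abstract's central notion, non-normality, can be checked directly: a real square matrix M is normal exactly when M Mᵀ = Mᵀ M. The sketch below assumes a hypothetical linearized feedforward loop (X acts on Y, X and Y both act on Z, each species degrades at first order) with made-up parameter names; it is not the authors' model, and it swaps in Henrici's departure from normality as a simple scalar measure in place of the pseudospectrum analysis described above.

```python
import math

Matrix = list[list[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def matmul(p: Matrix, q: Matrix) -> Matrix:
    # Plain nested-list matrix product p @ q.
    q_cols = transpose(q)
    return [[sum(x * y for x, y in zip(row, col)) for col in q_cols] for row in p]


def is_normal(m: Matrix, tol: float = 1e-12) -> bool:
    """A real square matrix M is normal iff M M^T == M^T M (up to tol)."""
    left = matmul(m, transpose(m))
    right = matmul(transpose(m), m)
    return all(
        abs(x - y) <= tol
        for row_l, row_r in zip(left, right)
        for x, y in zip(row_l, row_r)
    )


def henrici_departure_triangular(m: Matrix) -> float:
    """Henrici's departure from normality, sqrt(||M||_F^2 - sum |lambda_i|^2).

    Only valid for triangular M, whose eigenvalues are its diagonal entries;
    the result is then the Frobenius norm of the off-diagonal part.
    """
    frob_sq = sum(x * x for row in m for x in row)
    eig_sq = sum(m[i][i] ** 2 for i in range(len(m)))
    return math.sqrt(max(frob_sq - eig_sq, 0.0))


def ffl_jacobian(deg_x, x_to_y, deg_y, x_to_z, y_to_z, deg_z) -> Matrix:
    # Hypothetical linearized feedforward loop: X -> Y, X -> Z, Y -> Z,
    # each species degrading at a first-order rate (lower triangular).
    return [
        [-deg_x, 0.0, 0.0],
        [x_to_y, -deg_y, 0.0],
        [x_to_z, y_to_z, -deg_z],
    ]


A = ffl_jacobian(1.0, 2.0, 1.0, 1.0, 1.0, 1.0)
print(is_normal(A))                     # False
print(henrici_departure_triangular(A))  # sqrt(6), about 2.449
```

Because a triangular matrix is normal only when it is diagonal, any nonzero feedforward coupling makes this Jacobian non-normal, while setting all couplings to zero gives a diagonal, normal Jacobian with zero departure.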

arXiv:1608.07499 [pdf]

Stem Cell Therapy for Alzheimer's Disease

Authors: Ankur Patel, Grishma joshi, Rupali Ugile

Abstract: The loss of neuronal cells in the central nervous system may occur in numerous neurodegenerative illnesses. Alzheimer's Disease (AD) is a complex, irreversible, progressive neurodegenerative disease. It is the main cause of age-related dementia, affecting roughly 5.3 million individuals in the United States alone. AD is a common debilitating disease in individuals over 65 years, causing disability characterized by decline in memory, inability to learn and carry out daily activities, and cognitive impairment, and it affects patients' quality of life. The pathologic hallmarks of AD are an abnormal accumulation of specific proteins, called beta-amyloid "plaques" and tau "tangles", in the brain. However, current treatments for AD only relieve symptoms and are palliative rather than curative, and several promising drug candidates have failed in recent clinical trials. There is therefore a critical need to improve our understanding of the pathogenesis of this disease and to develop new and innovative predictive models with effective treatments. Recently, stem cell therapy has been shown to be a potential approach to various diseases, including neurodegenerative disorders. In light of the widespread nature of AD pathology, stem cell replacement strategies have been viewed as an extraordinarily difficult and improbable treatment approach. Stem cells may also offer a powerful new way to model and study AD.
Patient-derived induced pluripotent stem cells (iPSCs), for instance, may advance our understanding of disease mechanisms. In this review we examine the potential of stem cells to help in these challenging endeavors.

Submitted 26 August, 2016; originally announced August 2016.

arXiv:1307.7941 [pdf, other]

Sibelia: A scalable and comprehensive synteny block generation tool for closely related microbial genomes

Authors: Ilya Minkin, Anand Patel, Mikhail Kolmogorov, Nikolay Vyahhi, Son Pham

Abstract: Comparing strains within the same microbial species has proven effective in the identification of genes and genomic regions responsible for virulence, as well as in the diagnosis and treatment of infectious diseases. In this paper, we present Sibelia, a tool for finding synteny blocks in multiple closely related microbial genomes using iterative de Bruijn graphs. Unlike most other tools, Sibelia can find synteny blocks that are repeated within genomes as well as blocks shared by multiple genomes. It represents synteny blocks in a hierarchical structure with multiple layers, each representing a different granularity level. Sibelia has been designed to work efficiently with a large number of microbial genomes; it finds synteny blocks in 31 S. aureus genomes within 31 minutes and in 59 E. coli genomes within 107 minutes on a standard desktop. Sibelia software is distributed under the GNU GPL v2 license and is available at https://github.com/bioinf/Sibelia, and Sibelia's web server is available at http://etool.me/software/sibelia

Submitted 30 July, 2013; originally announced July 2013.

Comments: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013)
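As a rough illustration of the data structure Sibelia builds on, the sketch below constructs a single de Bruijn graph from k-mers and lists k-mers that occur more than once, either repeated within one genome or shared across genomes, which are the two kinds of blocks the abstract mentions. It is a minimal sketch under stated assumptions: it reads only the forward strand (no reverse complements), it does not perform Sibelia's iterative simplification over increasing k, and the function names are hypothetical, not part of Sibelia's interface.

```python
from collections import defaultdict


def kmer_occurrences(genomes: list[str], k: int) -> dict[str, list[tuple[int, int]]]:
    """Map each k-mer to its (genome index, start position) hits, forward strand only."""
    occ: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for g, seq in enumerate(genomes):
        for i in range(len(seq) - k + 1):
            occ[seq[i:i + k]].append((g, i))
    return dict(occ)


def de_bruijn_graph(genomes: list[str], k: int) -> dict[tuple[str, str], int]:
    """Nodes are (k-1)-mers; each distinct k-mer is an edge prefix -> suffix,
    weighted by how many times that k-mer occurs across all genomes."""
    return {
        (kmer[:-1], kmer[1:]): len(hits)
        for kmer, hits in kmer_occurrences(genomes, k).items()
    }


def repeated_or_shared(genomes: list[str], k: int) -> dict[str, list[tuple[int, int]]]:
    """k-mers seen more than once: repeats within a genome or seeds shared by genomes."""
    return {
        kmer: hits
        for kmer, hits in kmer_occurrences(genomes, k).items()
        if len(hits) > 1
    }


genomes = ["ACGTACGT", "TTACGTT"]
print(repeated_or_shared(genomes, 4))
# {'ACGT': [(0, 0), (0, 4), (1, 2)], 'TACG': [(0, 3), (1, 1)]}
print(de_bruijn_graph(genomes, 4)[("ACG", "CGT")])  # 3
```

High-weight edges and the paths through them are natural seeds for synteny blocks; a real tool would additionally merge both strands and collapse small bulges before reading blocks off the graph.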

arXiv:1109.4431 [pdf, other]

doi 10.1021/jp2107523

Sitting at the edge: How biomolecules use hydrophobicity to tune their interactions and function

Authors: Amish J. Patel, Patrick Varilly, Sumanth N. Jamadagni, Michael F. Hagan, David Chandler, Shekhar Garde

Abstract: Water near hydrophobic surfaces is like that at a liquid-vapor interface, where fluctuations in water density are substantially enhanced compared to that in bulk water. Here we use molecular simulations with specialized sampling techniques to show that water density fluctuations are similarly enhanced, even near hydrophobic surfaces of complex biomolecules, situating them at the edge of a dewetting transition. Consequently, water near these surfaces is sensitive to subtle changes in surface conformation, topology, and chemistry, any of which can tip the balance towards or away from the wet state, and thus significantly alter biomolecular interactions and function. Our work also resolves the long-standing puzzle of why some biological surfaces dewet and other seemingly similar surfaces do not.

Submitted 20 September, 2011; originally announced September 2011.

Comments: 12 pages, 4 figures

Journal ref: J. Phys. Chem. B 116, 2498-2503 (2012)

arXiv:1104.1253 [pdf, ps, other]

doi 10.1063/1.3635850

Efficient Energy Transport in Photosynthesis: Roles of Coherence and Entanglement

Authors: Apoorva D. Patel

Abstract: Recently it has been discovered---contrary to expectations of physicists as well as biologists---that the energy transport during photosynthesis, from the chlorophyll pigment that captures the photon to the reaction centre where glucose is synthesised from carbon dioxide and water, is highly coherent even at ambient temperature and in the cellular environment. This process and the key molecular ingredients that it depends on are described. By looking at the process from the computer science view-point, we can study what has been optimised and how. A spatial search algorithmic model based on robust features of wave dynamics is presented.

Submitted 7 April, 2011; originally announced April 2011.

Comments: 6 pages, 3 figures, to appear in the proceedings of the Symposium "75 Years of Quantum Entanglement: Foundations and Information Theoretic Applications", January 2011, Kolkata, India

arXiv:0705.3895 [pdf, ps, other]

doi 10.1142/9781848162556_0010

Towards Understanding the Origin of Genetic Languages

Authors: Apoorva D. Patel

Abstract: Molecular biology is a nanotechnology that works--it has worked for billions of years and in an amazing variety of circumstances. At its core is a system for acquiring, processing and communicating information that is universal, from viruses and bacteria to human beings. Advances in genetics and experience in designing computers have taken us to a stage where we can understand the optimisation principles at the root of this system, from the availability of basic building blocks to the execution of tasks. The languages of DNA and proteins are argued to be the optimal solutions to the information processing tasks they carry out. The analysis also suggests simpler predecessors to these languages, and provides fascinating clues about their origin. Obviously, a comprehensive unraveling of the puzzle of life would have a lot to say about what we may design or convert ourselves into.

Submitted 28 October, 2008; v1 submitted 26 May, 2007; originally announced May 2007.

Comments: (v1) 33 pages, contributed chapter to "Quantum Aspects of Life", edited by D. Abbott, P. Davies and A. Pati, (v2) published version with some editing

arXiv:q-bio/0403036 [pdf, ps, other]

The Triplet Genetic Code had a Doublet Predecessor

Authors: Apoorva Patel

Abstract: Information theoretic analysis of genetic languages indicates that the naturally occurring 20 amino acids and the triplet genetic code arose by duplication of 10 amino acids of class-II and a doublet genetic code having codons NNY and anticodons $\overleftarrow{\rm GNN}$. Evidence for this scenario is presented based on the properties of aminoacyl-tRNA synthetases, amino acids and nucleotide bases.

Submitted 28 October, 2004; v1 submitted 25 March, 2004; originally announced March 2004.

Comments: 10 pages (v2) Expanded to include additional features, including likely relation to the operational code of the tRNA-acceptor stem. Version to be published in Journal of Theoretical Biology

Journal ref: Journal of Theoretical Biology 233 (2005) 527-532

arXiv:quant-ph/0206014 [pdf, ps, other]

Survival of the Fittest and Zero Sum Games

Authors: Apoorva Patel

Abstract: Competition for available resources is natural amongst coexisting species, and the fittest contenders dominate over the rest in evolution. The dynamics of this selection is studied using a simple linear model. It has similarities to features of quantum computation, in particular conservation laws leading to destructive interference. Compared to an altruistic scenario, competition introduces instability and eliminates the weaker species in a finite time.

Submitted 9 January, 2003; v1 submitted 3 June, 2002; originally announced June 2002.

Comments: 6 pages, formatted according to journal style. Special Issue on Game Theory and Evolutionary Processes. (v2) Published version. Some clarifications added. Topological interpretation pointed out

Report number: IISc-CTS-5/02

Journal ref: Fluctuation and Noise Letters 2 (2002) L279-L284

arXiv:quant-ph/0202022 [pdf, ps, other]

Mathematical Physics and Life

Authors: Apoorva Patel

Abstract: It is a fascinating subject to explore how well we can understand the processes of life on the basis of fundamental laws of physics. It is emphasised that viewing biological processes as manipulation of information extracts their essential features. This information processing can be analysed using well-known methods of computer science. The lowest level of biological information processing, involving DNA and proteins, is the easiest one to link to physical properties. Physical underpinnings of the genetic information that could have led to the universal language of 4 nucleotide bases and 20 amino acids are pointed out. Generalisations of Boolean logic, especially features of quantum dynamics, play a crucial role.

Submitted 3 April, 2003; v1 submitted 4 February, 2002; originally announced February 2002.

Comments: 20 pages, latex, Review article to appear in Computing and Information Sciences: Recent Trends, ed. J. C. Misra, Narosa Publishing House. (v2) Typos corrected, published version, pp. 271-294 (2003)

Report number: IISc-CTS-1/02

arXiv:quant-ph/0105001 [pdf, ps, other]

Why Genetic Information Processing could have a Quantum Basis

Authors: Apoorva Patel

Abstract: Living organisms are not just random collections of organic molecules. There is continuous information processing going on in the apparent bouncing around of molecules of life. Optimisation criteria in this information processing can be searched for using the laws of physics. Quantum dynamics can explain why living organisms have 4 nucleotide bases and 20 amino acids, as optimal solutions of the molecular assembly process. Experiments should be able to tell whether evolution indeed took advantage of quantum dynamics or not.

Submitted 11 June, 2001; v1 submitted 1 May, 2001; originally announced May 2001.

Comments: 6 pages, latex, formatted according to journal style. This is an introductory article, aimed at biologists. (v2) Minor grammatical changes. Title slightly modified. Published version

Report number: IISc-CTS-5/01

Journal ref: Journal of Biosciences 26 (2001) 145-151

arXiv:quant-ph/0103017 [pdf, ps, other]

Carbon--The First Frontier of Information Processing

Authors: Apoorva Patel

Abstract: Information is often encoded as an aperiodic chain of building blocks. Modern digital computers use bits as the building blocks, but in general the choice of building blocks depends on the nature of the information to be encoded. What are the optimal building blocks to encode structural information? This can be analysed by substituting the operations of addition and multiplication of conventional arithmetic with translation and rotation. It is argued that at the molecular level, the best component for encoding discretised structural information is carbon. Living organisms discovered this billions of years ago, and used carbon as the back-bone for constructing proteins that function according to their structure. Structural analysis of polypeptide chains shows that an efficient and versatile structural language of 20 building blocks is needed to implement all the tasks carried out by proteins. Properties of amino acids indicate that the present triplet genetic code was preceded by a more primitive one, coding for 10 amino acids using two nucleotide bases.

Submitted 10 June, 2002; v1 submitted 5 March, 2001; originally announced March 2001.

Comments: (v1) 9 pages, revtex. (v2) 10 pages. Several arguments expanded to make the article self-contained and to increase clarity. Applications pointed out. (v3) 11 pages. Published version. Well-known properties of proteins shifted to an appendix. Reformatted according to journal style

Report number: IISc-CTS-8/01

Journal ref: Journal of Biosciences 27 (2002) 207-218

arXiv:quant-ph/0102034 [pdf, ps, other]

Testing Quantum Dynamics in Genetic Information Processing

Authors: Apoorva Patel

Abstract: Does quantum dynamics play a role in DNA replication? What type of tests would reveal that? Some statistical checks that distinguish classical and quantum dynamics in DNA replication are proposed.

Submitted 26 July, 2001; v1 submitted 6 February, 2001; originally announced February 2001.

Comments: 4 pages, latex. (v2) Several points elaborated. Published version, formatted according to the journal style

Report number: IISc-CTS-4/01

Journal ref: Journal of Genetics 80 (2001) 39-43

arXiv:quant-ph/0002037 [pdf, ps, other]

doi 10.1007/s12043-001-0131-8

Quantum Algorithms and the Genetic Code

Authors: Apoorva Patel

Abstract: Replication of DNA and synthesis of proteins are studied from the view-point of quantum database search. Identification of a base-pairing with a quantum query gives a natural (and first ever) explanation of why living organisms have 4 nucleotide bases and 20 amino acids. It is amazing that these numbers arise as solutions to an optimisation problem. Components of the DNA structure which implement Grover's algorithm are identified, and a physical scenario is presented for the execution of the quantum algorithm. It is proposed that enzymes play a crucial role in maintaining quantum coherence of the process. Experimental tests that can verify this scenario are pointed out.

Submitted 6 February, 2001; v1 submitted 14 February, 2000; originally announced February 2000.

Comments: Revtex, 11 pages, 3 figures. Invited lectures presented at the Winter Institute on "Foundations of Quantum Theory and Quantum Optics", 1-13 January 2000, S.N. Bose National Centre for Basic Sciences, Calcutta, India. To appear in the proceedings. (v2) Corrected typo in Fig.2 caption. (v3) Published version. Corrected typos and added some comments about enzymes

Report number: IISc-CTS-2/00

Journal ref: Pramana 56 (2001) 365
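The "Quantum Algorithms and the Genetic Code" abstract does not spell out the optimisation problem behind the numbers 4 and 20. As a hedged sketch, assuming it is the standard condition for Grover's algorithm to find a marked item among N with certainty after Q queries, namely (2Q+1)·arcsin(1/√N) = π/2, the database size N for a given Q can be computed as below. The function name grover_items_for_queries is hypothetical and not taken from the paper.

```python
import math


def grover_items_for_queries(q: int) -> float:
    """Database size N that Grover search resolves with certainty in q queries.

    Assumption: each Grover iteration rotates the state by 2*theta, where
    sin(theta) = 1/sqrt(N). Starting from angle theta, after q queries the
    angle is (2q+1)*theta; certainty requires (2q+1)*theta = pi/2, which gives
    N = 1 / sin^2(pi / (2*(2q+1))).
    """
    theta = math.pi / (2 * (2 * q + 1))
    return 1.0 / math.sin(theta) ** 2


if __name__ == "__main__":
    # Print N for one, two and three queries.
    for q in (1, 2, 3):
        print(q, round(grover_items_for_queries(q), 2))
```

Under this assumption, Q = 1 gives N = 4 exactly and Q = 3 gives N ≈ 20.2; how closely this matches the paper's own derivation should be checked against the paper itself.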

Showing 1–22 of 22 results for author: Patel, A