-
A frequentist test of proportional colocalization after selecting relevant genetic variants
Authors:
Ashish Patel,
John C. Whittaker,
Stephen Burgess
Abstract:
Colocalization analyses assess whether two traits are affected by the same or distinct causal genetic variants in a single gene region. A class of Bayesian colocalization tests is now routinely used in practice; for example, for genetic analyses in drug development pipelines. In this work, we consider an alternative frequentist approach to colocalization testing that examines the proportionality of genetic associations with each trait. The proportional colocalization approach uses markedly different assumptions to Bayesian colocalization tests, and therefore can provide valuable complementary evidence in cases where Bayesian colocalization results are inconclusive or sensitive to priors. We propose a novel conditional test of proportional colocalization, prop-coloc-cond, that aims to account for the uncertainty in variant selection, in order to recover accurate type I error control. The test can be implemented straightforwardly, requiring only summary data on genetic associations. Simulation evidence and an empirical investigation into GLP1R gene expression demonstrate how tests of proportional colocalization can offer important insights in conjunction with Bayesian colocalization tests.
Submitted 19 February, 2024;
originally announced February 2024.
-
RACER: An LLM-powered Methodology for Scalable Analysis of Semi-structured Mental Health Interviews
Authors:
Satpreet Harcharan Singh,
Kevin Jiang,
Kanchan Bhasin,
Ashutosh Sabharwal,
Nidal Moukaddam,
Ankit B Patel
Abstract:
Semi-structured interviews (SSIs) are a commonly employed data-collection method in healthcare research, offering in-depth qualitative insights into subject experiences. Despite their value, the manual analysis of SSIs is notoriously time-consuming and labor-intensive, in part due to the difficulty of extracting and categorizing emotional responses, and challenges in scaling human evaluation for large populations. In this study, we develop RACER, a Large Language Model (LLM) based expert-guided automated pipeline that efficiently converts raw interview transcripts into insightful domain-relevant themes and sub-themes. We used RACER to analyze SSIs conducted with 93 healthcare professionals and trainees to assess the broad personal and professional mental health impacts of the COVID-19 crisis. RACER achieves moderately high agreement with two human evaluators (72%), which approaches the human inter-rater agreement (77%). Interestingly, LLMs and humans struggle with similar content involving nuanced emotional, ambivalent/dialectical, and psychological statements. Our study highlights the opportunities and challenges in using LLMs to improve research efficiency and opens new avenues for scalable analysis of SSIs in healthcare research.
Submitted 4 February, 2024;
originally announced February 2024.
-
Navigating protein landscapes with a machine-learned transferable coarse-grained model
Authors:
Nicholas E. Charron,
Felix Musil,
Andrea Guljas,
Yaoyi Chen,
Klara Bonneau,
Aldo S. Pasos-Trejo,
Jacopo Venturin,
Daria Gusew,
Iryna Zaporozhets,
Andreas Krämer,
Clark Templeton,
Atharva Kelkar,
Aleksander E. P. Durumeric,
Simon Olsson,
Adrià Pérez,
Maciej Majewski,
Brooke E. Husic,
Ankit Patel,
Gianni De Fabritiis,
Frank Noé,
Cecilia Clementi
Abstract:
The most popular and universally predictive protein simulation models employ all-atom molecular dynamics (MD), but they come at extreme computational cost. The development of a universal, computationally efficient coarse-grained (CG) model with similar prediction performance has been a long-standing challenge. By combining recent deep learning methods with a large and diverse training set of all-atom protein simulations, we here develop a bottom-up CG force field with chemical transferability, which can be used for extrapolative molecular dynamics on new sequences not used during model parametrization. We demonstrate that the model successfully predicts folded structures, intermediates, metastable folded and unfolded basins, and the fluctuations of intrinsically disordered proteins while it is several orders of magnitude faster than an all-atom model. This showcases the feasibility of a universal and computationally efficient machine-learned CG model for proteins.
Submitted 27 October, 2023;
originally announced October 2023.
-
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution
Authors:
Eric Nguyen,
Michael Poli,
Marjan Faizi,
Armin Thomas,
Callum Birch-Sykes,
Michael Wornow,
Aman Patel,
Clayton Rabideau,
Stefano Massaroli,
Yoshua Bengio,
Stefano Ermon,
Stephen A. Baccus,
Chris Ré
Abstract:
Genomic (DNA) sequences encode an enormous amount of information for gene regulation and protein synthesis. Similar to natural language models, researchers have proposed foundation models in genomics to learn generalizable features from unlabeled genome data that can then be fine-tuned for downstream tasks such as identifying regulatory elements. Due to the quadratic scaling of attention, previous Transformer-based genomic models have used 512 to 4k tokens as context (<0.001% of the human genome), significantly limiting the modeling of long-range interactions in DNA. In addition, these methods rely on tokenizers or fixed k-mers to aggregate meaningful DNA units, losing single nucleotide resolution where subtle genetic variations can completely alter protein function via single nucleotide polymorphisms (SNPs). Recently, Hyena, a large language model based on implicit convolutions, was shown to match attention in quality while allowing longer context lengths and lower time complexity. Leveraging Hyena's new long-range capabilities, we present HyenaDNA, a genomic foundation model pretrained on the human reference genome with context lengths of up to 1 million tokens at the single-nucleotide level, an up to 500x increase over previous dense attention-based models. HyenaDNA scales sub-quadratically in sequence length (training up to 160x faster than Transformer), uses single nucleotide tokens, and has full global context at each layer. We explore what longer context enables, including the first use of in-context learning in genomics. On fine-tuned benchmarks from the Nucleotide Transformer, HyenaDNA reaches state-of-the-art (SotA) on 12 of 18 datasets using a model with orders of magnitude fewer parameters and less pretraining data. On the GenomicBenchmarks, HyenaDNA surpasses SotA on 7 of 8 datasets on average by +10 accuracy points. Code at https://github.com/HazyResearch/hyena-dna.
Submitted 14 November, 2023; v1 submitted 27 June, 2023;
originally announced June 2023.
-
Transforming Gait: Video-Based Spatiotemporal Gait Analysis
Authors:
R. James Cotton,
Emoonah McClerklin,
Anthony Cimorelli,
Ankit Patel,
Tasos Karakostas
Abstract:
Human pose estimation from monocular video is a rapidly advancing field that offers great promise to human movement science and rehabilitation. This potential is tempered by the smaller body of work ensuring the outputs are clinically meaningful and properly calibrated. Gait analysis, typically performed in a dedicated lab, produces precise measurements including kinematics and step timing. Using over 7000 monocular videos from an instrumented gait analysis lab, we trained a neural network to map 3D joint trajectories and the height of individuals onto interpretable biomechanical outputs including gait cycle timing, sagittal plane joint kinematics, and spatiotemporal trajectories. This task-specific layer produces accurate estimates of the timing of foot contact and foot off events. After parsing the kinematic outputs into individual gait cycles, it also enables accurate cycle-by-cycle estimates of cadence, step time, double and single support time, walking speed and step length.
Submitted 17 March, 2022;
originally announced March 2022.
-
AcinoSet: A 3D Pose Estimation Dataset and Baseline Models for Cheetahs in the Wild
Authors:
Daniel Joska,
Liam Clark,
Naoya Muramatsu,
Ricardo Jericevich,
Fred Nicolls,
Alexander Mathis,
Mackenzie W. Mathis,
Amir Patel
Abstract:
Animals are capable of extreme agility, yet understanding their complex dynamics, which have ecological, biomechanical and evolutionary implications, remains challenging. Being able to study this incredible agility will be critical for the development of next-generation autonomous legged robots. In particular, the cheetah (Acinonyx jubatus) is supremely fast and maneuverable, yet quantifying its whole-body 3D kinematic data during locomotion in the wild remains a challenge, even with new deep learning-based methods. In this work we present an extensive dataset of free-running cheetahs in the wild, called AcinoSet, that contains 119,490 frames of multi-view synchronized high-speed video footage, camera calibration files and 7,588 human-annotated frames. We utilize markerless animal pose estimation to provide 2D keypoints. Then, we use three methods that serve as strong baselines for 3D pose estimation tool development: traditional sparse bundle adjustment, an Extended Kalman Filter, and a trajectory optimization-based method we call Full Trajectory Estimation. The resulting 3D trajectories, human-checked 3D ground truth, and an interactive tool to inspect the data are also provided. We believe this dataset will be useful for a diverse range of fields such as ecology, neuroscience, robotics, biomechanics as well as computer vision.
Submitted 24 March, 2021;
originally announced March 2021.
-
Two-Stage Penalized Regression Screening to Detect Biomarker-Treatment Interactions in Randomized Clinical Trials
Authors:
Jixiong Wang,
Ashish Patel,
James M. S. Wason,
Paul J. Newcombe
Abstract:
High-dimensional biomarkers such as genomics are increasingly being measured in randomized clinical trials. Consequently, there is a growing interest in developing methods that improve the power to detect biomarker-treatment interactions. We adapt recently proposed two-stage interaction detecting procedures in the setting of randomized clinical trials. We also propose a new stage 1 multivariate screening strategy using ridge regression to account for correlations among biomarkers. For this multivariate screening, we prove the asymptotic between-stage independence, required for family-wise error rate control, under biomarker-treatment independence. Simulation results show that in various scenarios, the ridge regression screening procedure can provide substantially greater power than the traditional one-biomarker-at-a-time screening procedure in highly correlated data. We also exemplify our approach in two real clinical trial data applications.
Submitted 28 April, 2021; v1 submitted 24 April, 2020;
originally announced April 2020.
-
Deep neural networks can predict mortality from 12-lead electrocardiogram voltage data
Authors:
Sushravya Raghunath,
Alvaro E. Ulloa Cerna,
Linyuan Jing,
David P. vanMaanen,
Joshua Stough,
Dustin N. Hartzel,
Joseph B. Leader,
H. Lester Kirchner,
Christopher W. Good,
Aalpen A. Patel,
Brian P. Delisle,
Amro Alsaid,
Dominik Beer,
Christopher M. Haggerty,
Brandon K. Fornwalt
Abstract:
The electrocardiogram (ECG) is a widely-used medical test, typically consisting of 12 voltage versus time traces collected from surface recordings over the heart. Here we hypothesize that a deep neural network can predict an important future clinical event (one-year all-cause mortality) from ECG voltage-time traces. We show good performance for predicting one-year mortality with an average AUC of 0.85 from a model cross-validated on 1,775,926 12-lead resting ECGs that were collected over a 34-year period in a large regional health system. Even within the large subset of ECGs interpreted as 'normal' by a physician (n=297,548), the model performance to predict one-year mortality remained high (AUC=0.84), and a Cox Proportional Hazard model revealed a hazard ratio of 6.6 (p<0.005) for the two predicted groups (dead vs alive one year after ECG) over a 30-year follow-up period. A blinded survey of three cardiologists suggested that the patterns captured by the model were generally not visually apparent to cardiologists even after being shown 240 paired examples of labeled true positives (dead) and true negatives (alive). In summary, deep learning can add significant prognostic information to the interpretation of 12-lead resting ECGs, even in cases that are interpreted as 'normal' by physicians.
Submitted 11 May, 2020; v1 submitted 15 April, 2019;
originally announced April 2019.
-
A deep neural network to enhance prediction of 1-year mortality using echocardiographic videos of the heart
Authors:
Alvaro Ulloa,
Linyuan Jing,
Christopher W Good,
David P vanMaanen,
Sushravya Raghunath,
Jonathan D Suever,
Christopher D Nevius,
Gregory J Wehner,
Dustin Hartzel,
Joseph B Leader,
Amro Alsaid,
Aalpen A Patel,
H Lester Kirchner,
Marios S Pattichis,
Christopher M Haggerty,
Brandon K Fornwalt
Abstract:
Predicting future clinical events helps physicians guide appropriate intervention. Machine learning has tremendous promise to assist physicians with predictions based on the discovery of complex patterns from historical data, such as large, longitudinal electronic health records (EHR). This study is a first attempt to demonstrate such capabilities using raw echocardiographic videos of the heart. We show that a large dataset of 723,754 clinically-acquired echocardiographic videos (~45 million images) linked to longitudinal follow-up data in 27,028 patients can be used to train a deep neural network to predict 1-year mortality with good accuracy (area under the curve (AUC) in an independent test set = 0.839). Prediction accuracy was further improved by adding EHR data (AUC = 0.858). Finally, we demonstrate that the trained neural network was more accurate in mortality prediction than two expert cardiologists. These results highlight the potential of neural networks to add new power to clinical predictions.
Submitted 14 May, 2019; v1 submitted 26 November, 2018;
originally announced November 2018.
-
Non-normality Can Facilitate Pulsing in Biomolecular Circuits
Authors:
Abhilash Patel,
Shaunak Sen
Abstract:
Non-normality can underlie pulse dynamics in many engineering contexts. However, its role in pulses generated in biomolecular contexts is generally unclear. Here, we address this issue using the mathematical tools of linear algebra and systems theory on simple computational models of biomolecular circuits. We find that non-normality is present in standard models of feedforward loops. We used a generalized framework and pseudospectrum analysis to identify non-normality in larger biomolecular circuit models, finding that it correlates well with pulsing dynamics. Finally, we illustrate how these methods can be used to provide analytical support to numerical screens for pulsing dynamics as well as provide guidelines for design.
Submitted 2 June, 2018; v1 submitted 21 October, 2017;
originally announced October 2017.
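For readers unfamiliar with the term used in the abstract above: a real matrix A is normal when A Aᵀ = Aᵀ A, and non-normal otherwise. The sketch below checks this for a small Jacobian-like matrix. The matrix entries are hypothetical, chosen only to resemble a linearized three-species feedforward loop (X activates Y and Z, Y represses Z); they are not taken from the paper, and the helper names are illustrative.

```python
# Minimal sketch: measure departure from normality as the Frobenius norm
# of the commutator A A^T - A^T A (zero if and only if A is normal).

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def departure_from_normality(a):
    """Frobenius norm of A A^T - A^T A for a real square matrix A."""
    at = transpose(a)
    left, right = matmul(a, at), matmul(at, a)
    n = len(a)
    return sum((left[i][j] - right[i][j]) ** 2
               for i in range(n) for j in range(n)) ** 0.5

# Hypothetical feedforward-loop Jacobian (illustrative values only):
# diagonal terms are degradation, off-diagonal terms are regulation.
ffl = [[-1.0, 0.0, 0.0],
       [1.0, -1.0, 0.0],
       [1.0, -1.0, -1.0]]

# A diagonal matrix (no regulation) is always normal.
uncoupled = [[-1.0, 0.0, 0.0],
             [0.0, -1.0, 0.0],
             [0.0, 0.0, -1.0]]

print(departure_from_normality(ffl) > 0)   # non-normal
print(departure_from_normality(uncoupled)) # normal
```

A triangular matrix with any non-zero off-diagonal entry is non-normal, which is why a one-directional regulatory cascade of this shape gives a non-zero value here.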
-
Stem Cell Therapy for Alzheimer's Disease
Authors:
Ankur Patel,
Grishma Joshi,
Rupali Ugile
Abstract:
The loss of neuronal cells in the central nervous system occurs in many neurodegenerative diseases. Alzheimer's Disease (AD) is a complex, irreversible, progressive neurodegenerative disease. It is the leading cause of age-related dementia, affecting roughly 5.3 million people in the United States alone. AD is a common debilitating condition in people over 65 years of age, causing disability characterized by memory decline, inability to learn and carry out everyday activities, and cognitive impairment, and it reduces patients' quality of life. The pathological hallmarks of AD are abnormal accumulations of specific proteins in the brain: beta-amyloid "plaques" and tau "tangles". However, current treatments for AD only relieve symptoms and are palliative rather than curative, and several promising drug candidates have failed in recent clinical trials. There is therefore an urgent need to improve our understanding of the pathogenesis of this disease and to create new predictive models and effective therapies. Recently, stem cell therapy has been shown to be a potential approach to various diseases, including neurodegenerative disorders. Given the widespread nature of AD pathology, stem cell replacement strategies have been viewed as an extremely challenging and infeasible treatment approach. Stem cells may also offer a powerful new way to model and study AD. Patient-derived induced pluripotent stem cells (iPSCs), for instance, may advance our understanding of disease mechanisms. In this review we examine the potential of stem cells to help in these challenging endeavors.
Submitted 26 August, 2016;
originally announced August 2016.
-
Sibelia: A scalable and comprehensive synteny block generation tool for closely related microbial genomes
Authors:
Ilya Minkin,
Anand Patel,
Mikhail Kolmogorov,
Nikolay Vyahhi,
Son Pham
Abstract:
Comparing strains within the same microbial species has proven effective in the identification of genes and genomic regions responsible for virulence, as well as in the diagnosis and treatment of infectious diseases. In this paper, we present Sibelia, a tool for finding synteny blocks in multiple closely related microbial genomes using iterative de Bruijn graphs. Unlike most other tools, Sibelia can find synteny blocks that are repeated within genomes as well as blocks shared by multiple genomes. It represents synteny blocks in a hierarchical structure with multiple layers, each representing a different granularity level. Sibelia has been designed to work efficiently with a large number of microbial genomes; it finds synteny blocks in 31 S. aureus genomes within 31 minutes and in 59 E. coli genomes within 107 minutes on a standard desktop. Sibelia software is distributed under the GNU GPL v2 license and is available at https://github.com/bioinf/Sibelia, and Sibelia's web server is available at http://etool.me/software/sibelia
Submitted 30 July, 2013;
originally announced July 2013.
-
Sitting at the edge: How biomolecules use hydrophobicity to tune their interactions and function
Authors:
Amish J. Patel,
Patrick Varilly,
Sumanth N. Jamadagni,
Michael F. Hagan,
David Chandler,
Shekhar Garde
Abstract:
Water near hydrophobic surfaces is like that at a liquid-vapor interface, where fluctuations in water density are substantially enhanced compared to that in bulk water. Here we use molecular simulations with specialized sampling techniques to show that water density fluctuations are similarly enhanced, even near hydrophobic surfaces of complex biomolecules, situating them at the edge of a dewetting transition. Consequently, water near these surfaces is sensitive to subtle changes in surface conformation, topology, and chemistry, any of which can tip the balance towards or away from the wet state, and thus significantly alter biomolecular interactions and function. Our work also resolves the long-standing puzzle of why some biological surfaces dewet and other seemingly similar surfaces do not.
Submitted 20 September, 2011;
originally announced September 2011.
-
Efficient Energy Transport in Photosynthesis: Roles of Coherence and Entanglement
Authors:
Apoorva D. Patel
Abstract:
Recently it has been discovered, contrary to expectations of physicists as well as biologists, that the energy transport during photosynthesis, from the chlorophyll pigment that captures the photon to the reaction centre where glucose is synthesised from carbon dioxide and water, is highly coherent even at ambient temperature and in the cellular environment. This process and the key molecular ingredients that it depends on are described. By looking at the process from the computer science view-point, we can study what has been optimised and how. A spatial search algorithmic model based on robust features of wave dynamics is presented.
Submitted 7 April, 2011;
originally announced April 2011.
-
Towards Understanding the Origin of Genetic Languages
Authors:
Apoorva D. Patel
Abstract:
Molecular biology is a nanotechnology that works; it has worked for billions of years and in an amazing variety of circumstances. At its core is a system for acquiring, processing and communicating information that is universal, from viruses and bacteria to human beings. Advances in genetics and experience in designing computers have taken us to a stage where we can understand the optimisation principles at the root of this system, from the availability of basic building blocks to the execution of tasks. The languages of DNA and proteins are argued to be the optimal solutions to the information processing tasks they carry out. The analysis also suggests simpler predecessors to these languages, and provides fascinating clues about their origin. Obviously, a comprehensive unraveling of the puzzle of life would have a lot to say about what we may design or convert ourselves into.
Submitted 28 October, 2008; v1 submitted 26 May, 2007;
originally announced May 2007.
-
The Triplet Genetic Code had a Doublet Predecessor
Authors:
Apoorva Patel
Abstract:
Information theoretic analysis of genetic languages indicates that the naturally occurring 20 amino acids and the triplet genetic code arose by duplication of 10 amino acids of class-II and a doublet genetic code having codons NNY and anticodons $\overleftarrow{\rm GNN}$. Evidence for this scenario is presented based on the properties of aminoacyl-tRNA synthetases, amino acids and nucleotide bases.
Submitted 28 October, 2004; v1 submitted 25 March, 2004;
originally announced March 2004.
-
Survival of the Fittest and Zero Sum Games
Authors:
Apoorva Patel
Abstract:
Competition for available resources is natural amongst coexisting species, and the fittest contenders dominate over the rest in evolution. The dynamics of this selection is studied using a simple linear model. It has similarities to features of quantum computation, in particular conservation laws leading to destructive interference. Compared to an altruistic scenario, competition introduces instability and eliminates the weaker species in a finite time.
Submitted 9 January, 2003; v1 submitted 3 June, 2002;
originally announced June 2002.
-
Mathematical Physics and Life
Authors:
Apoorva Patel
Abstract:
It is a fascinating subject to explore how well we can understand the processes of life on the basis of fundamental laws of physics. It is emphasised that viewing biological processes as manipulation of information extracts their essential features. This information processing can be analysed using well-known methods of computer science. The lowest level of biological information processing, involving DNA and proteins, is the easiest one to link to physical properties. Physical underpinnings of the genetic information that could have led to the universal language of 4 nucleotide bases and 20 amino acids are pointed out. Generalisations of Boolean logic, especially features of quantum dynamics, play a crucial role.
Submitted 3 April, 2003; v1 submitted 4 February, 2002;
originally announced February 2002.
-
Why Genetic Information Processing could have a Quantum Basis
Authors:
Apoorva Patel
Abstract:
Living organisms are not just random collections of organic molecules. There is continuous information processing going on in the apparent bouncing around of molecules of life. Optimisation criteria in this information processing can be searched for using the laws of physics. Quantum dynamics can explain why living organisms have 4 nucleotide bases and 20 amino acids, as optimal solutions of the molecular assembly process. Experiments should be able to tell whether evolution indeed took advantage of quantum dynamics or not.
Submitted 11 June, 2001; v1 submitted 1 May, 2001;
originally announced May 2001.
-
Carbon--The First Frontier of Information Processing
Authors:
Apoorva Patel
Abstract:
Information is often encoded as an aperiodic chain of building blocks. Modern digital computers use bits as the building blocks, but in general the choice of building blocks depends on the nature of the information to be encoded. What are the optimal building blocks to encode structural information? This can be analysed by substituting the operations of addition and multiplication of conventional arithmetic with translation and rotation. It is argued that at the molecular level, the best component for encoding discretised structural information is carbon. Living organisms discovered this billions of years ago, and used carbon as the back-bone for constructing proteins that function according to their structure. Structural analysis of polypeptide chains shows that an efficient and versatile structural language of 20 building blocks is needed to implement all the tasks carried out by proteins. Properties of amino acids indicate that the present triplet genetic code was preceded by a more primitive one, coding for 10 amino acids using two nucleotide bases.
Submitted 10 June, 2002; v1 submitted 5 March, 2001;
originally announced March 2001.
-
Testing Quantum Dynamics in Genetic Information Processing
Authors:
Apoorva Patel
Abstract:
Does quantum dynamics play a role in DNA replication? What type of tests would reveal that? Some statistical checks that distinguish classical and quantum dynamics in DNA replication are proposed.
Submitted 26 July, 2001; v1 submitted 6 February, 2001;
originally announced February 2001.
-
Quantum Algorithms and the Genetic Code
Authors:
Apoorva Patel
Abstract:
Replication of DNA and synthesis of proteins are studied from the view-point of quantum database search. Identification of a base-pairing with a quantum query gives a natural (and first ever) explanation of why living organisms have 4 nucleotide bases and 20 amino acids. It is amazing that these numbers arise as solutions to an optimisation problem. Components of the DNA structure which implement Grover's algorithm are identified, and a physical scenario is presented for the execution of the quantum algorithm. It is proposed that enzymes play a crucial role in maintaining quantum coherence of the process. Experimental tests that can verify this scenario are pointed out.
Submitted 6 February, 2001; v1 submitted 14 February, 2000;
originally announced February 2000.
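As a rough illustration of how the numbers 4 and 20 in the abstract above can come out of Grover's algorithm, the sketch below assumes that the optimisation problem is the standard condition for Grover's search to find one marked item with certainty after Q queries: (2Q+1)·arcsin(1/√N) = π/2, solved for the database size N. That reading is an assumption made here for illustration, not a statement of the paper's derivation, and the function name is hypothetical.

```python
import math

def grover_database_size(queries: int) -> float:
    """Database size N that Grover's search resolves with certainty.

    Assumes the standard relation (2Q + 1) * asin(1 / sqrt(N)) = pi / 2,
    where Q is the number of oracle queries. Solving for N gives
    N = 1 / sin^2(pi / (2 * (2Q + 1))).
    """
    theta = math.pi / (2 * (2 * queries + 1))
    return 1.0 / math.sin(theta) ** 2

for q in (1, 2, 3):
    print(q, round(grover_database_size(q), 2))
```

Under this assumption, one query gives exactly N = 4, two queries give N of about 10.5, and three queries give N of about 20.2.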