-
Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA
Authors:
Kaiyuan Yang,
Fabio Musio,
Yihui Ma,
Norman Juchler,
Johannes C. Paetzold,
Rami Al-Maskari,
Luciano Höher,
Hongwei Bran Li,
Ibrahim Ethem Hamamci,
Anjany Sekuboyina,
Suprosanna Shit,
Houjing Huang,
Chinmay Prabhakar,
Ezequiel de la Rosa,
Diana Waldmannstetter,
Florian Kofler,
Fernando Navarro,
Martin Menten,
Ivan Ezhov,
Daniel Rueckert,
Iris Vos,
Ynte Ruigrok,
Birgitta Velthuis,
Hugo Kuijf,
Julien Hämmerli
, et al. (59 additional authors not shown)
Abstract:
The Circle of Willis (CoW) is an important network of arteries connecting major circulations of the brain. Its vascular architecture is believed to affect the risk, severity, and clinical outcome of serious neuro-vascular diseases. However, characterizing the highly variable CoW anatomy is still a manual and time-consuming expert task. The CoW is usually imaged by two angiographic imaging modalities, magnetic resonance angiography (MRA) and computed tomography angiography (CTA), but there exist limited public datasets with annotations on CoW anatomy, especially for CTA. Therefore, we organized the TopCoW Challenge in 2023 with the release of an annotated CoW dataset. The TopCoW dataset was the first public dataset with voxel-level annotations for thirteen possible CoW vessel components, enabled by virtual-reality (VR) technology. It was also the first large dataset with paired MRA and CTA from the same patients. The TopCoW challenge formalized the CoW characterization problem as a multiclass anatomical segmentation task with an emphasis on topological metrics. We invited submissions worldwide for the CoW segmentation task, which attracted over 140 registered participants from four continents. The top-performing teams managed to segment many CoW components to Dice scores around 90%, but with lower scores for communicating arteries and rare variants. There were also topological mistakes for predictions with high Dice scores. Additional topological analysis revealed further areas for improvement in detecting certain CoW components and matching CoW variant topology accurately. TopCoW represented a first attempt at benchmarking the CoW anatomical segmentation task for MRA and CTA, both morphologically and topologically.
Submitted 29 April, 2024; v1 submitted 29 December, 2023;
originally announced December 2023.
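The TopCoW abstract above reports per-component Dice scores. As a rough illustration of that metric only, here is a minimal sketch of a per-class Dice computation on integer label volumes. The function name `dice_per_class`, the toy arrays, and the label numbering are hypothetical and are not taken from the challenge's evaluation code.

```python
import numpy as np

def dice_per_class(pred, truth, labels):
    """Return {label: Dice} for two integer label volumes of identical shape."""
    scores = {}
    for lab in labels:
        p = pred == lab           # voxels predicted as this class
        t = truth == lab          # voxels annotated as this class
        denom = p.sum() + t.sum()
        if denom == 0:
            # Dice is undefined when the class is absent from both volumes.
            scores[lab] = float("nan")
        else:
            scores[lab] = 2.0 * np.logical_and(p, t).sum() / denom
    return scores

# Toy 2-D "volumes": 0 is background, 1 and 2 are two hypothetical vessel classes.
truth = np.array([[0, 1, 1], [0, 2, 2]])
pred = np.array([[0, 1, 0], [0, 2, 2]])
scores = dice_per_class(pred, truth, labels=[1, 2, 3])
```

Dice only counts voxel overlap, so a prediction can score well while still breaking or merging vessels. This is consistent with the abstract's remark that topological mistakes occurred in predictions with high Dice scores.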
-
Blind Biological Sequence Denoising with Self-Supervised Set Learning
Authors:
Nathan Ng,
Ji Won Park,
Jae Hyeon Lee,
Ryan Lewis Kelly,
Stephen Ra,
Kyunghyun Cho
Abstract:
Biological sequence analysis relies on the ability to denoise the imprecise output of sequencing platforms. We consider a common setting where a short sequence is read out repeatedly using a high-throughput long-read platform to generate multiple subreads, or noisy observations of the same sequence. Denoising these subreads with alignment-based approaches often fails when too few subreads are available or error rates are too high. In this paper, we propose a novel method for blindly denoising sets of sequences without directly observing clean source sequence labels. Our method, Self-Supervised Set Learning (SSSL), gathers subreads together in an embedding space and estimates a single set embedding as the midpoint of the subreads in both the latent and sequence spaces. This set embedding represents the "average" of the subreads and can be decoded into a prediction of the clean sequence. In experiments on simulated long-read DNA data, SSSL methods denoise small reads of $\leq 6$ subreads with 17% fewer errors and large reads of $>6$ subreads with 8% fewer errors compared to the best baseline. On a real dataset of antibody sequences, SSSL improves over baselines on two self-supervised metrics, with a significant improvement on difficult small reads that comprise over 60% of the test set. By accurately denoising these reads, SSSL promises to better realize the potential of high-throughput DNA sequencing data for downstream scientific applications.
Submitted 4 September, 2023;
originally announced September 2023.
-
OpenProteinSet: Training data for structural biology at scale
Authors:
Gustaf Ahdritz,
Nazim Bouatta,
Sachin Kadyan,
Lukas Jarosch,
Daniel Berenberg,
Ian Fisk,
Andrew M. Watkins,
Stephen Ra,
Richard Bonneau,
Mohammed AlQuraishi
Abstract:
Multiple sequence alignments (MSAs) of proteins encode rich biological information and have been workhorses in bioinformatic methods for tasks like protein design and protein structure prediction for decades. Recent breakthroughs like AlphaFold2 that use transformers to attend directly over large quantities of raw MSAs have reaffirmed their importance. Generation of MSAs is highly computationally intensive, however, and no datasets comparable to those used to train AlphaFold2 have been made available to the research community, hindering progress in machine learning for proteins. To remedy this problem, we introduce OpenProteinSet, an open-source corpus of more than 16 million MSAs, associated structural homologs from the Protein Data Bank, and AlphaFold2 protein structure predictions. We have previously demonstrated the utility of OpenProteinSet by successfully retraining AlphaFold2 on it. We expect OpenProteinSet to be broadly useful as training and validation data for 1) diverse tasks focused on protein structure, function, and design and 2) large-scale multimodal machine learning research.
Submitted 10 August, 2023;
originally announced August 2023.
-
Protein Discovery with Discrete Walk-Jump Sampling
Authors:
Nathan C. Frey,
Daniel Berenberg,
Karina Zadorozhny,
Joseph Kleinhenz,
Julien Lafrance-Vanasse,
Isidro Hotzel,
Yan Wu,
Stephen Ra,
Richard Bonneau,
Kyunghyun Cho,
Andreas Loukas,
Vladimir Gligorijevic,
Saeed Saremi
Abstract:
We resolve difficulties in training and sampling from a discrete generative model by learning a smoothed energy function, sampling from the smoothed data manifold with Langevin Markov chain Monte Carlo (MCMC), and projecting back to the true data manifold with one-step denoising. Our Discrete Walk-Jump Sampling formalism combines the contrastive divergence training of an energy-based model and improved sample quality of a score-based model, while simplifying training and sampling by requiring only a single noise level. We evaluate the robustness of our approach on generative modeling of antibody proteins and introduce the distributional conformity score to benchmark protein generative models. By optimizing and sampling from our models for the proposed distributional conformity score, 97-100% of generated samples are successfully expressed and purified and 70% of functional designs show equal or improved binding affinity compared to known functional antibodies on the first attempt in a single round of laboratory experiments. We also report the first demonstration of long-run fast-mixing MCMC chains where diverse antibody protein classes are visited in a single MCMC chain.
Submitted 15 March, 2024; v1 submitted 8 June, 2023;
originally announced June 2023.
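The walk-jump abstract above describes two stages: Langevin MCMC on a smoothed density at a single noise level, then one-step denoising. The sketch below illustrates that pattern on a deliberately simple continuous toy problem, not on discrete protein sequences. It assumes clean data x ~ N(0, 1), so the smoothed density of y = x + sigma * noise is N(0, 1 + sigma^2) and its score is known in closed form. In practice a learned model would supply the score. The function name, step size, and chain counts are all hypothetical choices. The "jump" uses the standard empirical-Bayes (Tweedie) estimate x_hat = y + sigma^2 * score(y).

```python
import numpy as np

def walk_jump_toy(n_chains=2000, n_steps=500, sigma=0.5, step=0.05, seed=0):
    """Toy walk-jump sampler for clean data x ~ N(0, 1) (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    var_y = 1.0 + sigma**2                 # variance of the smoothed density

    def score(y):
        # Analytic score of N(0, var_y); stands in for a learned model.
        return -y / var_y

    y = 3.0 * rng.normal(size=n_chains)    # arbitrary initialisation
    for _ in range(n_steps):
        # Walk: unadjusted (overdamped) Langevin update on the smoothed density.
        y = y + step * score(y) + np.sqrt(2.0 * step) * rng.normal(size=n_chains)
    # Jump: one-step denoising back toward the clean data.
    x_hat = y + sigma**2 * score(y)
    return y, x_hat

y, x_hat = walk_jump_toy()
```

In this Gaussian toy the jump reduces to a fixed shrinkage of y toward zero, so it shows only the mechanics of the two stages and says nothing about sample quality on real data.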
-
3D molecule generation by denoising voxel grids
Authors:
Pedro O. Pinheiro,
Joshua Rackers,
Joseph Kleinhenz,
Michael Maser,
Omar Mahmood,
Andrew Martin Watkins,
Stephen Ra,
Vishnu Sresht,
Saeed Saremi
Abstract:
We propose a new score-based approach to generate 3D molecules represented as atomic densities on regular grids. First, we train a denoising neural network that learns to map from a smooth distribution of noisy molecules to the distribution of real molecules. Then, we follow the neural empirical Bayes framework (Saremi and Hyvarinen, 2019) and generate molecules in two steps: (i) sample noisy density grids from a smooth distribution via underdamped Langevin Markov chain Monte Carlo, and (ii) recover the "clean" molecule by denoising the noisy grid with a single step. Our method, VoxMol, generates molecules in a fundamentally different way than the current state of the art (i.e., diffusion models applied to atom point clouds). It differs in terms of the data representation, the noise model, the network architecture and the generative modeling algorithm. Our experiments show that VoxMol captures the distribution of drug-like molecules better than the state of the art, while being faster to generate samples.
Submitted 8 March, 2024; v1 submitted 12 June, 2023;
originally announced June 2023.
-
Learning Causal Representations of Single Cells via Sparse Mechanism Shift Modeling
Authors:
Romain Lopez,
Nataša Tagasovska,
Stephen Ra,
Kyunghyun Cho,
Jonathan K. Pritchard,
Aviv Regev
Abstract:
Latent variable models such as the Variational Auto-Encoder (VAE) have become a go-to tool for analyzing biological data, especially in the field of single-cell genomics. One remaining challenge is the interpretability of latent variables as biological processes that define a cell's identity. Outside of biological applications, this problem is commonly referred to as learning disentangled representations. Although several disentanglement-promoting variants of the VAE were introduced, and applied to single-cell genomics data, this task has been shown to be infeasible from independent and identically distributed measurements, without additional structure. Instead, recent methods propose to leverage non-stationary data, as well as the sparse mechanism shift assumption in order to learn disentangled representations with a causal semantic. Here, we extend the application of these methodological advances to the analysis of single-cell genomics data with genetic or chemical perturbations. More precisely, we propose a deep generative model of single-cell gene expression data for which each perturbation is treated as a stochastic intervention targeting an unknown, but sparse, subset of latent variables. We benchmark these methods on simulated single-cell data to evaluate their performance at latent units recovery, causal target identification and out-of-domain generalization. Finally, we apply those approaches to two real-world large-scale gene perturbation data sets and find that models that exploit the sparse mechanism shift hypothesis surpass contemporary methods on a transfer learning task. We implement our new model and benchmarks using the scvi-tools library, and release it as open-source software at https://github.com/Genentech/sVAE.
Submitted 16 February, 2023; v1 submitted 7 November, 2022;
originally announced November 2022.
-
A Pareto-optimal compositional energy-based model for sampling and optimization of protein sequences
Authors:
Nataša Tagasovska,
Nathan C. Frey,
Andreas Loukas,
Isidro Hötzel,
Julien Lafrance-Vanasse,
Ryan Lewis Kelly,
Yan Wu,
Arvind Rajpal,
Richard Bonneau,
Kyunghyun Cho,
Stephen Ra,
Vladimir Gligorijević
Abstract:
Deep generative models have emerged as a popular machine learning-based approach for inverse design problems in the life sciences. However, these problems often require sampling new designs that satisfy multiple properties of interest in addition to learning the data distribution. This multi-objective optimization becomes more challenging when properties are independent or orthogonal to each other. In this work, we propose a Pareto-compositional energy-based model (pcEBM), a framework that uses multiple gradient descent for sampling new designs that adhere to various constraints in optimizing distinct properties. We demonstrate its ability to learn non-convex Pareto fronts and generate sequences that simultaneously satisfy multiple desired properties across a series of real-world antibody design tasks.
Submitted 19 October, 2022;
originally announced October 2022.
-
PropertyDAG: Multi-objective Bayesian optimization of partially ordered, mixed-variable properties for biological sequence design
Authors:
Ji Won Park,
Samuel Stanton,
Saeed Saremi,
Andrew Watkins,
Henri Dwyer,
Vladimir Gligorijevic,
Richard Bonneau,
Stephen Ra,
Kyunghyun Cho
Abstract:
Bayesian optimization offers a sample-efficient framework for navigating the exploration-exploitation trade-off in the vast design space of biological sequences. Whereas it is possible to optimize the various properties of interest jointly using a multi-objective acquisition function, such as the expected hypervolume improvement (EHVI), this approach does not account for objectives with a hierarchical dependency structure. We consider a common use case where some regions of the Pareto frontier are prioritized over others according to a specified $\textit{partial ordering}$ in the objectives. For instance, when designing antibodies, we would like to maximize the binding affinity to a target antigen only if it can be expressed in live cell culture -- modeling the experimental dependency in which affinity can only be measured for antibodies that can be expressed and thus produced in viable quantities. In general, we may want to confer a partial ordering to the properties such that each property is optimized conditioned on its parent properties satisfying some feasibility condition. To this end, we present PropertyDAG, a framework that operates on top of the traditional multi-objective BO to impose this desired ordering on the objectives, e.g. expression $\rightarrow$ affinity. We demonstrate its performance over multiple simulated active learning iterations on a penicillin production task, toy numerical problem, and a real-world antibody design task.
Submitted 8 October, 2022;
originally announced October 2022.
-
Active information, missing data and prevalence estimation
Authors:
Ola Hössjer,
Daniel Andrés Díaz-Pachón,
Chen Zhao,
J. Sunil Rao
Abstract:
The topic of this paper is prevalence estimation from the perspective of active information. Prevalence among tested individuals has an upward bias under the assumption that individuals' willingness to be tested for the disease increases with the strength of their symptoms. Active information due to testing bias quantifies the degree to which the willingness to be tested correlates with infection status. Interpreting incomplete testing as a missing data problem, the missingness mechanism impacts the degree to which the bias of the original prevalence estimate can be removed. The reduction in prevalence, when testing bias is adjusted for, translates into an active information due to bias correction, with opposite sign to active information due to testing bias. Prevalence and active information estimates are asymptotically normal, a behavior also illustrated through simulations.
Submitted 10 June, 2022;
originally announced June 2022.
-
Multi-segment preserving sampling for deep manifold sampler
Authors:
Daniel Berenberg,
Jae Hyeon Lee,
Simon Kelow,
Ji Won Park,
Andrew Watkins,
Vladimir Gligorijević,
Richard Bonneau,
Stephen Ra,
Kyunghyun Cho
Abstract:
Deep generative modeling for biological sequences presents a unique challenge in reconciling the bias-variance trade-off between explicit biological insight and model flexibility. The deep manifold sampler was recently proposed as a means to iteratively sample variable-length protein sequences by exploiting the gradients from a function predictor. We introduce an alternative approach to this guided sampling procedure, multi-segment preserving sampling, that enables the direct inclusion of domain-specific knowledge by designating preserved and non-preserved segments along the input sequence, thereby restricting variation to only select regions. We present its effectiveness in the context of antibody design by training two models: a deep manifold sampler and a GPT-2 language model on nearly six million heavy chain sequences annotated with the IGHV1-18 gene. During sampling, we restrict variation to only the complementarity-determining region 3 (CDR3) of the input. We obtain log probability scores from a GPT-2 model for each sampled CDR3 and demonstrate that multi-segment preserving sampling generates reasonable designs while maintaining the desired, preserved regions.
Submitted 9 May, 2022;
originally announced May 2022.
-
"Back to the future" projections for COVID-19 surges
Authors:
J. Sunil Rao,
Tianhao Liu,
Daniel Andrés Díaz-Pachón
Abstract:
We argue that information from countries that had earlier COVID-19 surges can be used to inform another country's current model, thereby generating what we call back-to-the-future (BTF) projections. We show that these projections can be used to accurately predict future COVID-19 surges prior to an inflection point of the daily infection curve. We show, across 12 different countries from all populated continents around the world, that our method can often predict future surges in scenarios where the traditional approaches would always predict no future surges. However, as expected, BTF projections cannot accurately predict a surge due to the emergence of a new variant. To generate BTF projections, we make use of a matching scheme for asynchronous time series combined with a response coaching SIR model.
Submitted 17 February, 2022;
originally announced February 2022.
-
Optimal governance and implementation of vaccination programmes to contain the COVID-19 pandemic
Authors:
Mahendra Piraveenan,
Shailendra Sawleshwarkar,
Michael Walsh,
Iryna Zablotska,
Samit Bhattacharyya,
Habib Hassan Farooqui,
Tarun Bhatnagar,
Anup Karan,
Manoj Murhekar,
Sanjay Zodpey,
K. S. Mallikarjuna Rao,
Philippa Pattison,
Albert Zomaya,
Matjaz Perc
Abstract:
Since the recent introduction of several viable vaccines for SARS-CoV-2, vaccination uptake has become the key factor that will determine our success in containing the COVID-19 pandemic. We argue that game theory and social network models should be used to guide decisions pertaining to vaccination programmes for the best possible results. In the months following the introduction of vaccines, their availability and the human resources needed to run the vaccination programmes have been scarce in many countries. Vaccine hesitancy is also being encountered from some sections of the general public. We emphasize that decision-making under uncertainty and imperfect information, and with only conditionally optimal outcomes, is a unique forte of established game-theoretic modelling. Therefore, we can use this approach to obtain the best framework for modelling and simulating vaccination prioritization and uptake that will be readily available to inform important policy decisions for the optimal control of the COVID-19 pandemic.
Submitted 9 June, 2021; v1 submitted 12 November, 2020;
originally announced November 2020.
-
A simple correction for COVID-19 sampling bias
Authors:
Daniel Andrés Díaz-Pachón,
J Sunil Rao
Abstract:
COVID-19 testing has become a standard approach for estimating prevalence, which then assists in public health decision making to contain and mitigate the spread of the disease. The sampling designs used are often biased in that they do not reflect the true underlying populations. For instance, individuals with strong symptoms are more likely to be tested than those with no symptoms. This results in biased estimates of prevalence (too high). Typical post-sampling corrections are not always possible. Here we present a simple bias correction methodology derived and adapted from a correction for publication bias in meta-analysis studies. The methodology is general enough to allow a wide variety of customization, making it more useful in practice. Implementation is easily done using already collected information. Via a simulation and two real datasets, we show that the bias corrections can provide dramatic reductions in estimation error.
Submitted 11 January, 2021; v1 submitted 14 July, 2020;
originally announced July 2020.
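The abstract above says that testing symptomatic people more often inflates prevalence estimates. A short arithmetic sketch makes this concrete. It is not the paper's correction, which is adapted from publication-bias methods. It simply assumes that the probabilities of being tested given infection status are known, and inverts the selection. The function names and all numbers are hypothetical.

```python
def observed_prevalence(p, test_if_infected, test_if_healthy):
    """Prevalence among tested individuals when testing depends on infection status."""
    tested_pos = p * test_if_infected
    tested_neg = (1.0 - p) * test_if_healthy
    return tested_pos / (tested_pos + tested_neg)

def corrected_prevalence(obs, test_if_infected, test_if_healthy):
    """Invert the selection effect, assuming both testing probabilities are known."""
    num = obs * test_if_healthy
    return num / (num + (1.0 - obs) * test_if_infected)

# Hypothetical scenario: 5% true prevalence; infected people are tested
# with probability 0.60, uninfected people with probability 0.10.
obs = observed_prevalence(0.05, 0.60, 0.10)   # about 0.24, far above 0.05
est = corrected_prevalence(obs, 0.60, 0.10)   # recovers about 0.05
```

In real surveillance data the testing probabilities are themselves uncertain, which is why a practical correction needs more than this inversion.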
-
Analysis of genetic differences between psychiatric disorders: Exploring pathways and cell-types/tissues involved and ability to differentiate the disorders by polygenic scores
Authors:
Shitao Rao,
Liangying Yin,
Yong Xiang,
Hon-Cheong So
Abstract:
Although displaying genetic correlations, psychiatric disorders are clinically defined as categorical entities as they each have distinguishing clinical features and may involve different treatments. Identifying differential genetic variations between these disorders may reveal how the disorders differ biologically and help to guide more personalized treatment.
Here we presented a comprehensive analysis to identify genetic markers differentially associated with various psychiatric disorders/traits based on GWAS summary statistics, covering 18 psychiatric traits/disorders and 26 comparisons. We also conducted comprehensive analysis to unravel the genes, pathways and SNP functional categories involved, and the cell types and tissues implicated. We also assessed how well one could distinguish between psychiatric disorders by polygenic risk scores (PRS).
SNP-based heritabilities ($h^2_{\mathrm{SNP}}$) were significantly larger than zero for most comparisons. Based on current GWAS data, PRS have mostly modest power to distinguish between psychiatric disorders. For example, we estimated that the AUCs for distinguishing schizophrenia from major depressive disorder (MDD), bipolar disorder (BPD) from MDD and schizophrenia from BPD were 0.694, 0.602 and 0.618 respectively, while the maximum AUCs (based on $h^2_{\mathrm{SNP}}$) were 0.763, 0.749 and 0.726 respectively. We also uncovered differences between the traits in each studied pair in terms of their genetic correlations with comorbid traits. For example, clinically-defined MDD appeared to be more strongly genetically correlated with other psychiatric disorders and heart disease, when compared to non-clinically-defined depression in UK Biobank.
Our findings highlight genetic differences between psychiatric disorders and the mechanisms involved. PRS may aid differential diagnosis of selected psychiatric disorders in the future with larger GWAS samples.
Submitted 20 May, 2021; v1 submitted 16 March, 2020;
originally announced March 2020.
-
Helminth Dynamics: Mean Number of Worms, Reproductive Rates
Authors:
Arni S. R. Srinivasa Rao,
Roy M. Anderson
Abstract:
We derive formulas to compute the mean number of worms in a newly helminth-infected population before secondary infections have started (the population is closed). We prove that the two types of growth functions arise in this process as measurable functions.
Submitted 25 February, 2019;
originally announced February 2019.
-
Computation of life expectancy from incomplete data
Authors:
Arni S. R. Srinivasa Rao,
James R. Carey
Abstract:
Estimating human longevity and computing life expectancy are central to population dynamics. These problems have been studied seriously by scientists since the fifteenth century, including the renowned astronomer Edmund Halley. From basic principles of population dynamics, we propose a method to compute life expectancy from incomplete data.
Submitted 7 November, 2018;
originally announced November 2018.
-
On the Three Properties of Stationary Populations and knotting with Non-Stationary Populations
Authors:
Arni S. R. Srinivasa Rao,
James R. Carey
Abstract:
A population is considered stationary if the growth rate is zero and the age structure is constant. It thus follows that a population is considered non-stationary if its growth rate is non-zero and/or its age structure is non-constant. We propose three properties that are related to the stationary population identity (SPI) of population biology by connecting it with stationary populations and non-stationary populations which are approaching stationarity. One of these important properties is that SPI can be applied to partition a population into stationary and non-stationary components. These properties provide deeper insights into cohort formation in real-world populations and the length of the duration for which stationary and non-stationary conditions hold. The new concepts are based on the time gap between the occurrence of stationary and non-stationary populations within the SPI framework that we refer to as Oscillatory SPI and the Amplitude of SPI. This article will appear in Bulletin of Mathematical Biology (Springer).
Submitted 19 July, 2019; v1 submitted 7 November, 2018;
originally announced November 2018.
-
A Partition Theorem for a Randomly Selected Large Population
Authors:
Arni S. R. Srinivasa Rao
Abstract:
We state and prove a theorem on the partitioning of a randomly selected large population into stationary and non-stationary components by using a property of stationary population identity. Applications of this theorem for practical purposes are summarized at the end.
Submitted 1 July, 2021; v1 submitted 2 October, 2018;
originally announced October 2018.
-
True Epidemic Growth Construction Through Harmonic Analysis
Authors:
Steven G. Krantz,
Peter Polyakov,
Arni S. R. Srinivasa Rao
Abstract:
In this paper, we have proposed a two phase procedure (combining discrete graphs and wavelets) for constructing a true epidemic growth. In the first phase graph theory based approach was developed to update partial data available and in the second phase we used this partial data to generate a plausible complete data through wavelets. This procedure although novel and implementable, still leave some questions unanswered.
△ Less
Submitted 4 September, 2018; v1 submitted 29 August, 2018;
originally announced August 2018.
-
Population Stability and Momentum
Authors:
Arni S. R. Srinivasa Rao
Abstract:
A new approach is developed to understand the stability of a population and to further the understanding of population momentum.
Submitted 10 February, 2015;
originally announced March 2015.
-
Generalization of Carey's Equality and a Theorem on Stationary Population
Authors:
Arni S. R. Srinivasa Rao,
James R. Carey
Abstract:
Carey's Equality pertaining to stationary models is well known. In this paper, we state and prove a fundamental theorem related to the formation of this Equality. This theorem provides an in-depth understanding of the role of each captive subject, and of their corresponding follow-up duration, in a stationary population. We demonstrate a numerical example of a captive cohort and the survival pattern of medfly populations. These results can be adopted to understand the age structure and aging process in stationary and non-stationary population models. Key words: Captive cohort, life expectancy, symmetric patterns.
Submitted 10 February, 2015;
originally announced February 2015.
-
Model-order reduction of biochemical reaction networks
Authors:
Shodhan Rao,
Arjan van der Schaft,
Karen van Eunen,
Barbara M. Bakker,
Bayu Jayawardhana
Abstract:
In this paper we propose a model-order reduction method for chemical reaction networks governed by general enzyme kinetics, including mass-action and Michaelis-Menten kinetics. The method is based on the Kron reduction of the weighted Laplacian matrix, which describes the graph structure of complexes in the chemical reaction network. We apply our method to a yeast glycolysis model, where the simulation result shows that the transient behaviour of a number of key metabolites in the reduced-order model is in good agreement with that in the full-order model.
Submitted 11 December, 2012;
originally announced December 2012.
-
Understanding Theoretically The Impact of Reporting of Disease Cases in Epidemiology
Authors:
Arni S. R. Srinivasa Rao
Abstract:
In conducting preliminary analysis during an epidemic, data on reported disease cases offer key information for guiding the direction of the in-depth analysis. Models for growth and transmission dynamics are heavily dependent on the results of the preliminary analysis. When a particular disease case is reported more than once, or is never reported or detected in the population, multiple reporting or under-reporting may exist in the population. In this work, a theoretical approach for studying reporting error in epidemiology is explored. The upper bound for the error that arises due to multiple reporting is higher than that which arises due to under-reporting. Numerical examples are provided to support the arguments. This article mainly treats reporting error as deterministic; a stochastic model for the same could also be explored.
Submitted 28 February, 2012; v1 submitted 13 February, 2012;
originally announced February 2012.
-
Evolutionary Stability Against Multiple Mutations
Authors:
Anirban Ghatak,
K. S. Mallikarjuna Rao,
A. J. Shaiju
Abstract:
It is known (see e.g. Weibull (1995)) that ESS is not robust against multiple mutations. In this article, we introduce robustness against multiple mutations and study some equivalent formulations and consequences.
Submitted 11 January, 2012;
originally announced January 2012.
-
Multivariate Force of Mortality
Authors:
Swagata Mitra,
Pratyush Singh,
Arni S. R. Srinivasa Rao
Abstract:
In usual demographic analysis, the force of mortality is a function of one variable, namely age. In this article, bivariate and multivariate force of mortality functions are introduced for the first time to explain mortality differentials. The pattern of mortality in a population is one of the strong influencing factors in determining the life expectancies at various ages in the population. Considering univariate functions of age alone to understand human mortality data, without associating them with other variables, could lead to incomplete analysis. The reasons behind declining forces of mortality globally could be studied using the proposed functions. Other applications of multivariate forces of mortality could be in actuarial sciences.
Submitted 25 February, 2019; v1 submitted 22 November, 2011;
originally announced November 2011.
-
On the Mathematical Structure of Balanced Chemical Reaction Networks Governed by Mass Action Kinetics
Authors:
Arjan van der Schaft,
Shodhan Rao,
Bayu Jayawardhana
Abstract:
Motivated by recent progress on the interplay between graph theory, dynamics, and systems theory, we revisit the analysis of chemical reaction networks described by mass action kinetics. For reaction networks possessing a thermodynamic equilibrium we derive a compact formulation exhibiting at the same time the structure of the complex graph and the stoichiometry of the network, and which admits a direct thermodynamical interpretation. This formulation allows us to easily characterize the set of equilibria and their stability properties. Furthermore, we develop a framework for interconnection of chemical reaction networks. Finally we discuss how the established framework leads to a new approach for model reduction.
Submitted 27 October, 2011;
originally announced October 2011.
-
Biometric Cards for Indian Population: Role of Mathematical Models in Assisting and Planning
Authors:
Arni S. R. Srinivasa Rao
Abstract:
Mathematical models could be helpful in assisting the Indian Government's new initiative of issuing biometric cards to its citizens. In this note, we look into the role of mathematical models in estimating the missing, non-enumerated population numbers and in estimating the annual numbers of cards required by age, gender and region in India. The linkage between the National Population Register and biometric cards is also highlighted. See technical Appendices. There are other scientific issues, namely electronics, data storage management, identity verification, etc., which we do not address in this paper.
Submitted 28 April, 2011; v1 submitted 10 April, 2011;
originally announced April 2011.
-
Theoretical Framework and Empirical Modeling for Time Required to Vaccinate a Population in an Epidemic
Authors:
Arni S. R. Srinivasa Rao,
Thomas Kurien
Abstract:
The paper describes a method to understand the time required to vaccinate against viruses in a total population as well as in subpopulations. As a demonstration, a model-based estimate of the time required to vaccinate against H1N1 in India, given its administrative difficulties, is provided. We have proved novel theorems for the time functions defined in the paper. Such results are useful in planning for future epidemics. The numbers of days required to vaccinate the entire high-risk population in three subpopulations (villages, tehsils and towns) are noted to be 84, 89 and 88, respectively. There exist state-wise disparities in the health infrastructure and capacities to deliver vaccines, and hence national estimates need to be re-evaluated based on individual performances in the states.
Submitted 18 February, 2011;
originally announced February 2011.
-
Integration of a Phosphatase Cascade with the MAP Kinase Pathway provides for a Novel Signal Processing Function
Authors:
Virendra K. Chaudhri,
Dhiraj Kumar,
Manjari Misra,
Raina Dua,
Kanury V. S. Rao
Abstract:
We mathematically modeled the receptor-activated MAP kinase signaling by incorporating the regulation through cellular phosphatases. Activation induced the alignment of a phosphatase cascade in parallel with the MAP kinase pathway. A novel regulatory motif was thus generated, providing for the combinatorial control of each MAPK intermediate. This ensured a non-linear mode of signal transmission with the output being shaped by the balance between the strength of input signal, and the activity gradient along the phosphatase axis. Shifts in this balance yielded modulations in topology of the motif, thereby expanding the repertoire of output responses. Thus we identify an added dimension to signal processing, wherein the output response to an external stimulus is additionally filtered through indicators that define the phenotypic status of the cell.
Submitted 11 August, 2009;
originally announced August 2009.
-
Incubation periods under various anti-retroviral therapies in homogeneous mixing and age-structured dynamical models: A theoretical approach
Authors:
Arni S. R. Srinivasa Rao
Abstract:
With the launch of second-line anti-retroviral therapy for HIV-infected individuals, there has been an increased expectation of the survival period of people with HIV. We consider previously well-known models in HIV epidemiology where the parameter for incubation period is used as one of the important components to explain the dynamics of the variables. Such models are extended here to explain the dynamics with respect to a given therapy that prolongs the life of an HIV-infected individual. A deconvolution method is demonstrated for estimation of parameters in situations when no therapy and multiple therapies are given to the infected population. The models and deconvolution method are extended in order to study the impact of therapy in age-structured populations. A generalization for a situation when n types of therapies are available is given. Models are demonstrated using hypothetical data, and the sensitivity of the parameters is also computed.
Submitted 2 May, 2013; v1 submitted 15 August, 2006;
originally announced August 2006.
-
Simplifying the mosaic description of DNA sequences
Authors:
Rajeev K. Azad,
J. Subba Rao,
Wentian Li,
Ramakrishna Ramaswamy
Abstract:
By using the Jensen-Shannon divergence, genomic DNA can be divided into compositionally distinct domains through a standard recursive segmentation procedure. Each domain, while significantly different from its neighbours, may however share compositional similarity with one or more distant (non-neighbouring) domains. We thus obtain a coarse-grained description of the given DNA string in terms of a smaller set of distinct domain labels. This yields a minimal domain description of a given DNA sequence, significantly reducing its organizational complexity. This procedure gives a new means of evaluating genomic complexity as one examines organisms ranging from bacteria to human. The mosaic organization of DNA sequences could have originated from the insertion of fragments of one genome (the parasite) inside another (the host), and we present numerical experiments that are suggestive of this scenario.
Submitted 27 July, 2002;
originally announced July 2002.
-
Long range correlations in DNA sequences
Authors:
A. K. Mohanty,
A. V. S. S. Narayana Rao
Abstract:
The so-called long-range correlation properties of DNA sequences are studied using variance analyses of the density distribution of a single nucleotide or a group of nucleotides in a model-independent way. This new method, which was suggested earlier, has been applied to extract slope parameters that characterize the correlation properties of several intron-containing and intron-less DNA sequences. An important aspect of all the DNA sequences is the property of complementarity, by virtue of which any two complementary distributions (like GA is complementary to TC, or G is complementary to ATC) have identical fluctuations at all scales, although their distribution functions need not be identical. Due to this complementarity, the famous DNA walk representation, whose statistical interpretation is still unresolved, is shown to be a special case of the present formalism with a density distribution corresponding to a purine or a pyrimidine group. Another interesting aspect of most of the DNA sequences is that the factorial moments as a function of length exceed unity around a region where the variance versus length in a log-log plot shows a bending. This is a purely phenomenological observation, which is found for several DNA sequences with a few exceptions. Therefore, this length scale has been used as an approximate measure to exclude the bending regions from the slope analyses. The asymmetries in the nucleotide contents, or the patchy structure, as a possible origin of the long-range correlations have also been investigated.
Submitted 20 March, 2002; v1 submitted 28 February, 2002;
originally announced February 2002.
-
Protein Folding and Spin Glass
Authors:
S. Suresh Rao,
Somendra M. Bhattacharjee
Abstract:
We explicitly show the connection between the protein folding problem and spin glass transition. This is then used to identify appropriate quantities that are required to describe the transition. A possible way of observing the spin glass transition is proposed.
Submitted 5 April, 1995; v1 submitted 30 March, 1995;
originally announced March 1995.