Search | arXiv e-print repository

arXiv:2312.17670 [pdf, other]

Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA

Authors: Kaiyuan Yang, Fabio Musio, Yihui Ma, Norman Juchler, Johannes C. Paetzold, Rami Al-Maskari, Luciano Höher, Hongwei Bran Li, Ibrahim Ethem Hamamci, Anjany Sekuboyina, Suprosanna Shit, Houjing Huang, Chinmay Prabhakar, Ezequiel de la Rosa, Diana Waldmannstetter, Florian Kofler, Fernando Navarro, Martin Menten, Ivan Ezhov, Daniel Rueckert, Iris Vos, Ynte Ruigrok, Birgitta Velthuis, Hugo Kuijf, Julien Hämmerli, et al. (59 additional authors not shown)

Abstract: The Circle of Willis (CoW) is an important network of arteries connecting major circulations of the brain. Its vascular architecture is believed to affect the risk, severity, and clinical outcome of serious neuro-vascular diseases. However, characterizing the highly variable CoW anatomy is still a manual and time-consuming expert task. The CoW is usually imaged by two angiographic imaging modalities, magnetic resonance angiography (MRA) and computed tomography angiography (CTA), but there exist limited public datasets with annotations on CoW anatomy, especially for CTA. Therefore we organized the TopCoW Challenge in 2023 with the release of an annotated CoW dataset. The TopCoW dataset was the first public dataset with voxel-level annotations for thirteen possible CoW vessel components, enabled by virtual-reality (VR) technology. It was also the first large dataset with paired MRA and CTA from the same patients. TopCoW challenge formalized the CoW characterization problem as a multiclass anatomical segmentation task with an emphasis on topological metrics. We invited submissions worldwide for the CoW segmentation task, which attracted over 140 registered participants from four continents. The top performing teams managed to segment many CoW components to Dice scores around 90%, but with lower scores for communicating arteries and rare variants. There were also topological mistakes for predictions with high Dice scores. Additional topological analysis revealed further areas for improvement in detecting certain CoW components and matching CoW variant topology accurately. TopCoW represented a first attempt at benchmarking the CoW anatomical segmentation task for MRA and CTA, both morphologically and topologically.

Submitted 29 April, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

Comments: 24 pages, 11 figures, 9 tables. Summary Paper for the MICCAI TopCoW 2023 Challenge
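
Note on the Dice scores quoted in the abstract above: the Dice score of a predicted and a reference region is twice their overlap divided by the sum of their sizes. The sketch below is only an illustration of that definition for a multiclass label map, not the challenge's evaluation code. The function name `dice_per_class`, the flattened label lists, and the toy labels are assumptions made for the example.

```python
def dice_per_class(pred, truth, labels):
    """Return {label: Dice score} for two equal-length flat label sequences.

    Dice = 2 * |pred AND truth| / (|pred| + |truth|), computed per label.
    A label absent from both sequences gets None (the score is undefined).
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    scores = {}
    for label in labels:
        pred_count = sum(1 for p in pred if p == label)
        truth_count = sum(1 for t in truth if t == label)
        overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
        denom = pred_count + truth_count
        scores[label] = None if denom == 0 else 2 * overlap / denom
    return scores


# Toy example: 0 is background, 1 and 2 are hypothetical vessel labels,
# and label 3 stands for a component that is absent in this subject.
truth = [0, 1, 1, 1, 2, 2, 0, 0]
pred = [0, 1, 1, 0, 2, 2, 2, 0]
scores = dice_per_class(pred, truth, labels=[1, 2, 3])
print(scores)
```

Because the score only counts overlapping voxels, a prediction can score well while still breaking or adding a vessel connection, which is consistent with the abstract's remark that topological mistakes occurred in predictions with high Dice scores.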

arXiv:2309.01670 [pdf, other]

Blind Biological Sequence Denoising with Self-Supervised Set Learning

Authors: Nathan Ng, Ji Won Park, Jae Hyeon Lee, Ryan Lewis Kelly, Stephen Ra, Kyunghyun Cho

Abstract: Biological sequence analysis relies on the ability to denoise the imprecise output of sequencing platforms. We consider a common setting where a short sequence is read out repeatedly using a high-throughput long-read platform to generate multiple subreads, or noisy observations of the same sequence. Denoising these subreads with alignment-based approaches often fails when too few subreads are available or error rates are too high. In this paper, we propose a novel method for blindly denoising sets of sequences without directly observing clean source sequence labels. Our method, Self-Supervised Set Learning (SSSL), gathers subreads together in an embedding space and estimates a single set embedding as the midpoint of the subreads in both the latent and sequence spaces. This set embedding represents the "average" of the subreads and can be decoded into a prediction of the clean sequence. In experiments on simulated long-read DNA data, SSSL methods denoise small reads of $\leq 6$ subreads with 17% fewer errors and large reads of $>6$ subreads with 8% fewer errors compared to the best baseline. On a real dataset of antibody sequences, SSSL improves over baselines on two self-supervised metrics, with a significant improvement on difficult small reads that comprise over 60% of the test set. By accurately denoising these reads, SSSL promises to better realize the potential of high-throughput DNA sequencing data for downstream scientific applications.

Submitted 4 September, 2023; originally announced September 2023.

arXiv:2308.05326 [pdf, other]

OpenProteinSet: Training data for structural biology at scale

Authors: Gustaf Ahdritz, Nazim Bouatta, Sachin Kadyan, Lukas Jarosch, Daniel Berenberg, Ian Fisk, Andrew M. Watkins, Stephen Ra, Richard Bonneau, Mohammed AlQuraishi

Abstract: Multiple sequence alignments (MSAs) of proteins encode rich biological information and have been workhorses in bioinformatic methods for tasks like protein design and protein structure prediction for decades. Recent breakthroughs like AlphaFold2 that use transformers to attend directly over large quantities of raw MSAs have reaffirmed their importance. Generation of MSAs is highly computationally intensive, however, and no datasets comparable to those used to train AlphaFold2 have been made available to the research community, hindering progress in machine learning for proteins. To remedy this problem, we introduce OpenProteinSet, an open-source corpus of more than 16 million MSAs, associated structural homologs from the Protein Data Bank, and AlphaFold2 protein structure predictions. We have previously demonstrated the utility of OpenProteinSet by successfully retraining AlphaFold2 on it. We expect OpenProteinSet to be broadly useful as training and validation data for 1) diverse tasks focused on protein structure, function, and design and 2) large-scale multimodal machine learning research.

Submitted 10 August, 2023; originally announced August 2023.

arXiv:2306.12360 [pdf, other]

Protein Discovery with Discrete Walk-Jump Sampling

Authors: Nathan C. Frey, Daniel Berenberg, Karina Zadorozhny, Joseph Kleinhenz, Julien Lafrance-Vanasse, Isidro Hotzel, Yan Wu, Stephen Ra, Richard Bonneau, Kyunghyun Cho, Andreas Loukas, Vladimir Gligorijevic, Saeed Saremi

Abstract: We resolve difficulties in training and sampling from a discrete generative model by learning a smoothed energy function, sampling from the smoothed data manifold with Langevin Markov chain Monte Carlo (MCMC), and projecting back to the true data manifold with one-step denoising. Our Discrete Walk-Jump Sampling formalism combines the contrastive divergence training of an energy-based model and improved sample quality of a score-based model, while simplifying training and sampling by requiring only a single noise level. We evaluate the robustness of our approach on generative modeling of antibody proteins and introduce the distributional conformity score to benchmark protein generative models. By optimizing and sampling from our models for the proposed distributional conformity score, 97-100% of generated samples are successfully expressed and purified and 70% of functional designs show equal or improved binding affinity compared to known functional antibodies on the first attempt in a single round of laboratory experiments. We also report the first demonstration of long-run fast-mixing MCMC chains where diverse antibody protein classes are visited in a single MCMC chain.

Submitted 15 March, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: ICLR 2024 oral presentation, top 1.2% of submissions; {ICLR 2023 Physics for Machine Learning, NeurIPS 2023 GenBio, MLCB 2023} Spotlight

arXiv:2306.07473 [pdf, other]

3D molecule generation by denoising voxel grids

Authors: Pedro O. Pinheiro, Joshua Rackers, Joseph Kleinhenz, Michael Maser, Omar Mahmood, Andrew Martin Watkins, Stephen Ra, Vishnu Sresht, Saeed Saremi

Abstract: We propose a new score-based approach to generate 3D molecules represented as atomic densities on regular grids. First, we train a denoising neural network that learns to map from a smooth distribution of noisy molecules to the distribution of real molecules. Then, we follow the neural empirical Bayes framework (Saremi and Hyvarinen, 19) and generate molecules in two steps: (i) sample noisy density grids from a smooth distribution via underdamped Langevin Markov chain Monte Carlo, and (ii) recover the "clean" molecule by denoising the noisy grid with a single step. Our method, VoxMol, generates molecules in a fundamentally different way than the current state of the art (ie, diffusion models applied to atom point clouds). It differs in terms of the data representation, the noise model, the network architecture and the generative modeling algorithm. Our experiments show that VoxMol captures the distribution of drug-like molecules better than state of the art, while being faster to generate samples.

Submitted 8 March, 2024; v1 submitted 12 June, 2023; originally announced June 2023.

arXiv:2211.03553 [pdf, other]

Learning Causal Representations of Single Cells via Sparse Mechanism Shift Modeling

Authors: Romain Lopez, Nataša Tagasovska, Stephen Ra, Kyunghyun Cho, Jonathan K. Pritchard, Aviv Regev

Abstract: Latent variable models such as the Variational Auto-Encoder (VAE) have become a go-to tool for analyzing biological data, especially in the field of single-cell genomics. One remaining challenge is the interpretability of latent variables as biological processes that define a cell's identity. Outside of biological applications, this problem is commonly referred to as learning disentangled representations. Although several disentanglement-promoting variants of the VAE were introduced, and applied to single-cell genomics data, this task has been shown to be infeasible from independent and identically distributed measurements, without additional structure. Instead, recent methods propose to leverage non-stationary data, as well as the sparse mechanism shift assumption in order to learn disentangled representations with a causal semantic. Here, we extend the application of these methodological advances to the analysis of single-cell genomics data with genetic or chemical perturbations. More precisely, we propose a deep generative model of single-cell gene expression data for which each perturbation is treated as a stochastic intervention targeting an unknown, but sparse, subset of latent variables. We benchmark these methods on simulated single-cell data to evaluate their performance at latent units recovery, causal target identification and out-of-domain generalization. Finally, we apply those approaches to two real-world large-scale gene perturbation data sets and find that models that exploit the sparse mechanism shift hypothesis surpass contemporary methods on a transfer learning task. We implement our new model and benchmarks using the scvi-tools library, and release it as open-source software at https://github.com/Genentech/sVAE.

Submitted 16 February, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

Comments: Accepted at CLeaR (Causal Learning and Reasoning) 2023

arXiv:2210.10838 [pdf, other]

A Pareto-optimal compositional energy-based model for sampling and optimization of protein sequences

Authors: Nataša Tagasovska, Nathan C. Frey, Andreas Loukas, Isidro Hötzel, Julien Lafrance-Vanasse, Ryan Lewis Kelly, Yan Wu, Arvind Rajpal, Richard Bonneau, Kyunghyun Cho, Stephen Ra, Vladimir Gligorijević

Abstract: Deep generative models have emerged as a popular machine learning-based approach for inverse design problems in the life sciences. However, these problems often require sampling new designs that satisfy multiple properties of interest in addition to learning the data distribution. This multi-objective optimization becomes more challenging when properties are independent or orthogonal to each other. In this work, we propose a Pareto-compositional energy-based model (pcEBM), a framework that uses multiple gradient descent for sampling new designs that adhere to various constraints in optimizing distinct properties. We demonstrate its ability to learn non-convex Pareto fronts and generate sequences that simultaneously satisfy multiple desired properties across a series of real-world antibody design tasks.

Submitted 19 October, 2022; originally announced October 2022.
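
Note on the Pareto fronts mentioned in the abstract above: a design is on the Pareto front when no other design is at least as good in every objective and strictly better in at least one. The sketch below only filters a fixed list of candidates by that dominance rule. It is not the pcEBM sampler, which the abstract describes as using multiple gradient descent. The function name `pareto_front`, the two objectives, and the scores are hypothetical.

```python
def pareto_front(points):
    """Return the non-dominated points, assuming every objective is maximized."""

    def dominates(a, b):
        # a dominates b if a is no worse in all objectives and better in one.
        no_worse = all(x >= y for x, y in zip(a, b))
        strictly_better = any(x > y for x, y in zip(a, b))
        return no_worse and strictly_better

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective_1, objective_2) scores for five candidate designs.
designs = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]
front = pareto_front(designs)
print(front)
```

Here (0.5, 0.5) and (0.1, 0.1) are dropped because (0.6, 0.6) beats them in both objectives, while the remaining three designs each trade one objective against the other.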

arXiv:2210.04096 [pdf, other]

PropertyDAG: Multi-objective Bayesian optimization of partially ordered, mixed-variable properties for biological sequence design

Authors: Ji Won Park, Samuel Stanton, Saeed Saremi, Andrew Watkins, Henri Dwyer, Vladimir Gligorijevic, Richard Bonneau, Stephen Ra, Kyunghyun Cho

Abstract: Bayesian optimization offers a sample-efficient framework for navigating the exploration-exploitation trade-off in the vast design space of biological sequences. Whereas it is possible to optimize the various properties of interest jointly using a multi-objective acquisition function, such as the expected hypervolume improvement (EHVI), this approach does not account for objectives with a hierarchical dependency structure. We consider a common use case where some regions of the Pareto frontier are prioritized over others according to a specified $\textit{partial ordering}$ in the objectives. For instance, when designing antibodies, we would like to maximize the binding affinity to a target antigen only if it can be expressed in live cell culture -- modeling the experimental dependency in which affinity can only be measured for antibodies that can be expressed and thus produced in viable quantities. In general, we may want to confer a partial ordering to the properties such that each property is optimized conditioned on its parent properties satisfying some feasibility condition. To this end, we present PropertyDAG, a framework that operates on top of the traditional multi-objective BO to impose this desired ordering on the objectives, e.g. expression $\rightarrow$ affinity. We demonstrate its performance over multiple simulated active learning iterations on a penicillin production task, toy numerical problem, and a real-world antibody design task.

Submitted 8 October, 2022; originally announced October 2022.

Comments: 9 pages, 7 figures. Submitted to NeurIPS 2022 AI4Science Workshop

arXiv:2206.05120 [pdf, other]

Active information, missing data and prevalence estimation

Authors: Ola Hössjer, Daniel Andrés Díaz-Pachón, Chen Zhao, J. Sunil Rao

Abstract: The topic of this paper is prevalence estimation from the perspective of active information. Prevalence among tested individuals has an upward bias under the assumption that individuals' willingness to be tested for the disease increases with the strength of their symptoms. Active information due to testing bias quantifies the degree to which the willingness to be tested correlates with infection status. Interpreting incomplete testing as a missing data problem, the missingness mechanism impacts the degree to which the bias of the original prevalence estimate can be removed. The reduction in prevalence, when testing bias is adjusted for, translates into an active information due to bias correction, with opposite sign to active information due to testing bias. Prevalence and active information estimates are asymptotically normal, a behavior also illustrated through simulations.

Submitted 10 June, 2022; originally announced June 2022.

Comments: 18 pages, 5 tables, 2 figures

MSC Class: 62D10; 94A17; 62B10; 62F12; 62P10; 92B15; 94A17; 94A16; 94A20
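
Note on the upward bias described in the abstract above: if infected individuals are more likely to be tested than uninfected ones, Bayes' rule implies that prevalence among the tested exceeds prevalence in the population. The sketch below works through that arithmetic only. It is not the paper's estimator or its active-information measure, and the function name and all probabilities are assumed for illustration.

```python
def prevalence_among_tested(prevalence, p_test_infected, p_test_healthy):
    """Share of tested individuals who are infected, by Bayes' rule.

    prevalence       -- true infected fraction of the population
    p_test_infected  -- probability that an infected person gets tested
    p_test_healthy   -- probability that an uninfected person gets tested
    """
    tested_infected = prevalence * p_test_infected
    tested_healthy = (1 - prevalence) * p_test_healthy
    return tested_infected / (tested_infected + tested_healthy)


# Hypothetical numbers: 5% true prevalence, infected people five times
# as likely to be tested as uninfected people.
true_prevalence = 0.05
observed = prevalence_among_tested(true_prevalence, 0.5, 0.1)
unbiased = prevalence_among_tested(true_prevalence, 0.3, 0.3)
print(f"true: {true_prevalence:.3f}  tested (biased): {observed:.3f}  "
      f"tested (equal rates): {unbiased:.3f}")
```

With equal testing rates the two prevalences coincide, so under these assumptions the bias comes entirely from the difference between the two testing probabilities.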

arXiv:2205.04259 [pdf, other]

Multi-segment preserving sampling for deep manifold sampler

Authors: Daniel Berenberg, Jae Hyeon Lee, Simon Kelow, Ji Won Park, Andrew Watkins, Vladimir Gligorijević, Richard Bonneau, Stephen Ra, Kyunghyun Cho

Abstract: Deep generative modeling for biological sequences presents a unique challenge in reconciling the bias-variance trade-off between explicit biological insight and model flexibility. The deep manifold sampler was recently proposed as a means to iteratively sample variable-length protein sequences by exploiting the gradients from a function predictor. We introduce an alternative approach to this guided sampling procedure, multi-segment preserving sampling, that enables the direct inclusion of domain-specific knowledge by designating preserved and non-preserved segments along the input sequence, thereby restricting variation to only select regions. We present its effectiveness in the context of antibody design by training two models: a deep manifold sampler and a GPT-2 language model on nearly six million heavy chain sequences annotated with the IGHV1-18 gene. During sampling, we restrict variation to only the complementarity-determining region 3 (CDR3) of the input. We obtain log probability scores from a GPT-2 model for each sampled CDR3 and demonstrate that multi-segment preserving sampling generates reasonable designs while maintaining the desired, preserved regions.

Submitted 9 May, 2022; originally announced May 2022.

arXiv:2202.08928 [pdf, other]

"Back to the future" projections for COVID-19 surges

Authors: J. Sunil Rao, Tianhao Liu, Daniel Andrés Díaz-Pachón

Abstract: We argue that information from countries who had earlier COVID-19 surges can be used to inform another country's current model, then generating what we call back-to-the-future (BTF) projections. We show that these projections can be used to accurately predict future COVID-19 surges prior to an inflection point of the daily infection curve. We show, across 12 different countries from all populated continents around the world, that our method can often predict future surges in scenarios where the traditional approaches would always predict no future surges. However, as expected, BTF projections cannot accurately predict a surge due to the emergence of a new variant. To generate BTF projections, we make use of a matching scheme for asynchronous time series combined with a response coaching SIR model.

Submitted 17 February, 2022; originally announced February 2022.

Comments: 21 pages, 7 figures

MSC Class: 92D25 (Primary) 92C60 92B15 62P10 62M10 (Secondary)

arXiv:2011.06455 [pdf]

doi 10.1098/rsos.210429

Optimal governance and implementation of vaccination programmes to contain the COVID-19 pandemic

Authors: Mahendra Piraveenan, Shailendra Sawleshwarkar, Michael Walsh, Iryna Zablotska, Samit Bhattacharyya, Habib Hassan Farooqui, Tarun Bhatnagar, Anup Karan, Manoj Murhekar, Sanjay Zodpey, K. S. Mallikarjuna Rao, Philippa Pattison, Albert Zomaya, Matjaz Perc

Abstract: Since the recent introduction of several viable vaccines for SARS-CoV-2, vaccination uptake has become the key factor that will determine our success in containing the COVID-19 pandemic. We argue that game theory and social network models should be used to guide decisions pertaining to vaccination programmes for the best possible results. In the months following the introduction of vaccines, their availability and the human resources needed to run the vaccination programmes have been scarce in many countries. Vaccine hesitancy is also being encountered from some sections of the general public. We emphasize that decision-making under uncertainty and imperfect information, and with only conditionally optimal outcomes, is a unique forte of established game-theoretic modelling. Therefore, we can use this approach to obtain the best framework for modelling and simulating vaccination prioritization and uptake that will be readily available to inform important policy decisions for the optimal control of the COVID-19 pandemic.

Submitted 9 June, 2021; v1 submitted 12 November, 2020; originally announced November 2020.

Comments: 15 pages, 1 figure; published in Royal Society Open Science

Journal ref: R. Soc. Open Sci. 8, 210429 (2021)

arXiv:2007.07426 [pdf, ps, other]

doi 10.1016/j.jtbi.2020.110556

A simple correction for COVID-19 sampling bias

Authors: Daniel Andrés Díaz-Pachón, J Sunil Rao

Abstract: COVID-19 testing has become a standard approach for estimating prevalence, which then assists in public health decision making to contain and mitigate the spread of the disease. The sampling designs used are often biased in that they do not reflect the true underlying populations. For instance, individuals with strong symptoms are more likely to be tested than those with no symptoms. This results in biased estimates of prevalence (too high). Typical post-sampling corrections are not always possible. Here we present a simple bias correction methodology derived and adapted from a correction for publication bias in meta analysis studies. The methodology is general enough to allow a wide variety of customization making it more useful in practice. Implementation is easily done using already collected information. Via a simulation and two real datasets, we show that the bias corrections can provide dramatic reductions in estimation error.

Submitted 11 January, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

Comments: 14 pages. Title changed. The whole Section 7 with information from Lombardy, Italy, was added (another real dataset). Some typos were corrected. In spite of several lengthy additions, no substantial changes were done to the paper. The goal of the additions was more to clarify than to correct

MSC Class: 62D99

Journal ref: Journal of Theoretical Biology, Volume 512, 7 March 2021, 110556

arXiv:2003.07105 [pdf]

Analysis of genetic differences between psychiatric disorders: Exploring pathways and cell-types/tissues involved and ability to differentiate the disorders by polygenic scores

Authors: Shitao Rao, Liangying Yin, Yong Xiang, Hon-Cheong So

Abstract: Although displaying genetic correlations, psychiatric disorders are clinically defined as categorical entities as they each have distinguishing clinical features and may involve different treatments. Identifying differential genetic variations between these disorders may reveal how the disorders differ biologically and help to guide more personalized treatment. Here we presented a comprehensive analysis to identify genetic markers differentially associated with various psychiatric disorders/traits based on GWAS summary statistics, covering 18 psychiatric traits/disorders and 26 comparisons. We also conducted comprehensive analysis to unravel the genes, pathways and SNP functional categories involved, and the cell types and tissues implicated. We also assessed how well one could distinguish between psychiatric disorders by polygenic risk scores (PRS). SNP-based heritabilities (h2SNP) were significantly larger than zero for most comparisons. Based on current GWAS data, PRS have mostly modest power to distinguish between psychiatric disorders. For example, we estimated that AUC for distinguishing schizophrenia from major depressive disorder (MDD), bipolar disorder (BPD) from MDD and schizophrenia from BPD were 0.694, 0.602 and 0.618 respectively, while the maximum AUC (based on h2SNP) were 0.763, 0.749 and 0.726 respectively. We also uncovered differences in each pair of studied traits in terms of their differences in genetic correlation with comorbid traits. For example, clinically-defined MDD appeared to be more strongly genetically correlated with other psychiatric disorders and heart disease, when compared to non-clinically-defined depression in UK Biobank. Our findings highlight genetic differences between psychiatric disorders and the mechanisms involved. PRS may aid differential diagnosis of selected psychiatric disorders in the future with larger GWAS samples.

Submitted 20 May, 2021; v1 submitted 16 March, 2020; originally announced March 2020.

arXiv:1902.09482 [pdf, ps, other]

doi 10.1016/bs.host.2017.05.003

Helminth Dynamics: Mean Number of Worms, Reproductive Rates

Authors: Arni S. R. Srinivasa Rao, Roy M. Anderson

Abstract: We derive formulas to compute mean number of worms in a newly Helminth infected population before secondary infections are started (population is closed). We have proved the two types of growth functions arise in this process as measurable functions.

Submitted 25 February, 2019; originally announced February 2019.

Comments: 13 pages

MSC Class: 92D30; 62P10

Journal ref: Handbook of Statist., 36, Elsevier/North-Holland, Amsterdam, 2017

arXiv:1811.03178 [pdf, ps, other]

doi 10.1016/bs.host.2020.02.001

Computation of life expectancy from incomplete data

Authors: Arni S. R. Srinivasa Rao, James R. Carey

Abstract: Estimating human longevity and computing life expectancy are central to population dynamics. These aspects have been studied seriously by scientists since the fifteenth century, including the renowned astronomer Edmund Halley. From basic principles of population dynamics, we propose a method to compute life expectancy from incomplete data.

Submitted 7 November, 2018; originally announced November 2018.

Comments: An alternate way to compute life expectancy when life tables are not available

MSC Class: 92D20

Journal ref: Handbook of Statistics, Elsevier Volume 43, 2020, Pages 379-389

arXiv:1811.03067 [pdf, ps, other]

doi 10.1007/s11538-019-00652-7

On the Three Properties of Stationary Populations and knotting with Non-Stationary Populations

Authors: Arni S. R. Srinivasa Rao, James R. Carey

Abstract: A population is considered stationary if the growth rate is zero and the age structure is constant. It thus follows that a population is considered non-stationary if either its growth rate is non-zero and/or its age structure is non-constant. We propose three properties that are related to the stationary population identity (SPI) of population biology by connecting it with stationary populations and non-stationary populations which are approaching stationarity. One of these important properties is that SPI can be applied to partition a population into stationary and non-stationary components. These properties provide deeper insights into cohort formation in real-world populations and the length of the duration for which stationary and non-stationary conditions hold. The new concepts are based on the time gap between the occurrence of stationary and non-stationary populations within the SPI framework that we refer to as Oscillatory SPI and the Amplitude of SPI. This article will appear in Bulletin of Mathematical Biology (Springer)

Submitted 19 July, 2019; v1 submitted 7 November, 2018; originally announced November 2018.

Comments: 26 pages

MSC Class: 92D25; 60H35

Journal ref: Bulletin of Mathematical Biology (2019) (Springer) 81 (10), 4233-4250

arXiv:1810.01452 [pdf, ps, other]

A Partition Theorem for a Randomly Selected Large Population

Authors: Arni S. R. Srinivasa Rao

Abstract: We state and prove a theorem on the partitioning of a randomly selected large population into stationary and non-stationary components by using a property of stationary population identity. Applications of this theorem for practical purposes is summarized at the end.

Submitted 1 July, 2021; v1 submitted 2 October, 2018; originally announced October 2018.

Comments: 12 pages, 4 figures. A new result in population dynamics

MSC Class: 92D25

Journal ref: Acta Biotheoretica (Springer) 2021

arXiv:1808.09828 [pdf, ps, other]

doi 10.1016/j.jtbi.2020.110243

True Epidemic Growth Construction Through Harmonic Analysis

Authors: Steven G. Krantz, Peter Polyakov, Arni S. R. Srinivasa Rao

Abstract: In this paper, we have proposed a two phase procedure (combining discrete graphs and wavelets) for constructing a true epidemic growth. In the first phase graph theory based approach was developed to update partial data available and in the second phase we used this partial data to generate a plausible complete data through wavelets. This procedure although novel and implementable, still leave some questions unanswered.

Submitted 4 September, 2018; v1 submitted 29 August, 2018; originally announced August 2018.

Comments: 16 pages, new Figure on Meyer wavelets added

MSC Class: 05C90; 42C40; 92D30

Journal ref: Journal of Theoretical Biology Volume 494, 7 June 2020, 110243

arXiv:1503.03373 [pdf, ps, other]

Population Stability and Momentum

Authors: Arni S. R. Srinivasa Rao

Abstract: A new approach is developed to understand stability of a population and further understanding of population momentum.

Submitted 10 February, 2015; originally announced March 2015.

Comments: Research Article

Journal ref: Notices of the American Mathematical Society (2014), 9, 61: 1062-1065

arXiv:1502.03041 [pdf, ps, other]

doi 10.1007/s00285-014-0831-6

Generalization of Carey's Equality and a Theorem on Stationary Population

Authors: Arni S. R. Srinivasa Rao, James R. Carey

Abstract: Carey's Equality pertaining to stationary models is well known. In this paper, we have stated and proved a fundamental theorem related to the formation of this Equality. This theorem will provide an in-depth understanding of the role of each captive subject, and their corresponding follow-up duration in a stationary population. We have demonstrated a numerical example of a captive cohort and the survival pattern of medfly populations. These results can be adopted to understand age-structure and aging process in stationary and non-stationary population models. Key words: Captive cohort, life expectancy, symmetric patterns.

Submitted 10 February, 2015; originally announced February 2015.

Journal ref: Journal of Mathematical Biology (2015), 71, 3: 583 - 594

arXiv:1212.2438 [pdf, other]

Model-order reduction of biochemical reaction networks

Authors: Shodhan Rao, Arjan van der Schaft, Karen van Eunen, Barbara M. Bakker, Bayu Jayawardhana

Abstract: In this paper we propose a model-order reduction method for chemical reaction networks governed by general enzyme kinetics, including the mass-action and Michaelis-Menten kinetics. The model-order reduction method is based on the Kron reduction of the weighted Laplacian matrix which describes the graph structure of complexes in the chemical reaction network. We apply our method to a yeast glycolysis model, where the simulation result shows that the transient behaviour of a number of key metabolites of the reduced-order model is in good agreement with those of the full-order model.

Submitted 11 December, 2012; originally announced December 2012.

Comments: 7 pages, 5 figures. arXiv admin note: substantial text overlap with arXiv:1211.6643, arXiv:1110.6078

arXiv:1202.2688 [pdf, ps, other]

doi 10.1016/j.jtbi.2012.02.026

Understanding Theoretically The Impact of Reporting of Disease Cases in Epidemiology

Authors: Arni S. R. Srinivasa Rao

Abstract: In conducting preliminary analysis during an epidemic, data on reported disease cases offer key information in guiding the direction to the in-depth analysis. Models for growth and transmission dynamics are heavily dependent on preliminary analysis results. When a particular disease case is reported more than once or alternatively is never reported or detected in the population, then in such a situation, there is a possibility of existence of multiple reporting or under reporting in the population. In this work, a theoretical approach for studying reporting error in epidemiology is explored. The upper bound for the error that arises due to multiple reporting is higher than that which arises due to under reporting. Numerical examples are provided to support the arguments. This article mainly treats reporting error as deterministic and one can explore a stochastic model for the same.

Submitted 28 February, 2012; v1 submitted 13 February, 2012; originally announced February 2012.

Comments: 21 pages, 2 figures. To appear in Journal of Theoretical Biology (Elsevier)

MSC Class: 92D30; 26.70

Journal ref: (2012) Journal of Theoretical Biology 302:89-95

arXiv:1201.2467 [pdf, other]

doi 10.1007/s13235-012-0051-x

Evolutionary Stability Against Multiple Mutations

Authors: Anirban Ghatak, K. S. Mallikarjuna Rao, A. J. Shaiju

Abstract: It is known (see e.g. Weibull (1995)) that ESS is not robust against multiple mutations. In this article, we introduce robustness against multiple mutations and study some equivalent formulations and consequences.

Submitted 11 January, 2012; originally announced January 2012.

Comments: Submitted article

MSC Class: 91A22

arXiv:1111.5213 [pdf, ps, other]

Multivariate Force of Mortality

Authors: Swagata Mitra, Pratyush Singh, Arni S. R. Srinivasa Rao

Abstract: In usual demographic analysis, force of mortality is a function of one variable, that is, of age. In this article bi-variate and multivariate force of mortality functions are introduced for the first time to explain mortality differentials. The pattern of mortality in a population is one of the strong influencing factors in determining the life expectancies at various ages in the population. Considering univariate functions of age only to understand the human mortality data without associating with other variables could lead to incomplete analysis. The reasons behind declining forces of mortality globally could be studied using the proposed functions. Other applications of multivariate forces of mortality could be in actuarial sciences.

Submitted 25 February, 2019; v1 submitted 22 November, 2011; originally announced November 2011.

Comments: 18 pages, 3 figures. A technical note which introduces 2D and 3D functions of forces of mortality

MSC Class: 91D20; 65.0X

Journal ref: Demography India, Volume 44 (1&2), 2015, pages 1-16

arXiv:1110.6078 [pdf, ps, other]

On the Mathematical Structure of Balanced Chemical Reaction Networks Governed by Mass Action Kinetics

Authors: Arjan van der Schaft, Shodhan Rao, Bayu Jayawardhana

Abstract: Motivated by recent progress on the interplay between graph theory, dynamics, and systems theory, we revisit the analysis of chemical reaction networks described by mass action kinetics. For reaction networks possessing a thermodynamic equilibrium we derive a compact formulation exhibiting at the same time the structure of the complex graph and the stoichiometry of the network, and which admits a direct thermodynamical interpretation. This formulation allows us to easily characterize the set of equilibria and their stability properties. Furthermore, we develop a framework for interconnection of chemical reaction networks. Finally we discuss how the established framework leads to a new approach for model reduction.

Submitted 27 October, 2011; originally announced October 2011.

arXiv:1104.1775 [pdf]

doi 10.1080/17441730.2011.608991

Biometric Cards for Indian Population: Role of Mathematical Models in Assisting and Planning

Authors: Arni S. R. Srinivasa Rao

Abstract: Mathematical models could be helpful in assisting the Indian Government's new initiative of issuing biometric cards to its citizens. In this note, we look into the role of mathematical models in estimating the missing, non-enumerated population numbers, estimating annual numbers of cards required by age, gender and regions in India. The linkage between National Population Register and biometric cards is also highlighted. See technical Appendices. There are other scientific issues, namely, electronic, data storage management, identity verification etc, which we do not address in this paper.

Submitted 28 April, 2011; v1 submitted 10 April, 2011; originally announced April 2011.

Comments: Short note

MSC Class: 92D25

Journal ref: Arni S. R. Srinivasa Rao (2011): Biometric Cards for Indian Population: Role of Mathematical Models in Assisting and Planning Asian Population Studies, 7:3, 295-300. Publisher: Routledge: Taylor & Francis Group

arXiv:1102.3793 [pdf, other]

doi 10.1016/bs.host.2017.07.005

Theoretical Framework and Empirical Modeling for Time Required to Vaccinate a Population in an Epidemic

Authors: Arni S. R. Srinivasa Rao, Thomas Kurien

Abstract: The paper describes a method to understand time required to vaccinate against viruses in total as well as subpopulations. As a demonstration, a model based estimate for time required to vaccinate H1N1 in India, given its administrative difficulties is provided. We have proved novel theorems for the time functions defined in the paper. Such results are useful in planning for future epidemics. The number of days required to vaccinate entire high risk population in three subpopulations (villages, tehsils and towns) are noted to be 84, 89 and 88 respectively. There exists state wise disparities in the health infrastructure and capacities to deliver vaccines and hence national estimates need to be re-evaluated based on individual performances in the states.

Submitted 18 February, 2011; originally announced February 2011.

Comments: 14 pages, 1 Table, 5 Figures (A preliminary draft)

MSC Class: 92D30; 60E05; 26D07

Journal ref: Handbook of Statistics (2017), Volume 37

arXiv:0908.1515 [pdf]

doi 10.1074/jbc.M109.055863

Integration of a Phosphatase Cascade with the MAP Kinase Pathway provides for a Novel Signal Processing Function

Authors: Virendra K. Chaudhri, Dhiraj Kumar, Manjari Misra, Raina Dua, Kanury V. S. Rao

Abstract: We mathematically modeled the receptor-activated MAP kinase signaling by incorporating the regulation through cellular phosphatases. Activation induced the alignment of a phosphatase cascade in parallel with the MAP kinase pathway. A novel regulatory motif was thus generated, providing for the combinatorial control of each MAPK intermediate. This ensured a non-linear mode of signal transmission with the output being shaped by the balance between the strength of input signal, and the activity gradient along the phosphatase axis. Shifts in this balance yielded modulations in topology of the motif, thereby expanding the repertoire of output responses. Thus we identify an added dimension to signal processing, wherein the output response to an external stimulus is additionally filtered through indicators that define the phenotypic status of the cell.

Submitted 11 August, 2009; originally announced August 2009.

Comments: Whole Manuscript 33 pages including Main text, 7 Figures and Supporting Information

Journal ref: J Biol Chem 285,(2), 2010

arXiv:q-bio/0608028 [pdf, ps, other]

doi 10.1216/RMJ-2015-45-3-973

Incubation periods under various anti-retroviral therapies in homogeneous mixing and age-structured dynamical models: A theoretical approach

Authors: Arni S. R. Srinivasa Rao

Abstract: With the launch of second line anti-retroviral therapy for HIV infected individuals, there has been an increased expectation on surviving period of people with HIV. We consider previously well-known models in HIV epidemiology where the parameter for incubation period is used as one of the important components to explain the dynamics of the variables. Such models are extended here to explain the dynamics with respect to a given therapy that prolongs life of an HIV infected individual. A deconvolution method is demonstrated for estimation of parameters in the situations when no-therapy and multiple therapies are given to the infected population. The models and deconvolution method are extended in order to study the impact of therapy in age-structured populations. A generalization for a situation when n-types of therapies are available is given. Models are demonstrated using hypothetical data and sensitivity of the parameters are also computed.

Submitted 2 May, 2013; v1 submitted 15 August, 2006; originally announced August 2006.

Comments: 53 pages

MSC Class: 92D30; 44A35; 62A10

Journal ref: Rocky Mountain Journal of Mathematics, (2015), 45, 3: 973-1031

arXiv:physics/0207113 [pdf, ps, other]

doi 10.1103/PhysRevE.66.031913

Simplifying the mosaic description of DNA sequences

Authors: Rajeev K. Azad, J. Subba Rao, Wentian Li, Ramakrishna Ramaswamy

Abstract: By using the Jensen-Shannon divergence, genomic DNA can be divided into compositionally distinct domains through a standard recursive segmentation procedure. Each domain, while significantly different from its neighbours, may however share compositional similarity with one or more distant (non-neighbouring) domains. We thus obtain a coarse-grained description of the given DNA string in terms of a smaller set of distinct domain labels. This yields a minimal domain description of a given DNA sequence, significantly reducing its organizational complexity. This procedure gives a new means of evaluating genomic complexity as one examines organisms ranging from bacteria to human. The mosaic organization of DNA sequences could have originated from the insertion of fragments of one genome (the parasite) inside another (the host), and we present numerical experiments that are suggestive of this scenario.

Submitted 27 July, 2002; originally announced July 2002.

Comments: 16 pages, 1 figure, Accepted for publication in Phys. Rev. E

arXiv:physics/0202075 [pdf, ps, other]

Long range correlations in DNA sequences

Authors: A. K. Mohanty, A. V. S. S. Narayana Rao

Abstract: The so called long range correlation properties of DNA sequences are studied using the variance analyses of the density distribution of a single or a group of nucleotides in a model independent way. This new method which was suggested earlier has been applied to extract slope parameters that characterize the correlation properties for several intron containing and intron less DNA sequences. An important aspect of all the DNA sequences is the properties of complementarity by virtue of which any two complementary distributions (like GA is complementary to TC or G is complementary to ATC) have identical fluctuations at all scales although their distribution functions need not be identical. Due to this complementarity, the famous DNA walk representation whose statistical interpretation is still unresolved is shown to be a special case of the present formalism with a density distribution corresponding to a purine or a pyrimidine group. Another interesting aspect of most of the DNA sequences is that the factorial moments as a function of length exceed unity around a region where the variance versus length in a log-log plot shows a bending. This is a pure phenomenological observation which is found for several DNA sequences with a few exceptions. Therefore, this length scale has been used as an approximate measure to exclude the bending regions from the slope analyses. The asymmetries in the nucleotide contents or the patchy structure as a possible origin of the long range correlations has also been investigated.

Submitted 20 March, 2002; v1 submitted 28 February, 2002; originally announced February 2002.

Comments: Latex 17 pages, 11 eps figures, appendix B added

arXiv:cond-mat/9503173 [pdf, ps, other]

doi 10.1016/0378-4371(95)00325-8

Protein Folding and Spin Glass

Authors: S. Suresh Rao, Somendra M. Bhattacharjee

Abstract: We explicitly show the connection between the protein folding problem and spin glass transition. This is then used to identify appropriate quantities that are required to describe the transition. A possible way of observing the spin glass transition is proposed.

Submitted 5 April, 1995; v1 submitted 30 March, 1995; originally announced March 1995.

Comments: Revtex3+epsf, 8 pages and one postscript figure tarred, compressed and uuencoded--appended at the end of the file. Minor TeX changes

Report number: IP/BBSR/95-23 March '95

Showing 1–33 of 33 results for author: Rao, S