Detection of evolutionary shifts in variance under an Ornstein-Uhlenbeck model

Authors: Wensha Zhang, Lam Si Tung Ho, Toby Kenney

Abstract: 1. Abrupt environmental changes can lead to evolutionary shifts not only in the mean (optimal value) but also in the variance of descendants in trait evolution. There are some methods to detect shifts in optimal value, but few studies consider shifts in variance. 2. We use a multi-optima and multi-variance OU process model to describe the trait evolution process with shifts in both optimal value and variance, and provide an analysis of how the covariance between species changes when shifts in variance occur along the path. 3. We propose a new method to detect shifts in both variance and optimal values based on minimizing a loss function with an L1 penalty. We implement our method in a new R package, ShiVa (Detection of evolutionary shifts in variance). 4. We conduct simulations to compare our method with two methods that consider only shifts in optimal values (l1ou; PhylogeneticEM). Our method shows strength in predictive ability and includes far fewer false positive shifts in optimal value than the other methods when shifts in variance actually exist. When there are only shifts in optimal value, our method performs similarly to the other methods. When applied to the cordylid data, ShiVa outperformed l1ou and phyloEM, exhibiting the highest log-likelihood and lowest BIC.

Submitted 29 December, 2023; originally announced December 2023.

arXiv:2211.08277 [pdf, other]

doi 10.1007/s11538-023-01174-z

SPADE4: Sparsity and Delay Embedding based Forecasting of Epidemics

Authors: Esha Saha, Lam Si Tung Ho, Giang Tran

Abstract: Predicting the evolution of diseases is challenging, especially when data are scarce and incomplete. The most popular tools for modelling and predicting infectious disease epidemics are compartmental models. They stratify the population into compartments according to health status and model the dynamics of these compartments using dynamical systems. However, these predefined systems may not capture the true dynamics of the epidemic due to the complexity of disease transmission and human interactions. To overcome this drawback, we propose Sparsity and Delay Embedding based Forecasting (SPADE4) for predicting epidemics. SPADE4 predicts the future trajectory of an observable variable without knowledge of the other variables or the underlying system. We use a random features model with sparse regression to handle the data scarcity issue and employ Takens' delay embedding theorem to capture the nature of the underlying system from the observed variable. We show that our approach outperforms compartmental models when applied to both simulated and real data.

Submitted 13 June, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

Comments: 24 pages, 13 figures, 2 tables

Journal ref: Bull.Math.Bio.85.8 (2023) 71
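The delay embedding mentioned in the SPADE4 abstract above can be illustrated with a minimal sketch. It is not taken from the paper: the function name delay_embed, the parameters dim and lag, and the sample series are hypothetical and chosen only for illustration. The sketch shows how a single observed series is turned into vectors of lagged values, which is the construction that Takens' theorem concerns.

```python
def delay_embed(series, dim, lag=1):
    """Build delay vectors [x[t], x[t-lag], ..., x[t-(dim-1)*lag]].

    Hypothetical helper for illustration; `dim` is the embedding
    dimension and `lag` is the delay between coordinates.
    """
    if dim < 1 or lag < 1:
        raise ValueError("dim and lag must be positive integers")
    start = (dim - 1) * lag  # first index with a full history available
    return [
        [series[t - k * lag] for k in range(dim)]
        for t in range(start, len(series))
    ]


# Made-up observations of a single variable (for example, case counts).
observed = [1, 2, 4, 7, 11, 16]
vectors = delay_embed(observed, dim=3, lag=1)
```

Each delay vector could then serve as the input to a regression model that predicts the next value of the observed variable; how SPADE4 does this with random features and sparse regression is described in the paper itself.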

arXiv:2207.12897 [pdf, other]

When can we reconstruct the ancestral state? Beyond Brownian motion

Authors: Nhat L. Vu, Thanh P. Nguyen, Binh T. Nguyen, Vu Dinh, Lam Si Tung Ho

Abstract: Reconstructing the ancestral state of a group of species helps answer many important questions in evolutionary biology. Therefore, it is crucial to understand when we can estimate the ancestral state accurately. Previous works provide a necessary and sufficient condition, called the big bang condition, for the existence of an accurate reconstruction method under discrete trait evolution models and the Brownian motion model. In this paper, we extend this result to a wide range of continuous trait evolution models. In particular, we consider a general setting where continuous traits evolve along the tree according to stochastic processes that satisfy some regularity conditions. We verify these conditions for popular continuous trait evolution models including Ornstein-Uhlenbeck, reflected Brownian Motion, and Cox-Ingersoll-Ross.

Submitted 19 April, 2023; v1 submitted 26 July, 2022; originally announced July 2022.

arXiv:2204.06032 [pdf, other]

Evolutionary shift detection with ensemble variable selection

Authors: Wensha Zhang, Toby Kenney, Lam Si Tung Ho

Abstract: 1. Abrupt environmental changes can lead to evolutionary shifts in trait evolution. Identifying these shifts is an important step in understanding the evolutionary history of phenotypes. 2. We propose an ensemble variable selection method (R package ELPASO) for the evolutionary shift detection task and compare it with existing methods (R packages l1ou and PhylogeneticEM) under several scenarios. 3. The performance of the methods is highly dependent on the selection criterion. When the signal sizes are small, the methods using the Bayesian information criterion (BIC) perform better. When the signal sizes are large enough, the methods using the phylogenetic Bayesian information criterion (pBIC) (Khabbazian et al., 2016) perform better. Moreover, the performance is heavily impacted by measurement error and tree reconstruction error. 4. Ensemble method + pBIC tends to perform less conservatively than l1ou + pBIC, and Ensemble method + BIC is more conservative than l1ou + BIC. PhylogeneticEM is even more conservative with small signal sizes and falls between l1ou + pBIC and Ensemble method + BIC with large signal sizes. The results can differ between the methods, but none clearly outperforms the others. By applying multiple methods to a single dataset, we can assess the robustness of each detected shift, based on the agreement among methods.

Submitted 12 April, 2022; originally announced April 2022.

arXiv:2111.07445 [pdf, other]

When can we reconstruct the ancestral state? A unified theory

Authors: Lam Si Tung Ho, Vu Dinh

Abstract: Ancestral state reconstruction is one of the most important tasks in evolutionary biology. Conditions under which we can reliably reconstruct the ancestral state have been studied for both discrete and continuous traits. However, the connection between these results is unclear, and it seems that each model needs different conditions. In this work, we provide a unifying theory on the consistency of ancestral state reconstruction for various types of trait evolution models. Notably, we show that for a sequence of nested trees with bounded heights, the necessary and sufficient conditions for the existence of a consistent ancestral state reconstruction method under discrete models, the Brownian motion model, and the threshold model are equivalent. When tree heights are unbounded, we provide a simple counter-example to show that this equivalence is no longer valid.

Submitted 14 November, 2021; originally announced November 2021.

arXiv:2105.01723 [pdf, ps, other]

Convergence of maximum likelihood supertree reconstruction

Authors: Lam Si Tung Ho, Vu Dinh

Abstract: Supertree methods are tree reconstruction techniques that combine several smaller gene trees (possibly on different sets of species) to build a larger species tree. The question of interest is whether the reconstructed supertree converges to the true species tree as the number of gene trees increases (that is, the consistency of supertree methods). In this paper, we are particularly interested in the convergence rate of the maximum likelihood supertree. Previous studies on the maximum likelihood supertree approach often formulate the question of interest as a discrete problem and focus on reconstructing the correct topology of the species tree. Aiming to reconstruct both the topology and the branch lengths of the species tree, we propose an analytic approach for analyzing the convergence of the maximum likelihood supertree method. Specifically, we consider each tree as one point of a metric space and prove that the distance between the maximum likelihood supertree and the species tree converges to zero at a polynomial rate under some mild conditions. We further verify these conditions for the popular exponential error model of gene trees.

Submitted 4 May, 2021; originally announced May 2021.

arXiv:2104.00151 [pdf, other]

Ancestral state reconstruction with large numbers of sequences and edge-length estimation

Authors: Lam Si Tung Ho, Edward Susko

Abstract: Likelihood-based methods are widely considered the best approaches for reconstructing ancestral states. Although much effort has been made to study properties of these methods, previous works often assume that both the tree topology and edge lengths are known. In some scenarios the tree topology might be reasonably well known for the taxa under study. When sequence length is much smaller than the number of species, however, edge lengths are not likely to be accurately estimated. We study the consistency of the maximum likelihood and empirical Bayes estimators of the ancestral state of discrete traits in such settings under a star tree. We prove that the likelihood-based reconstruction is consistent under symmetric models but can be inconsistent under non-symmetric models. We show, however, that a simple consistent estimator for the ancestral states is available under non-symmetric models. The results illustrate that likelihood methods can unexpectedly have undesirable properties as the number of sequences considered gets very large. Broader implications of the results are discussed.

Submitted 31 March, 2021; originally announced April 2021.

arXiv:2003.10336 [pdf, other]

Efficient Bayesian Inference of General Gaussian Models on Large Phylogenetic Trees

Authors: Paul Bastide, Lam Si Tung Ho, Guy Baele, Philippe Lemey, Marc A Suchard

Abstract: Phylogenetic comparative methods correct for shared evolutionary history among a set of non-independent organisms by modeling sample traits as arising from a diffusion process along the branches of a possibly unknown history. To incorporate such uncertainty, we present a scalable Bayesian inference framework under a general Gaussian trait evolution model that exploits Hamiltonian Monte Carlo (HMC). HMC enables efficient sampling of the constrained model parameters and takes advantage of the tree structure for fast likelihood and gradient computations, yielding algorithmic complexity linear in the number of observations. This approach encompasses a wide family of stochastic processes, including the general Ornstein-Uhlenbeck (OU) process, with possible missing data and measurement errors. We implement inference tools for a biologically relevant subset of all these models into the BEAST phylogenetic software package and develop model comparison through marginal likelihood estimation. We apply our approach to study the morphological evolution in the superfamily of Musteloidea (including weasels and allies) as well as the heritability of HIV virulence. This second problem furnishes a new measure of evolutionary heritability that demonstrates its utility through a targeted simulation study.

Submitted 29 September, 2020; v1 submitted 23 March, 2020; originally announced March 2020.

arXiv:1903.03919 [pdf, other]

doi 10.1007/s00285-019-01453-1

On the convergence of the maximum likelihood estimator for the transition rate under a 2-state symmetric model

Authors: Lam Si Tung Ho, Vu Dinh, Frederick A. Matsen IV, Marc A. Suchard

Abstract: Maximum likelihood estimators are used extensively to estimate unknown parameters of stochastic trait evolution models on phylogenetic trees. Although the MLE has been proven to converge to the true value in the independent-sample case, we cannot appeal to this result because trait values of different species are correlated due to shared evolutionary history. In this paper, we consider a $2$-state symmetric model for a single binary trait and investigate the theoretical properties of the MLE for the transition rate in the large-tree limit. Here, the large-tree limit is a theoretical scenario where the number of taxa increases to infinity and we can observe the trait values for all species. Specifically, we prove that the MLE converges to the true value under some regularity conditions. These conditions ensure that the tree shape is not too irregular, and hold for many practical scenarios such as trees with bounded edges, trees generated from the Yule (pure birth) process, and trees generated from the coalescent point process. Our result also provides an upper bound for the distance between the MLE and the true value.

Submitted 24 November, 2019; v1 submitted 9 March, 2019; originally announced March 2019.

arXiv:1706.01643 [pdf]

Retrosynthetic reaction prediction using neural sequence-to-sequence models

Authors: Bowen Liu, Bharath Ramsundar, Prasad Kawthekar, Jade Shi, Joseph Gomes, Quang Luu Nguyen, Stephen Ho, Jack Sloane, Paul Wender, Vijay Pande

Abstract: We describe a fully data-driven model that learns to perform a retrosynthetic reaction prediction task, which is treated as a sequence-to-sequence mapping problem. The end-to-end trained model has an encoder-decoder architecture that consists of two recurrent neural networks, which has previously shown great success in solving other sequence-to-sequence prediction tasks such as machine translation. The model is trained on 50,000 experimental reaction examples from the United States patent literature, which span 10 broad reaction types that are commonly used by medicinal chemists. We find that our model performs comparably with a rule-based expert system baseline model, and also overcomes certain limitations associated with rule-based expert systems and with any machine learning approach that contains a rule-based expert system component. Our model provides an important first step towards solving the challenging problem of computational retrosynthetic analysis.

Submitted 6 June, 2017; originally announced June 2017.

arXiv:1608.06769 [pdf, other]

Direct likelihood-based inference for discretely observed stochastic compartmental models of infectious disease

Authors: Lam Si Tung Ho, Forrest W. Crawford, Marc A. Suchard

Abstract: Stochastic compartmental models are important tools for understanding the course of infectious disease epidemics in populations and in prospective evaluation of intervention policies. However, calculating the likelihood for discretely observed data from even simple models -- such as the ubiquitous susceptible-infectious-removed (SIR) model -- has been considered computationally intractable since its formulation almost a century ago. Recently researchers have proposed methods to circumvent this limitation through data augmentation or approximation, but these approaches often suffer from high computational cost or loss of accuracy. We develop the mathematical foundation and an efficient algorithm to compute the likelihood for discretely observed data from a broad class of stochastic compartmental models. We also give expressions for the derivatives of the transition probabilities using the same technique, making possible inference via Hamiltonian Monte Carlo (HMC). We use the 17th century plague in Eyam, a classic example of the SIR model, to compare our recursion method to sequential Monte Carlo, analyze using HMC, and assess the model assumptions. We also apply our direct likelihood evaluation to perform Bayesian inference for the 2014-2015 Ebola outbreak in Guinea. The results suggest that the epidemic infectious rates have decreased since October 2014 in the Southeast region of Guinea, while rates remain the same in other regions, facilitating understanding of the outbreak and the effectiveness of Ebola control interventions.

Submitted 25 July, 2018; v1 submitted 24 August, 2016; originally announced August 2016.
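For readers unfamiliar with the stochastic SIR model named in the abstract above, the following is a minimal forward-simulation sketch. It is not the paper's likelihood recursion; it only simulates one trajectory of the underlying continuous-time Markov chain with a Gillespie-style algorithm. The function name simulate_sir, the mass-action infection rate beta * S * I, and all parameter values are assumptions made for illustration.

```python
import random


def simulate_sir(s0, i0, beta, gamma, rng):
    """Simulate one stochastic SIR path until no infectious individuals remain.

    Assumed rates (illustrative): infection at beta * S * I,
    removal at gamma * I. Returns a list of (time, S, I, R) tuples.
    """
    s, i, r, t = s0, i0, 0, 0.0
    path = [(t, s, i, r)]
    while i > 0:
        infection_rate = beta * s * i
        removal_rate = gamma * i
        total_rate = infection_rate + removal_rate
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total_rate)
        # Choose which event happens in proportion to its rate.
        if rng.random() < infection_rate / total_rate:
            s, i = s - 1, i + 1  # one susceptible becomes infectious
        else:
            i, r = i - 1, r + 1  # one infectious individual is removed
        path.append((t, s, i, r))
    return path


# Hypothetical population and rates, seeded for reproducibility.
path = simulate_sir(s0=50, i0=3, beta=0.01, gamma=0.5, rng=random.Random(1))
```

Discretely observed data, as discussed in the abstract, would correspond to recording the state of such a path only at a few fixed times, which is why the transition probabilities between observation times are needed for the likelihood.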

arXiv:1608.05364 [pdf, other]

doi 10.1016/j.bpj.2017.03.018

Pulsatile lipid vesicles under osmotic stress

Authors: Morgan Chabanon, James C. S. Ho, Bo Liedberg, Atul N. Parikh, Padmini Rangamani

Abstract: The response of lipid bilayers to osmotic stress is an important part of cellular function. Previously, in [Oglecka et al. 2014], we reported that cell-sized giant unilamellar vesicles (GUVs) exposed to hypotonic media respond to the osmotic assault by undergoing a cyclical sequence of swelling and bursting events, coupled to the membrane's compositional degrees of freedom. Here, we seek to deepen our quantitative understanding of the essential pulsatile behavior of GUVs under hypotonic conditions by advancing a comprehensive theoretical model for vesicle dynamics. The model quantitatively captures our experimentally measured swell-burst parameters for single-component GUVs, and reveals that thermal fluctuations enable rate-dependent pore nucleation, driving the dynamics of the swell-burst cycles. We further identify new scaling relationships between the pulsatile dynamics and GUV properties. Our findings provide a fundamental framework that has the potential to guide future investigations on the non-equilibrium dynamics of vesicles under osmotic stress.

Submitted 2 May, 2017; v1 submitted 18 August, 2016; originally announced August 2016.

Journal ref: Biophysical Journal 112, 1682-1691, April 25, 2017

arXiv:1606.03059 [pdf, other]

Consistency and convergence rate of phylogenetic inference via regularization

Authors: Vu Dinh, Lam Si Tung Ho, Marc A. Suchard, Frederick A. Matsen IV

Abstract: It is common in phylogenetics to have some, perhaps partial, information about the overall evolutionary tree of a group of organisms and wish to find an evolutionary tree of a specific gene for those organisms. There may not be enough information in the gene sequences alone to accurately reconstruct the correct "gene tree." Although the gene tree may deviate from the "species tree" due to a variety of genetic processes, in the absence of evidence to the contrary it is parsimonious to assume that they agree. A common statistical approach in these situations is to develop a likelihood penalty to incorporate such additional information. Recent studies using simulation and empirical data suggest that a likelihood penalty quantifying concordance with a species tree can significantly improve the accuracy of gene tree reconstruction compared to using sequence data alone. However, the consistency of such an approach has not yet been established, nor have convergence rates been bounded. Because phylogenetics is a non-standard inference problem, the standard theory does not apply. In this paper, we propose a penalized maximum likelihood estimator for gene tree reconstruction, where the penalty is the square of the Billera-Holmes-Vogtmann geodesic distance from the gene tree to the species tree. We prove that this method is consistent, and derive its convergence rate for estimating the discrete gene tree structure and continuous edge lengths (representing the amount of evolution that has occurred on that branch) simultaneously. We find that the regularized estimator is "adaptive fast converging," meaning that it can reconstruct all edges of length greater than any given threshold from gene sequences of polynomial length. Our method does not require the species tree to be known exactly; in fact, our asymptotic theory holds for any such guide tree.

Submitted 5 January, 2018; v1 submitted 9 June, 2016; originally announced June 2016.

Comments: 34 pages, 5 figures. To appear in The Annals of Statistics

MSC Class: 05C05; 62F12 (Primary); 92B10; 92D15 (Secondary)

arXiv:1512.07948 [pdf, other]

A Relaxed Drift Diffusion Model for Phylogenetic Trait Evolution

Authors: Mandev S. Gill, Lam Si Tung Ho, Guy Baele, Philippe Lemey, Marc A. Suchard

Abstract: Understanding the processes that give rise to quantitative measurements associated with molecular sequence data remains an important issue in statistical phylogenetics. Examples of such measurements include geographic coordinates in the context of phylogeography and phenotypic traits in the context of comparative studies. A popular approach is to model the evolution of continuously varying traits as a Brownian diffusion process. However, standard Brownian diffusion is quite restrictive and may not accurately characterize certain trait evolutionary processes. Here, we relax one of the major restrictions of standard Brownian diffusion by incorporating a nontrivial estimable drift into the process. We introduce a relaxed drift diffusion model for the evolution of multivariate continuously varying traits along a phylogenetic tree via Brownian diffusion with drift. Notably, the relaxed drift model accommodates branch-specific variation of drift rates while preserving model identifiability. We implement the relaxed drift model in a Bayesian inference framework to simultaneously reconstruct the evolutionary histories of molecular sequence data and associated multivariate continuous trait data, and provide tools to visualize evolutionary reconstructions. We illustrate our approach in three viral examples. In the first two, we examine the spatiotemporal spread of HIV-1 in central Africa and West Nile virus in North America and show that a relaxed drift approach uncovers a clearer, more detailed picture of the dynamics of viral dispersal than standard Brownian diffusion. Finally, we study antigenic evolution in the context of HIV-1 resistance to three broadly neutralizing antibodies. Our analysis reveals evidence of a continuous drift at the HIV-1 population level towards enhanced resistance to neutralization by the VRC01 monoclonal antibody over the course of the epidemic.

Submitted 29 December, 2015; v1 submitted 24 December, 2015; originally announced December 2015.

Comments: 35 pages, 3 figures, 5 tables. Changed from double-spaced to single-spaced

arXiv:1503.01493 [pdf, ps, other]

doi 10.1103/PhysRevApplied.4.024001

Self-tracking Energy Transfer for Neural Stimulation in Untethered Mice

Authors: John S. Ho, Yuji Tanabe, Shrivats Mohan Iyer, Amelia J. Christensen, Logan Grosenick, Karl Deisseroth, Scott L. Delp, Ada S. Y. Poon

Abstract: Optical or electrical stimulation of neural circuits in mice during natural behavior is an important paradigm for studying brain function. Conventional systems for optogenetics and electrical microstimulation require tethers or large head-mounted devices that disrupt animal behavior. We report a method for wireless powering of small-scale implanted devices based on the strong localization of energy that occurs during resonant interaction between a radio-frequency cavity and intrinsic modes in mice. The system features self-tracking over a wide (16 cm diameter) operational area, and is used to demonstrate wireless activation of cortical neurons with miniaturized stimulators (10 mm$^{3}$, 20 mg) fully implanted under the skin.

Submitted 4 March, 2015; originally announced March 2015.

arXiv:1411.7338 [pdf, other]

Bounds on the Expected Size of the Maximum Agreement Subtree

Authors: Daniel Irving Bernstein, Lam Si Tung Ho, Colby Long, Mike Steel, Katherine St. John, Seth Sullivant

Abstract: We prove polynomial upper and lower bounds on the expected size of the maximum agreement subtree of two random binary phylogenetic trees under both the uniform distribution and Yule-Harding distribution. This positively answers a question posed in earlier work. Determining tight upper and lower bounds remains an open problem.

Submitted 31 August, 2015; v1 submitted 26 November, 2014; originally announced November 2014.

Comments: Revised version

arXiv:1406.1568 [pdf, other]

Phase transition on the convergence rate of parameter estimation under an Ornstein-Uhlenbeck diffusion on a tree

Authors: Cécile Ané, Lam Si Tung Ho, Sebastien Roch

Abstract: Diffusion processes on trees are commonly used in evolutionary biology to model the joint distribution of continuous traits, such as body mass, across species. Estimating the parameters of such processes from tip values presents challenges because of the intrinsic correlation between the observations produced by the shared evolutionary history, thus violating the standard independence assumption of large-sample theory. For instance, Ho and Ané recently proved that the mean (also known in this context as selection optimum) of an Ornstein-Uhlenbeck process on a tree cannot be estimated consistently from an increasing number of tip observations if the tree height is bounded. Here, using a fruitful connection to the so-called reconstruction problem in probability theory, we study the convergence rate of parameter estimation in the unbounded height case. For the mean of the process, we provide a necessary and sufficient condition for the consistency of the maximum likelihood estimator (MLE) and establish a phase transition on its convergence rate in terms of the growth of the tree. In particular, we show that a loss of $\sqrt{n}$-consistency (i.e., the variance of the MLE becomes $\Omega(n^{-1})$, where $n$ is the number of tips) occurs when the tree growth is larger than a threshold related to the phase transition of the reconstruction problem. For the covariance parameters, we give a novel, efficient estimation method which achieves $\sqrt{n}$-consistency under natural assumptions on the tree.

Submitted 25 May, 2016; v1 submitted 5 June, 2014; originally announced June 2014.

arXiv:1306.1322 [pdf, ps, other]

doi 10.1214/13-AOS1105

Asymptotic theory with hierarchical autocorrelation: Ornstein-Uhlenbeck tree models

Authors: Lam Si Tung Ho, Cécile Ané

Abstract: Hierarchical autocorrelation in the error term of linear models arises when sampling units are related to each other according to a tree. The residual covariance is parametrized using the tree-distance between sampling units. When observations are modeled using an Ornstein-Uhlenbeck (OU) process along the tree, the autocorrelation between two tips decreases exponentially with their tree distance. These models are most often applied in evolutionary biology, when tips represent biological species and the OU process parameters represent the strength and direction of natural selection. For these models, we show that the mean is not microergodic: no estimator can ever be consistent for this parameter. We also provide a lower bound for the variance of its MLE. For covariance parameters, we give a general sufficient condition ensuring microergodicity. This condition suggests that some parameters may not be estimated at the same rate as others. We show that, indeed, maximum likelihood estimators of the autocorrelation parameter converge at a slower rate than that of generally microergodic parameters. We show this theoretically in a symmetric tree asymptotic framework and through simulations on a large real tree comprising 4507 mammal species.

Submitted 6 June, 2013; originally announced June 2013.

Comments: Published at http://dx.doi.org/10.1214/13-AOS1105 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1105

Journal ref: Annals of Statistics 2013, Vol. 41, No. 2, 957-981
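The statement in the abstract above that the autocorrelation between two tips decreases exponentially with their tree distance can be made concrete with a short sketch. It assumes a stationary OU process (the root is drawn from the stationary distribution), in which case every tip has variance sigma2 / (2 * alpha) and two tips at tree distance d have correlation exp(-alpha * d). The function and parameter names are hypothetical, not taken from the paper.

```python
import math


def ou_tip_correlation(tree_distance, alpha):
    """Correlation between two tips under a stationary OU process (assumed)."""
    if alpha <= 0 or tree_distance < 0:
        raise ValueError("alpha must be positive and distance non-negative")
    return math.exp(-alpha * tree_distance)


def ou_tip_covariance(tree_distance, alpha, sigma2):
    """Covariance = stationary variance times the exponential correlation."""
    if sigma2 < 0:
        raise ValueError("sigma2 must be non-negative")
    stationary_variance = sigma2 / (2.0 * alpha)
    return stationary_variance * ou_tip_correlation(tree_distance, alpha)


# Illustrative values only: correlation at increasing tree distances.
correlations = [ou_tip_correlation(d, alpha=0.5) for d in (0.0, 1.0, 2.0, 4.0)]
```

Here alpha plays the role of the autocorrelation (selection strength) parameter: larger values make the correlation between distant tips vanish more quickly.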

Showing 1–18 of 18 results for author: Ho, S