-
On the importance of assessing topological convergence in Bayesian phylogenetic inference
Authors:
Marius Brusselmans,
Luiz Max Carvalho,
Samuel L. Hong,
Jiansi Gao,
Frederick A. Matsen IV,
Andrew Rambaut,
Philippe Lemey,
Marc A. Suchard,
Gytis Dudas,
Guy Baele
Abstract:
Modern phylogenetics research is often performed within a Bayesian framework, using sampling algorithms such as Markov chain Monte Carlo (MCMC) to approximate the posterior distribution. These algorithms require careful evaluation of the quality of the generated samples. Within the field of phylogenetics, one frequently adopted diagnostic approach is to evaluate the effective sample size (ESS) and…
▽ More
Modern phylogenetics research is often performed within a Bayesian framework, using sampling algorithms such as Markov chain Monte Carlo (MCMC) to approximate the posterior distribution. These algorithms require careful evaluation of the quality of the generated samples. Within the field of phylogenetics, one frequently adopted diagnostic approach is to evaluate the effective sample size (ESS) and to investigate trace graphs of the sampled parameters. A major limitation of these approaches is that they are developed for continuous parameters and therefore incompatible with a crucial parameter in these inferences: the tree topology. Several recent advancements have aimed at extending these diagnostics to topological space. In this short reflection paper, we present a case study illustrating how these topological diagnostics can contain information not found in standard diagnostics, and how decisions regarding which of these diagnostics to compute can impact inferences regarding MCMC convergence and mixing. Given the major importance of detecting convergence and mixing issues in Bayesian phylogenetic analyses, the lack of a unified approach to this problem warrants further action, especially now that additional tools are becoming available to researchers.
△ Less
Submitted 18 February, 2024;
originally announced February 2024.
-
Online Bayesian phylodynamic inference in BEAST with application to epidemic reconstruction
Authors:
Mandev S. Gill,
Philippe Lemey,
Marc A. Suchard,
Andrew Rambaut,
Guy Baele
Abstract:
Reconstructing pathogen dynamics from genetic data as they become available during an outbreak or epidemic represents an important statistical scenario in which observations arrive sequentially in time and one is interested in performing inference in an 'online' fashion. Widely-used Bayesian phylogenetic inference packages are not set up for this purpose, generally requiring one to recompute trees…
▽ More
Reconstructing pathogen dynamics from genetic data as they become available during an outbreak or epidemic represents an important statistical scenario in which observations arrive sequentially in time and one is interested in performing inference in an 'online' fashion. Widely-used Bayesian phylogenetic inference packages are not set up for this purpose, generally requiring one to recompute trees and evolutionary model parameters de novo when new data arrive. To accommodate increasing data flow in a Bayesian phylogenetic framework, we introduce a methodology to efficiently update the posterior distribution with newly available genetic data. Our procedure is implemented in the BEAST 1.10 software package, and relies on a distance-based measure to insert new taxa into the current estimate of the phylogeny and imputes plausible values for new model parameters to accommodate growing dimensionality. This augmentation creates informed starting values and re-uses optimally tuned transition kernels for posterior exploration of growing data sets, reducing the time necessary to converge to target posterior distributions. We apply our framework to data from the recent West African Ebola virus epidemic and demonstrate a considerable reduction in time required to obtain posterior estimates at different time points of the outbreak. Beyond epidemic monitoring, this framework easily finds other applications within the phylogenetics community, where changes in the data -- in terms of alignment changes, sequence addition or removal -- present common scenarios that can benefit from online inference.
△ Less
Submitted 1 February, 2020;
originally announced February 2020.
-
Gradients do grow on trees: a linear-time ${\cal O}\hspace{-0.2em}\left( N \right)$-dimensional gradient for statistical phylogenetics
Authors:
Xiang Ji,
Zhenyu Zhang,
Andrew Holbrook,
Akihiko Nishimura,
Guy Baele,
Andrew Rambaut,
Philippe Lemey,
Marc A. Suchard
Abstract:
Calculation of the log-likelihood stands as the computational bottleneck for many statistical phylogenetic algorithms. Even worse is its gradient evaluation, often used to target regions of high probability. Order ${\cal O}\hspace{-0.2em}\left( N \right)$-dimensional gradient calculations based on the standard pruning algorithm require ${\cal O}\hspace{-0.2em}\left( N^2 \right)$ operations where N…
▽ More
Calculation of the log-likelihood stands as the computational bottleneck for many statistical phylogenetic algorithms. Even worse is its gradient evaluation, often used to target regions of high probability. Order ${\cal O}\hspace{-0.2em}\left( N \right)$-dimensional gradient calculations based on the standard pruning algorithm require ${\cal O}\hspace{-0.2em}\left( N^2 \right)$ operations where N is the number of sampled molecular sequences. With the advent of high-throughput sequencing, recent phylogenetic studies have analyzed hundreds to thousands of sequences, with an apparent trend towards even larger data sets as a result of advancing technology. Such large-scale analyses challenge phylogenetic reconstruction by requiring inference on larger sets of process parameters to model the increasing data heterogeneity. To make this tractable, we present a linear-time algorithm for ${\cal O}\hspace{-0.2em}\left( N \right)$-dimensional gradient evaluation and apply it to general continuous-time Markov processes of sequence substitution on a phylogenetic tree without a need to assume either stationarity or reversibility. We apply this approach to learn the branch-specific evolutionary rates of three pathogenic viruses: West Nile virus, Dengue virus and Lassa virus. Our proposed algorithm significantly improves inference efficiency with a 126- to 234-fold increase in maximum-likelihood optimization and a 16- to 33-fold computational performance increase in a Bayesian framework.
△ Less
Submitted 28 May, 2019;
originally announced May 2019.
-
Spatio-temporal Dynamics of Foot-and-Mouth Disease Virus in South America
Authors:
Luiz Max Carvalho,
Nuno Rodrigues Faria,
Andres M. Perez,
Marc A. Suchard,
Philippe Lemey,
Waldemir de Castro Silveira,
Andrew Rambaut,
Guy Baele
Abstract:
Although foot-and-mouth disease virus (FMDV) incidence has decreased in South America over the last years, the pathogen still circulates in the region and the risk of re-emergence in previously FMDV-free areas is a veterinary public health concern. In this paper we merge environmental, epidemiological and genetic data to reconstruct spatiotemporal patterns and determinants of FMDV serotypes A and…
▽ More
Although foot-and-mouth disease virus (FMDV) incidence has decreased in South America over the last years, the pathogen still circulates in the region and the risk of re-emergence in previously FMDV-free areas is a veterinary public health concern. In this paper we merge environmental, epidemiological and genetic data to reconstruct spatiotemporal patterns and determinants of FMDV serotypes A and O dispersal in South America. Our dating analysis suggests that serotype A emerged in South America around 1930, while serotype O emerged around 1990. The rate of evolution for serotype A was significantly higher compared to serotype O. Phylogeographic inference identified two well-connected sub networks of viral flow, one including Venezuela, Colombia and Ecuador; another including Brazil, Uruguay and Argentina. The spread of serotype A was best described by geographic distances, while trade of live cattle was the predictor that best explained serotype O spread. Our findings show that the two serotypes have different underlying evolutionary and spatial dynamics and may pose different threats to control programmes. Key-words: Phylogeography, foot-and-mouth disease virus, South America, animal trade.
△ Less
Submitted 1 June, 2015; v1 submitted 5 May, 2015;
originally announced May 2015.
-
Epidemic reconstruction in a phylogenetics framework: transmission trees as partitions
Authors:
Matthew Hall,
Andrew Rambaut
Abstract:
The reconstruction of transmission trees for epidemics from genetic data has been the subject of some recent interest. It has been demonstrated that the transmission tree structure can be investigated by augmenting internal nodes of a phylogenetic tree constructed using pathogen sequences from the epidemic with information about the host that held the corresponding lineage. In this paper, we note…
▽ More
The reconstruction of transmission trees for epidemics from genetic data has been the subject of some recent interest. It has been demonstrated that the transmission tree structure can be investigated by augmenting internal nodes of a phylogenetic tree constructed using pathogen sequences from the epidemic with information about the host that held the corresponding lineage. In this paper, we note that this augmentation is equivalent to a correspondence between transmission trees and partitions of the phylogenetic tree into connected subtrees each containing one tip, and provide a framework for Markov Chain Monte Carlo inference of phylogenies that are partitioned in this way, giving a new method to co-estimate both trees. The procedure is integrated in the existing phylogenetic inference package BEAST.
△ Less
Submitted 2 June, 2014;
originally announced June 2014.
-
Reassortment between influenza B lineages and the emergence of a co-adapted PB1-PB2-HA gene complex
Authors:
Gytis Dudas,
Trevor Bedford,
Samantha Lycett,
Andrew Rambaut
Abstract:
Influenza B viruses make a considerable contribution to morbidity attributed to seasonal influenza. Currently circulating influenza B isolates are known to belong to two antigenically distinct lineages referred to as B/Victoria and B/Yamagata. Frequent exchange of genomic segments of these two lineages has been noted in the past, but the observed patterns of reassortment have not been formalized i…
▽ More
Influenza B viruses make a considerable contribution to morbidity attributed to seasonal influenza. Currently circulating influenza B isolates are known to belong to two antigenically distinct lineages referred to as B/Victoria and B/Yamagata. Frequent exchange of genomic segments of these two lineages has been noted in the past, but the observed patterns of reassortment have not been formalized in detail. We investigate inter-lineage reassortments by comparing phylogenetic trees across genomic segments. Our analyses indicate that of the 8 segments of influenza B viruses only PB1, PB2 and HA segments maintained separate Victoria and Yamagata lineages and that currently circulating strains possess PB1, PB2 and HA segments derived entirely from one or the other lineage; other segments have repeatedly reassorted between lineages thereby reducing genetic diversity. We argue that this difference between segments is due to selection against reassortant viruses with mixed lineage PB1, PB2 and HA segments. Given sufficient time and continued recruitment to the reassortment-isolated PB1-PB2-HA gene complex, we expect influenza B viruses to eventually undergo sympatric speciation.
△ Less
Submitted 12 September, 2014; v1 submitted 5 March, 2014;
originally announced March 2014.
-
piBUSS: a parallel BEAST/BEAGLE utility for sequence simulation under complex evolutionary scenarios
Authors:
Filip Bielejec,
Philippe Lemey,
Luiz Max Carvalho,
Guy Baele,
Andrew Rambaut,
Marc A. Suchard
Abstract:
Background: Simulated nucleotide or amino acid sequences are frequently used to assess the performance of phylogenetic reconstruction methods. BEAST, a Bayesian statistical framework that focuses on reconstructing time-calibrated molecular evolutionary processes, supports a wide array of evolutionary models, but lacked matching machinery for simulation of character evolution along phylogenies.
R…
▽ More
Background: Simulated nucleotide or amino acid sequences are frequently used to assess the performance of phylogenetic reconstruction methods. BEAST, a Bayesian statistical framework that focuses on reconstructing time-calibrated molecular evolutionary processes, supports a wide array of evolutionary models, but lacked matching machinery for simulation of character evolution along phylogenies.
Results: We present a flexible Monte Carlo simulation tool, called piBUSS, that employs the BEAGLE high performance library for phylogenetic computations within BEAST to rapidly generate large sequence alignments under complex evolutionary models. piBUSS sports a user-friendly graphical user interface (GUI) that allows combining a rich array of models across an arbitrary number of partitions. A command-line interface mirrors the options available through the GUI and facilitates scripting in large-scale simulation studies. Analogous to BEAST model and analysis setup, more advanced simulation options are supported through an extensible markup language (XML) specification, which in addition to generating sequence output, also allows users to combine simulation and analysis in a single BEAST run.
Conclusions: piBUSS offers a unique combination of flexibility and ease-of-use for sequence simulation under realistic evolutionary scenarios. Through different interfaces, piBUSS supports simulation studies ranging from modest endeavors for illustrative purposes to complex and large-scale assessments of evolutionary inference procedures. The software aims at implementing new models and data types that are continuously being developed as part of BEAST/BEAGLE.
△ Less
Submitted 17 December, 2013;
originally announced December 2013.
-
Inferring Heterogeneous Evolutionary Processes Through Time: from sequence substitution to phylogeography
Authors:
Filip Bielejec,
Philippe Lemey,
Guy Baele,
Andrew Rambaut,
Marc A Suchard
Abstract:
Molecular phylogenetic and phylogeographic reconstructions generally assume time-homogeneous substitution processes. Motivated by computational convenience, this assumption sacrifices biological realism and offers little opportunity to uncover the temporal dynamics in evolutionary histories. Here, we extend and generalize an evolutionary approach that relaxes the time-homogeneous process assumptio…
▽ More
Molecular phylogenetic and phylogeographic reconstructions generally assume time-homogeneous substitution processes. Motivated by computational convenience, this assumption sacrifices biological realism and offers little opportunity to uncover the temporal dynamics in evolutionary histories. Here, we extend and generalize an evolutionary approach that relaxes the time-homogeneous process assumption by allowing the specification of different infinitesimal substitution rate matrices across different time intervals, called epochs, along the evolutionary history. We focus on an epoch model implementation in a Bayesian inference framework that offers great modeling flexibility in drawing inference about any discrete data type characterized as a continuous-time Markov chain, including phylogeographic traits. To alleviate the computational burden that the additional temporal heterogeneity imposes, we adopt a massively parallel approach that achieves both fine- and coarse-grain parallelization of the computations across branches that accommodate epoch transitions, making extensive use of graphics processing units. Through synthetic examples, we assess model performance in recovering evolutionary parameters from data generated according to different evolutionary scenarios that comprise different numbers of epochs for both nucleotide and codon substitution processes. We illustrate the usefulness of our inference framework in two different applications to empirical data sets: the selection dynamics on within-host HIV populations throughout infection and the seasonality of global influenza circulation. In both cases, our epoch model captures key features of temporal heterogeneity that remained difficult to test using ad hoc procedures.
△ Less
Submitted 12 September, 2013;
originally announced September 2013.
-
Integrating influenza antigenic dynamics with molecular evolution
Authors:
Trevor Bedford,
Marc A. Suchard,
Philippe Lemey,
Gytis Dudas,
Victoria Gregory,
Alan J. Hay,
John W. McCauley,
Colin A. Russell,
Derek J. Smith,
Andrew Rambaut
Abstract:
Influenza viruses undergo continual antigenic evolution allowing mutant viruses to evade host immunity acquired to previous virus strains. Antigenic phenotype is often assessed through pairwise measurement of cross-reactivity between influenza strains using the hemagglutination inhibition (HI) assay. Here, we extend previous approaches to antigenic cartography, and simultaneously characterize anti…
▽ More
Influenza viruses undergo continual antigenic evolution allowing mutant viruses to evade host immunity acquired to previous virus strains. Antigenic phenotype is often assessed through pairwise measurement of cross-reactivity between influenza strains using the hemagglutination inhibition (HI) assay. Here, we extend previous approaches to antigenic cartography, and simultaneously characterize antigenic and genetic evolution by modeling the diffusion of antigenic phenotype over a shared virus phylogeny. Using HI data from influenza lineages A/H3N2, A/H1N1, B/Victoria and B/Yamagata, we determine patterns of antigenic drift across viral lineages, showing that A/H3N2 evolves faster and in a more punctuated fashion than other influenza lineages. We also show that year-to-year antigenic drift appears to drive incidence patterns within each influenza lineage. This work makes possible substantial future advances in investigating the dynamics of influenza and other antigenically-variable pathogens by providing a model that intimately combines molecular and antigenic evolution.
△ Less
Submitted 19 December, 2013; v1 submitted 12 April, 2013;
originally announced April 2013.
-
Fold or die: experimental evolution in vitro
Authors:
Sinéad Collins,
Andrew Rambaut,
Stephen J. Bridgett
Abstract:
We introduce a system for experimental evolution consisting of populations of short oligonucleotides (Oli populations) evolving in a modified quantitative Polymerase Chain Reaction (qPCR). It is tractable at the genetic, genomic, phenotypic and fitness levels. The Oli system uses DNA hairpins designed to form structures that self-prime under defined conditions. Selection acts on the phenotype of s…
▽ More
We introduce a system for experimental evolution consisting of populations of short oligonucleotides (Oli populations) evolving in a modified quantitative Polymerase Chain Reaction (qPCR). It is tractable at the genetic, genomic, phenotypic and fitness levels. The Oli system uses DNA hairpins designed to form structures that self-prime under defined conditions. Selection acts on the phenotype of self-priming, after which differences in fitness are amplified and quantified using qPCR. We outline the methodological and bioinformatics tools for the Oli system here, and demonstrate that it can be used as a conventional experimental evolution model system by test-driving it in an experiment investigating adaptive evolution under different rates of environmental change.
△ Less
Submitted 18 November, 2012;
originally announced November 2012.
-
The seasonal flight of influenza: a unified framework for spatiotemporal hypothesis testing
Authors:
Philippe Lemey,
Andrew Rambaut,
Trevor Bedford,
Nuno R. Faria,
Filip Bielejec,
Guy Baele,
Colin A. Russell,
Derek J. Smith,
Oliver G. Pybus,
Dirk Brockmann,
Marc A. Suchard
Abstract:
Global mobility flow data are at the heart of spatial epidemiological models used to predict infectious disease behavior but this wealth of data on human mobility has been largely neglected by reconstructions of pathogen evolutionary dynamics using viral genetic data. Although stochastic models of viral evolution may potentially be informed by such data, a major challenge lies in deciding which mo…
▽ More
Global mobility flow data are at the heart of spatial epidemiological models used to predict infectious disease behavior but this wealth of data on human mobility has been largely neglected by reconstructions of pathogen evolutionary dynamics using viral genetic data. Although stochastic models of viral evolution may potentially be informed by such data, a major challenge lies in deciding which mobility processes are critical and to what extent they contribute to sha** contemporaneous distributions of pathogen diversity. Here, we develop a framework to integrate predictors of viral diffusion with phylogeographic inference and estimate human influenza H3N2 migration history while simultaneously testing and quantifying the factors that underly it. We provide evidence for air travel governing the global dynamics of human influenza whereas other processes act at a more local scale.
△ Less
Submitted 22 October, 2012;
originally announced October 2012.
-
Canalization of the evolutionary trajectory of the human influenza virus
Authors:
Trevor Bedford,
Andrew Rambaut,
Mercedes Pascual
Abstract:
Since its emergence in 1968, influenza A (H3N2) has evolved extensively in genotype and antigenic phenotype. Antigenic evolution occurs in the context of a two-dimensional 'antigenic map', while genetic evolution shows a characteristic ladder-like genealogical tree. Here, we use a large-scale individual-based model to show that evolution in a Euclidean antigenic space provides a remarkable corresp…
▽ More
Since its emergence in 1968, influenza A (H3N2) has evolved extensively in genotype and antigenic phenotype. Antigenic evolution occurs in the context of a two-dimensional 'antigenic map', while genetic evolution shows a characteristic ladder-like genealogical tree. Here, we use a large-scale individual-based model to show that evolution in a Euclidean antigenic space provides a remarkable correspondence between model behavior and the epidemiological, antigenic, genealogical and geographic patterns observed in influenza virus. We find that evolution away from existing human immunity results in rapid population turnover in the influenza virus and that this population turnover occurs primarily along a single antigenic axis. Thus, selective dynamics induce a canalized evolutionary trajectory, in which the evolutionary fate of the influenza population is surprisingly repeatable and hence, in theory, predictable.
△ Less
Submitted 19 November, 2011;
originally announced November 2011.