-
A Vector Representation for Phylogenetic Trees
Authors:
Cedric Chauve,
Caroline Colijn,
Louxin Zhang
Abstract:
Good representations for phylogenetic trees and networks are important for optimizing storage efficiency and implementation of scalable methods for the inference and analysis of evolutionary trees for genes, genomes and species. We introduce a new representation for rooted phylogenetic trees that encodes a binary tree on n taxa as a vector of length 2n in which each taxon appears exactly twice. Using this new tree representation, we introduce a novel tree rearrangement operator, called a HOP, that results in a tree space of diameter n and a quadratic neighbourhood size. We also introduce a novel metric, the HOP distance, which is the minimum number of HOPs needed to transform one tree into another. The HOP distance can be computed in near-linear time, a rare instance of a tree rearrangement distance that is tractable. Our experiments show that the HOP distance is better correlated with the Subtree-Prune-and-Regraft distance than the widely used Robinson-Foulds distance is. We also describe how the novel tree representation we introduce can be further generalized to tree-child networks.
Submitted 11 May, 2024;
originally announced May 2024.
-
Counting and sampling gene family evolutionary histories in the duplication-loss and duplication-loss-transfer models
Authors:
Cedric Chauve,
Yann Ponty,
Michael Wallner
Abstract:
Given a set of species whose evolution is represented by a species tree, a gene family is a group of genes having evolved from a single ancestral gene. A gene family evolves along the branches of a species tree through various mechanisms, including (but not limited to) speciation, gene duplication, gene loss, and horizontal gene transfer. The reconstruction of a gene tree representing the evolution of a gene family constrained by a species tree is an important problem in phylogenomics. However, unlike in the multispecies coalescent evolutionary model, very little is known about the search space for gene family histories accounting for gene duplication, gene loss and horizontal gene transfer (the DLT-model). We introduce the notion of an evolutionary history, defined as a binary ordered rooted tree describing the evolution of a gene family, constrained by a species tree in the DLT-model. We provide formal grammars describing the set of all evolutionary histories that are compatible with a given species tree, whether it is ranked or unranked. These grammars allow us, using either analytic combinatorics or dynamic programming, to efficiently compute the number of histories of a given size, and also to generate random histories of a given size under the uniform distribution. We apply these tools to obtain exact asymptotics for the number of gene family histories for two species trees, the rooted caterpillar and the complete binary tree, as well as estimates of the range of the exponential growth factor of the number of histories for random species trees of size up to 25. Our results show that including horizontal gene transfer induces a dramatic increase in the number of evolutionary histories. We also show that, within ranked species trees, the number of evolutionary histories in the DLT-model is almost independent of the species tree topology.
Submitted 13 May, 2019;
originally announced May 2019.
-
An Exact Enumeration of Distance-Hereditary Graphs
Authors:
Cédric Chauve,
Éric Fusy,
Jérémie Lumbroso
Abstract:
Distance-hereditary graphs form an important class of graphs, from the theoretical point of view, due to the fact that they are the totally decomposable graphs for the split-decomposition. The previous best enumerative result for these graphs is from Nakano et al. (J. Comp. Sci. Tech., 2007), who have proven that the number of distance-hereditary graphs on $n$ vertices is bounded by ${2^{\lceil 3.59n\rceil}}$.
In this paper, using classical tools of enumerative combinatorics, we improve on this result by providing an exact enumeration of distance-hereditary graphs, which allows us to show that the number of distance-hereditary graphs on $n$ vertices is tightly bounded by ${(7.24975\ldots)^n}$, opening the perspective that such graphs could be encoded on $3n$ bits. We also provide the exact enumeration and asymptotics of an important subclass, the 3-leaf power graphs.
Our work illustrates the power of revisiting graph decomposition results through the framework of analytic combinatorics.
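The step from the growth constant to the $3n$-bit remark can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: it uses the truncated constant quoted in the abstract, ignores subexponential factors in the count, and the function names are hypothetical.

```python
import math

# Truncated growth constant quoted in the abstract (assumption: the
# omitted digits do not change the conclusion below).
GROWTH = 7.24975
# Exponent of the earlier bound 2^ceil(3.59 n) quoted in the abstract.
PREVIOUS_BITS_PER_VERTEX = 3.59


def bits_per_vertex(growth: float) -> float:
    """Bits per vertex needed to index a class of about growth**n objects."""
    return math.log2(growth)


def leading_term_bits(n: int, growth: float = GROWTH) -> int:
    """Leading-order number of bits to index about growth**n objects.

    Subexponential factors in the true count are ignored here.
    """
    return math.ceil(n * math.log2(growth))


if __name__ == "__main__":
    print(f"bits per vertex (new constant): {bits_per_vertex(GROWTH):.4f}")
    print(f"bits per vertex (earlier bound): {PREVIOUS_BITS_PER_VERTEX}")
    print(f"leading-order bits for n = 100: {leading_term_bits(100)}")
```

Since $\log_2 7.24975 \approx 2.858 < 3$, a budget of $3n$ bits exceeds the leading-order information content, whereas the earlier bound corresponds to about $3.59$ bits per vertex.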
Submitted 4 August, 2016;
originally announced August 2016.
-
Average-case analysis of perfect sorting by reversals (Journal Version)
Authors:
Mathilde Bouvel,
Cedric Chauve,
Marni Mishna,
Dominique Rossin
Abstract:
Perfect sorting by reversals, a problem originating in computational genomics, is the process of sorting a signed permutation to either the identity or to the reversed identity permutation, by a sequence of reversals that do not break any common interval. Bérard et al. (2007) make use of strong interval trees to describe an algorithm for sorting signed permutations by reversals. Combinatorial properties of this family of trees are essential to the algorithm analysis. Here, we use the expected value of certain tree parameters to prove that the average run-time of the algorithm is at worst polynomial and that, for sufficiently long permutations, the sorting algorithm runs in polynomial time with probability one. Furthermore, our analysis of the subclass of commuting scenarios yields precise results on the average length of a reversal and the average number of reversals.
Submitted 4 January, 2012;
originally announced January 2012.
-
Average-case analysis of perfect sorting by reversals
Authors:
Mathilde Bouvel,
Cedric Chauve,
Marni Mishna,
Dominique Rossin
Abstract:
A sequence of reversals that takes a signed permutation to the identity is perfect if at no step a common interval is broken. Determining a parsimonious perfect sequence of reversals that sorts a signed permutation is NP-hard. Here we show that, despite this worst-case analysis, with probability one, sorting can be done in polynomial time. Further, we find asymptotic expressions for the average length and number of reversals in commuting permutations, an interesting sub-class of signed permutations.
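The notions of a reversal and of breaking a common interval can be made concrete with a small brute-force sketch. It is not the algorithm analysed in the paper. It assumes the usual conventions, which the abstract does not spell out: a reversal reverses a segment of the signed permutation and flips its signs, a common interval is a segment whose absolute values are consecutive integers, and a reversal breaks a common interval when the two segments overlap without one containing the other. All function names are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # inclusive 0-based positions (start, end)


def apply_reversal(perm: List[int], i: int, j: int) -> List[int]:
    """Reverse perm[i..j] (inclusive) and flip the sign of each element."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def common_intervals(perm: List[int]) -> List[Segment]:
    """Segments of length >= 2 whose absolute values are consecutive integers.

    Brute force over all segments (quadratic), for illustration only.
    """
    n = len(perm)
    found = []
    for i in range(n):
        lo = hi = abs(perm[i])
        for j in range(i + 1, n):
            v = abs(perm[j])
            lo, hi = min(lo, v), max(hi, v)
            if hi - lo == j - i:  # values fill a range of the right width
                found.append((i, j))
    return found


def overlaps(a: Segment, b: Segment) -> bool:
    """True if a and b intersect and neither contains the other."""
    (i, j), (k, l) = a, b
    return (i < k <= j < l) or (k < i <= l < j)


def breaks_common_interval(perm: List[int], i: int, j: int) -> bool:
    """Assumed definition: the reversed segment overlaps a common interval."""
    return any(overlaps((i, j), c) for c in common_intervals(perm))


if __name__ == "__main__":
    p = [3, -1, 2, 4]
    print(common_intervals(p))
    print(apply_reversal(p, 1, 2), breaks_common_interval(p, 1, 2))
    print(breaks_common_interval(p, 2, 3))
```

Under these assumptions, a sequence of reversals would be perfect when no step satisfies `breaks_common_interval`; finding a shortest such sequence is the problem the abstract states is NP-hard.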
Submitted 19 January, 2009;
originally announced January 2009.
-
Combinatorial operators for Kronecker powers of representations of $S_n$
Authors:
Alain Goupil,
Cedric Chauve
Abstract:
We present combinatorial operators for the expansion of the Kronecker product of irreducible representations of the symmetric group. These combinatorial operators are defined in the ring of symmetric functions and act on the Schur functions basis. This leads to a combinatorial description of the Kronecker powers of the irreducible representations indexed with the partition (n-1,1) which specializes the concept of oscillating tableaux in Young's lattice previously defined by S. Sundaram. We call our specialization Kronecker tableaux. Their combinatorial analysis leads to enumerative results for the multiplicity of any irreducible representation in the Kronecker powers of the form ${\chi^{(n-1,1)}}^{\otimes k}$.
Submitted 15 March, 2005;
originally announced March 2005.