-
The Geometry of the space of Discrete Coalescent Trees
Authors:
Lena Collienne,
Kieran Elmes,
Mareike Fischer,
David Bryant,
Alex Gavryushkin
Abstract:
Computational inference of dated evolutionary histories relies upon various hypotheses about RNA, DNA, and protein sequence mutation rates. Using mutation rates to infer these dated histories is referred to as the molecular clock assumption. Coalescent theory is a popular class of evolutionary models that implements the molecular clock hypothesis to facilitate computational inference of dated phylogenies. Cancer and virus evolution are two areas where these methods are particularly important.
Methodologically, phylogenetic inference methods require a tree space over which the inference is performed, and the geometry of this space plays an important role in statistical and computational aspects of tree inference algorithms. It has recently been shown that molecular clock trees, and hence coalescent trees, possess a unique geometry, different from that of classical phylogenetic tree spaces, which do not model mutation rates.
Here we introduce and study a space of discrete coalescent trees, that is, we assume that time is discrete, which is inevitable in many computational formalisations. We establish several geometrical properties of the space and show how these properties impact various algorithms used in phylogenetic analyses. Our tree space is a discretisation of a known time tree space, called t-space, and hence our results can be used to approximate solutions to various open problems in t-space. Our tree space is also a generalisation of another known tree space, called the ranked nearest neighbour interchange space, hence our advances in this paper imply new results and generalise existing results about ranked trees.
Submitted 7 January, 2021;
originally announced January 2021.
-
Computing nearest neighbour interchange distances between ranked phylogenetic trees
Authors:
Lena Collienne,
Alex Gavryushkin
Abstract:
Many popular algorithms for searching the space of leaf-labelled trees are based on tree rearrangement operations. Under any such operation, the problem is reduced to searching a graph where vertices are trees and (undirected) edges are given by pairs of trees connected by one rearrangement operation (sometimes called a move). Most popular are the classical nearest neighbour interchange, subtree prune and regraft, and tree bisection and reconnection moves. The problem of computing distances, however, is NP-hard in each of these graphs, making tree inference and comparison algorithms challenging to design in practice.
Although ranked phylogenetic trees are one of the central objects of interest in applications such as cancer research, immunology, and epidemiology, the computational complexity of the shortest path problem for these trees remained unsolved for decades. In this paper, we settle this problem for the ranked nearest neighbour interchange operation by establishing that the complexity depends on the weight difference between the two types of tree rearrangements (rank moves and edge moves), and varies from quadratic, which is the lowest possible complexity for this problem, to NP-hard, which is the highest. In particular, our result provides the first example of a phylogenetic tree rearrangement operation for which shortest paths, and hence the distance, can be computed efficiently. Specifically, our algorithm scales to trees with thousands of leaves (and likely hundreds of thousands if implemented efficiently).
We also connect the problem of computing distances in our graph of ranked trees with the well-known version of this problem on unranked trees by introducing a parameter for the weight difference between move types. We propose to study a family of shortest path problems indexed by this parameter with computational complexity varying from quadratic to NP-hard.
Submitted 23 July, 2020;
originally announced July 2020.
-
Dynamic Algorithms for Interval Scheduling on a Single Machine
Authors:
Alex Gavryushkin,
Bakhadyr Khoussainov,
Mikhail Kokho,
Jiamou Liu
Abstract:
We investigate dynamic algorithms for the interval scheduling problem. Our algorithm runs in amortised time $O(\log n)$ for the query operation and $O(d\log^2 n)$ for insertion and removal operations, where $n$ and $d$ are the maximal numbers of intervals and pairwise overlapping intervals respectively. We also show that for a monotonic set, that is, when no interval properly contains another interval, the amortised complexity is $O(\log n)$ for both query and update operations. We compare the two algorithms for the monotonic interval sets using experiments.
Submitted 26 December, 2014;
originally announced December 2014.
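For readers unfamiliar with the interval scheduling problem named in the last abstract, the following is a minimal sketch of the classical static greedy baseline (pick the interval that finishes earliest, discard everything that overlaps it, repeat), together with a check for the monotonic property described there. It is not the dynamic algorithm of the paper. The function names, the half-open treatment of intervals, and the reading of "properly contains" as strict containment on both ends are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: an interval is a (start, end) pair, treated as half-open [start, end),
# so intervals that merely touch at an endpoint do not overlap.
Interval = Tuple[float, float]


def max_compatible(intervals: List[Interval]) -> List[Interval]:
    """Static greedy baseline: scan by finishing time, keep non-overlapping intervals."""
    chosen: List[Interval] = []
    last_end = float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:  # compatible with everything chosen so far
            chosen.append((start, end))
            last_end = end
    return chosen


def is_monotonic(intervals: List[Interval]) -> bool:
    """True if no interval properly contains another.

    Assumption: (a, b) properly contains (c, d) when a < c and d < b.
    Quadratic check, kept simple on purpose.
    """
    return not any(
        a < c and d < b
        for a, b in intervals
        for c, d in intervals
    )


if __name__ == "__main__":
    jobs = [(1, 4), (3, 5), (0, 6), (5, 7), (8, 11), (2, 14), (12, 16)]
    print(max_compatible(jobs))
    print(is_monotonic(jobs))
```

This static version re-sorts the whole set on every call, which costs $O(n \log n)$ per query; the point of a dynamic algorithm such as the one described in the abstract is to avoid that recomputation when intervals are inserted or removed.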