-
Maximizing Network Phylogenetic Diversity
Authors:
Leo van Iersel,
Mark Jones,
Jannik Schestag,
Celine Scornavacca,
Mathias Weller
Abstract:
Network Phylogenetic Diversity (Network-PD) is a measure for the diversity of a set of species based on a rooted phylogenetic network (with branch lengths and inheritance probabilities on the reticulation edges) describing the evolution of those species. We consider the \textsc{Max-Network-PD} problem: given such a network, find~$k$ species with maximum Network-PD score. We show that this problem is fixed-parameter tractable (FPT) for binary networks, by describing an optimal algorithm running in $\mathcal{O}(2^r \log (k)(n+r))$~time, with~$n$ the total number of species in the network and~$r$ its reticulation number. Furthermore, we show that \textsc{Max-Network-PD} is NP-hard for level-1 networks, proving that, unless P$=$NP, the FPT approach cannot be extended by using the level as parameter instead of the reticulation number.
Submitted 2 May, 2024;
originally announced May 2024.
-
Solving the Tree Containment Problem Using Graph Neural Networks
Authors:
Arkadiy Dushatskiy,
Esther Julien,
Leen Stougie,
Leo van Iersel
Abstract:
Tree Containment is a fundamental problem in phylogenetics useful for verifying a proposed phylogenetic network, representing the evolutionary history of certain species. Tree Containment asks whether the given phylogenetic tree (for instance, constructed from a DNA fragment showing tree-like evolution) is contained in the given phylogenetic network. In the general case, this is an NP-complete problem. We propose to solve it approximately using Graph Neural Networks. In particular, we propose to combine the given network and the tree and apply a Graph Neural Network to this network-tree graph. This way, we achieve the capability of solving the tree containment instances representing a larger number of species than the instances contained in the training dataset (i.e., our algorithm has the inductive learning ability). Our algorithm demonstrates an accuracy of over $95\%$ in solving the tree containment problem on instances with up to 100 leaves.
Submitted 13 June, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Exact and Heuristic Computation of the Scanwidth of Directed Acyclic Graphs
Authors:
Niels Holtgrefe,
Leo van Iersel,
Mark Jones
Abstract:
To measure the tree-likeness of a directed acyclic graph (DAG), a new width parameter that considers the directions of the arcs was recently introduced: scanwidth. We present the first algorithm that efficiently computes the exact scanwidth of general DAGs. For DAGs with one root and scanwidth $k$ it runs in $O(k \cdot n^k \cdot m)$ time. The algorithm also functions as an FPT algorithm with complexity $O(2^{4 \ell - 1} \cdot \ell \cdot n + n^2)$ for phylogenetic networks of level-$\ell$, a type of DAG used to depict evolutionary relationships among species. Our algorithm performs well in practice, being able to compute the scanwidth of synthetic networks up to 30 reticulations and 100 leaves within 500 seconds. Furthermore, we propose a heuristic that obtains an average practical approximation ratio of 1.5 on these networks. While we prove that the scanwidth is bounded from below by the treewidth of the underlying undirected graph, experiments suggest that for networks the parameters are close in practice.
Submitted 19 March, 2024;
originally announced March 2024.
-
Is this network proper forest-based?
Authors:
Katharina T. Huber,
Leo van Iersel,
Vincent Moulton,
Guillaume Scholz
Abstract:
In evolutionary biology, networks are becoming increasingly used to represent evolutionary histories for species that have undergone non-treelike or reticulate evolution. Such networks are essentially directed acyclic graphs with a leaf set that corresponds to a collection of species, and in which non-leaf vertices with indegree 1 correspond to speciation events and vertices with indegree greater than 1 correspond to reticulate events such as gene transfer. Recently forest-based networks have been introduced, which are essentially (multi-rooted) networks that can be formed by adding some arcs to a collection of phylogenetic trees (or phylogenetic forest), where each arc is added in such a way that its ends always lie in two different trees in the forest. In this paper, we consider the complexity of deciding whether or not a given network is proper forest-based, that is, whether it can be formed by adding arcs to some underlying phylogenetic forest which contains the same number of trees as there are roots in the network. More specifically, we show that it can be decided in polynomial time whether or not a binary, tree-child network with $m \ge 2$ roots is proper forest-based in case $m=2$, but that this problem is NP-complete for $m\ge 3$. We also give a fixed parameter tractable (FPT) algorithm for deciding whether or not a network in which every vertex has indegree at most 2 is proper forest-based. A key element in proving our results is a new characterization for when a network with $m$ roots is proper forest-based which is given in terms of the existence of certain $m$-colorings of the vertices of the network.
Submitted 22 August, 2023;
originally announced August 2023.
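The vertex roles described in the abstract above (non-leaf vertices with indegree 1 as speciation events, vertices with indegree greater than 1 as reticulate events) can be illustrated with a minimal sketch. The function name `classify_vertices` and the arc-list input format are assumptions made for this illustration and are not taken from the paper; multiple roots are allowed, as in the multi-rooted networks the abstract mentions.

```python
from collections import Counter


def classify_vertices(arcs):
    """Label each vertex of a (possibly multi-rooted) network.

    `arcs` is a list of (parent, child) pairs; this input format is an
    assumption of the sketch, not something specified in the abstract.
    """
    indegree = Counter(child for _, child in arcs)
    outdegree = Counter(parent for parent, _ in arcs)
    labels = {}
    for vertex in set(indegree) | set(outdegree):
        if indegree[vertex] == 0:
            labels[vertex] = "root"
        elif indegree[vertex] > 1:
            # Indegree greater than 1: a reticulate event.
            labels[vertex] = "reticulation"
        elif outdegree[vertex] == 0:
            labels[vertex] = "leaf"
        else:
            # Non-leaf vertex with indegree 1: a speciation event.
            labels[vertex] = "speciation"
    return labels
```

Counting the roots in the result gives the number $m$ used in the abstract, since a proper forest-based network needs an underlying forest with exactly as many trees as the network has roots.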
-
Making a Network Orchard by Adding Leaves
Authors:
Leo van Iersel,
Mark Jones,
Esther Julien,
Yukihiro Murakami
Abstract:
Phylogenetic networks are used to represent the evolutionary history of species. Recently, the new class of orchard networks was introduced, which were later shown to be interpretable as trees with additional horizontal arcs. This makes the network class ideal for capturing evolutionary histories that involve horizontal gene transfers. Here, we study the minimum number of additional leaves needed to make a network orchard. We demonstrate that computing this proximity measure for a given network is NP-hard and describe a tight upper bound. We also give an equivalent measure based on vertex labellings to construct a mixed integer linear programming formulation. Our experimental results, which include both real-world and synthetic data, illustrate the effectiveness of our implementation.
Submitted 8 May, 2023; v1 submitted 4 May, 2023;
originally announced May 2023.
-
Constructing Phylogenetic Networks via Cherry Picking and Machine Learning
Authors:
Giulia Bernardini,
Leo van Iersel,
Esther Julien,
Leen Stougie
Abstract:
Combining a set of phylogenetic trees into a single phylogenetic network that explains all of them is a fundamental challenge in evolutionary studies. Existing methods are computationally expensive and can either handle only small numbers of phylogenetic trees or are limited to severely restricted classes of networks. In this paper, we apply the recently-introduced theoretical framework of cherry picking to design a class of efficient heuristics that are guaranteed to produce a network containing each of the input trees, for datasets consisting of binary trees. Some of the heuristics in this framework are based on the design and training of a machine learning model that captures essential information on the structure of the input trees and guides the algorithms towards better solutions. We also propose simple and fast randomised heuristics that prove to be very effective when run multiple times.
Unlike the existing exact methods, our heuristics are applicable to datasets of practical size, and the experimental study we conducted on both simulated and real data shows that these solutions are qualitatively good, always within some small constant factor from the optimum. Moreover, our machine-learned heuristics are one of the first applications of machine learning to phylogenetics and show its promise.
Submitted 31 March, 2023;
originally announced April 2023.
-
A Near-Linear Kernel for Two-Parsimony Distance
Authors:
Elise Deen,
Leo van Iersel,
Remie Janssen,
Mark Jones,
Yuki Murakami,
Norbert Zeh
Abstract:
The maximum parsimony distance $d_{\textrm{MP}}(T_1,T_2)$ and the bounded-state maximum parsimony distance $d_{\textrm{MP}}^t(T_1,T_2)$ measure the difference between two phylogenetic trees $T_1,T_2$ in terms of the maximum difference between their parsimony scores for any character (with $t$ a bound on the number of states in the character, in the case of $d_{\textrm{MP}}^t(T_1,T_2)$). While computing $d_{\textrm{MP}}(T_1, T_2)$ was previously shown to be fixed-parameter tractable with a linear kernel, no such result was known for $d_{\textrm{MP}}^t(T_1,T_2)$. In this paper, we prove that computing $d_{\textrm{MP}}^t(T_1, T_2)$ is fixed-parameter tractable for all~$t$. Specifically, we prove that this problem has a kernel of size $O(k \lg k)$, where $k = d_{\textrm{MP}}^t(T_1, T_2)$. As the primary analysis tool, we introduce the concept of leg-disjoint incompatible quartets, which may be of independent interest.
Submitted 1 November, 2022;
originally announced November 2022.
-
Polynomial invariants for cactuses
Authors:
Leo van Iersel,
Vincent Moulton,
Yukihiro Murakami
Abstract:
Graph invariants are a useful tool in graph theory. Not only do they encode useful information about the graphs to which they are associated, but complete invariants can be used to distinguish between non-isomorphic graphs. Polynomial invariants for graphs such as the well-known Tutte polynomial have been studied for several years, and recently there has been interest in also defining such invariants for phylogenetic networks, a special type of graph that arises in the area of evolutionary biology. Recently Liu gave a complete invariant for (phylogenetic) trees. However, the polynomial invariants defined thus far for phylogenetic networks that are not trees require vertex labels and either contain a large number of variables, or they have exponentially many terms in the number of reticulations. This can make it difficult to compute these polynomials and to use them to analyse unlabelled networks. In this paper, we shall show how to circumvent some of these difficulties for rooted cactuses and (unrooted) cactuses. As well as being important in other areas such as operations research, rooted cactuses contain some common classes of phylogenetic networks such as phylogenetic trees and level-1 networks. More specifically, we define a polynomial $F$ with 5 variables that is a complete invariant for the class of rooted cactuses without vertices of indegree 1 and outdegree 1, and a polynomial $Q$ with 6 variables that is a complete invariant for the class of rooted cactuses and whose degree can be bounded linearly in terms of the size of the rooted cactus. We also explain how to extend the $Q$ polynomial to define a complete invariant for leaf-labelled rooted cactuses as well as (unrooted) cactuses.
Submitted 20 February, 2023; v1 submitted 26 September, 2022;
originally announced September 2022.
-
Embedding phylogenetic trees in networks of low treewidth
Authors:
Leo van Iersel,
Mark Jones,
Mathias Weller
Abstract:
Given a rooted, binary phylogenetic network and a rooted, binary phylogenetic tree, can the tree be embedded into the network? This problem, called \textsc{Tree Containment}, arises when validating networks constructed by phylogenetic inference methods. We present the first algorithm for (rooted) \textsc{Tree Containment} using the treewidth $t$ of the input network $N$ as parameter, showing that the problem can be solved in $2^{O(t^2)}\cdot|N|$ time and space.
Submitted 19 September, 2023; v1 submitted 1 July, 2022;
originally announced July 2022.
-
Orchard Networks are Trees with Additional Horizontal Arcs
Authors:
Leo van Iersel,
Remie Janssen,
Mark Jones,
Yukihiro Murakami
Abstract:
Phylogenetic networks are used in biology to represent evolutionary histories. The class of orchard phylogenetic networks was recently introduced for their computational benefits, without any biological justification. Here, we show that orchard networks can be interpreted as trees with additional \emph{horizontal} arcs. Therefore, they are closely related to tree-based networks, where the difference is that in tree-based networks the additional arcs do not need to be horizontal. Then, we use this new characterization to show that the space of orchard networks is connected under the rNNI rearrangement move, with a diameter of at most $4kn+n\lceil \log_2(n) \rceil +2k+6n-8$.
Submitted 21 October, 2021;
originally announced October 2021.
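To get a feel for the size of the diameter bound $4kn+n\lceil \log_2(n) \rceil +2k+6n-8$ quoted in the abstract above, the expression can be evaluated directly. The sketch below is only arithmetic on that formula; the function name `rnni_diameter_bound` is hypothetical, and because the abstract does not define $n$ and $k$, they are treated here simply as integer inputs.

```python
def rnni_diameter_bound(n, k):
    """Evaluate 4kn + n*ceil(log2(n)) + 2k + 6n - 8 for integers n >= 1, k >= 0.

    The meaning of n and k is not stated in the abstract; this sketch only
    evaluates the expression as written there.
    """
    if n < 1 or k < 0:
        raise ValueError("expected n >= 1 and k >= 0")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)) without
    # any floating-point rounding.
    ceil_log2_n = (n - 1).bit_length()
    return 4 * k * n + n * ceil_log2_n + 2 * k + 6 * n - 8
```

For example, $n = 8$ and $k = 2$ give $64 + 24 + 4 + 48 - 8 = 132$.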
-
An algorithm for reconstructing level-2 phylogenetic networks from trinets
Authors:
Leo van Iersel,
Sjors Kole,
Vincent Moulton,
Leonie Nipius
Abstract:
Evolutionary histories for species that cross with one another or exchange genetic material can be represented by leaf-labelled, directed graphs called phylogenetic networks. A major challenge in the burgeoning area of phylogenetic networks is to develop algorithms for building such networks by amalgamating small networks into a single large network. The level of a phylogenetic network is a measure of its deviation from being a tree; the higher the level of a network, the less treelike it becomes. Various algorithms have been developed for building level-1 networks from small networks. However, level-1 networks may not be able to capture the complexity of some data sets. In this paper, we present a polynomial-time algorithm for constructing a rooted binary level-2 phylogenetic network from a collection of 3-leaf networks or trinets. Moreover, we prove that the algorithm will correctly reconstruct such a network if it is given all of the trinets in the network as input. The algorithm runs in time $O(t\cdot n+n^4)$ with $t$ the number of input trinets and $n$ the number of leaves. We also show that there is a fundamental obstruction to constructing level-3 networks from trinets, and so new approaches will need to be developed for constructing level-3 and higher-level networks.
Submitted 23 September, 2021;
originally announced September 2021.
-
Level-$2$ networks from shortest and longest distances
Authors:
Katharina T. Huber,
Leo van Iersel,
Remie Janssen,
Mark Jones,
Vincent Moulton,
Yukihiro Murakami
Abstract:
Recently it was shown that a certain class of phylogenetic networks, called level-$2$ networks, cannot be reconstructed from their associated distance matrices. In this paper, we show that they can be reconstructed from their induced shortest and longest distance matrices. That is, if two level-$2$ networks induce the same shortest and longest distance matrices, then they must be isomorphic. We further show that level-$2$ networks are reconstructible from their shortest distance matrices if and only if they do not contain a subgraph from a family of graphs. A generator of a network is the graph obtained by deleting all pendant subtrees and suppressing degree-$2$ vertices. We also show that networks with a leaf on every generator side are reconstructible from their induced shortest distance matrix, regardless of level.
Submitted 11 August, 2021; v1 submitted 21 January, 2021;
originally announced January 2021.
-
New FPT algorithms for finding the temporal hybridization number for sets of phylogenetic trees
Authors:
Sander Borst,
Leo van Iersel,
Mark Jones,
Steven Kelk
Abstract:
We study the problem of finding a temporal hybridization network for a set of phylogenetic trees that minimizes the number of reticulations. First, we introduce an FPT algorithm for this problem on an arbitrary set of $m$ binary trees with $n$ leaves each with a running time of $O(5^k\cdot n\cdot m)$, where $k$ is the minimum temporal hybridization number. We also present the concept of temporal distance, which is a measure for how close a tree-child network is to being temporal. Then we introduce an algorithm for computing a tree-child network with temporal distance at most $d$ and at most $k$ reticulations in $O((8k)^d5^k\cdot n\cdot m)$ time. Lastly, we introduce an $O(6^kk!\cdot k\cdot n^2)$ time algorithm for computing a minimum temporal hybridization network for a set of two nonbinary trees. We also provide an implementation of all algorithms and an experimental analysis of their performance.
Submitted 27 July, 2020;
originally announced July 2020.
-
Distinguishing level-1 phylogenetic networks on the basis of data generated by Markov processes
Authors:
Elizabeth Gross,
Leo van Iersel,
Remie Janssen,
Mark Jones,
Colby Long,
Yukihiro Murakami
Abstract:
Phylogenetic networks can represent evolutionary events that cannot be described by phylogenetic trees. These networks are able to incorporate reticulate evolutionary events such as hybridization, introgression, and lateral gene transfer. Recently, network-based Markov models of DNA sequence evolution have been introduced along with model-based methods for reconstructing phylogenetic networks. For these methods to be consistent, the network parameter needs to be identifiable from data generated under the model. Here, we show that the semi-directed network parameter of a triangle-free, level-1 network model with any fixed number of reticulation vertices is generically identifiable under the Jukes-Cantor, Kimura 2-parameter, or Kimura 3-parameter constraints.
Submitted 7 July, 2021; v1 submitted 17 July, 2020;
originally announced July 2020.
-
A Unifying Characterization of Tree-based Networks and Orchard Networks using Cherry Covers
Authors:
Leo van Iersel,
Remie Janssen,
Mark Jones,
Yukihiro Murakami,
Norbert Zeh
Abstract:
Phylogenetic networks are used to study evolutionary relationships between species in biology. Such networks are often categorized into classes by their topological features, which stem from both biological and computational motivations. We study two network classes in this paper: tree-based networks and orchard networks. Tree-based networks are those that can be obtained by inserting edges between the edges of an underlying tree. Orchard networks are a recently introduced generalization of the class of tree-child networks. Structural characterizations have already been discovered for tree-based networks; this is not the case for orchard networks. In this paper, we introduce cherry covers---a unifying characterization of both network classes---in which we decompose the edges of the networks into so-called cherry shapes and reticulated cherry shapes. We show that cherry covers can be used to characterize the class of tree-based networks as well as the class of orchard networks. Moreover, we also generalize these results to non-binary networks.
Submitted 16 April, 2020;
originally announced April 2020.
-
Reconstructibility of unrooted level-$k$ phylogenetic networks from distances
Authors:
Leo van Iersel,
Vincent Moulton,
Yukihiro Murakami
Abstract:
A phylogenetic network is a graph-theoretical tool that is used by biologists to represent the evolutionary history of a collection of species. One potential way of constructing such networks is via a distance-based approach, where one is asked to find a phylogenetic network that in some way represents a given distance matrix, which gives information on the evolutionary distances between present-day taxa. Here, we consider the following question. For which~$k$ are unrooted level-$k$ networks uniquely determined by their distance matrices? We consider this question for shortest distances as well as for the case that the multiset of all distances is given. We prove that level-$1$ networks and level-$2$ networks are reconstructible from their shortest distances and multisets of distances, respectively. Furthermore, we show that, in general, networks of level higher than~$1$ are not reconstructible from shortest distances and that networks of level higher than~$2$ are not reconstructible from their multisets of distances.
Submitted 12 June, 2020; v1 submitted 30 October, 2019;
originally announced October 2019.
-
Cutting an alignment with Ockham's razor
Authors:
Mark Jones,
Philippe Gambette,
Leo van Iersel,
Remie Janssen,
Steven Kelk,
Fabio Pardi,
Celine Scornavacca
Abstract:
In this article, we investigate different parsimony-based approaches towards finding recombination breakpoints in a multiple sequence alignment. This recombination detection task is crucial in order to avoid errors in evolutionary analyses caused by mixing together portions of sequences which had different evolutionary histories. Following an overview of the field of recombination detection, we formulate four computational problems for this task with different objective functions. The four problems aim to minimize (1) the total homoplasy of all blocks, (2) the maximum homoplasy per block, (3) the total homoplasy ratio of all blocks, and (4) the maximum homoplasy ratio per block. We describe algorithms for each of these problems, which are fixed-parameter tractable (FPT) when the characters are binary. We have implemented and tested the algorithms on simulated data, showing that minimizing the total homoplasy gives, in most cases, the most accurate results. Our implementation and experimental data have been made publicly available. Finally, we also consider the problem of combining blocks into non-contiguous blocks consisting of at most p contiguous parts. Fixing the homoplasy h of each block to 0, we show that this problem is NP-hard when p >= 3, but polynomial-time solvable for p = 2. Furthermore, the problem is FPT with parameter h for binary characters when p = 2. A number of interesting problems remain open.
Submitted 24 October, 2019;
originally announced October 2019.
-
A Practical Fixed-Parameter Algorithm for Constructing Tree-Child Networks from Multiple Binary Trees
Authors:
Leo van Iersel,
Remie Janssen,
Mark Jones,
Yukihiro Murakami,
Norbert Zeh
Abstract:
We present the first fixed-parameter algorithm for constructing a tree-child phylogenetic network that displays an arbitrary number of binary input trees and has the minimum number of reticulations among all such networks. The algorithm uses the recently introduced framework of cherry picking sequences and runs in $O((8k)^k \mathrm{poly}(n, t))$ time, where $n$ is the number of leaves of every tree, $t$ is the number of trees, and $k$ is the reticulation number of the constructed network. Moreover, we provide an efficient parallel implementation of the algorithm and show that it can deal with up to $100$ input trees on a standard desktop computer, thereby providing a major improvement over previous phylogenetic network construction methods.
Submitted 19 July, 2019;
originally announced July 2019.
-
Orienting undirected phylogenetic networks
Authors:
Katharina T. Huber,
Leo van Iersel,
Remie Janssen,
Mark Jones,
Vincent Moulton,
Yukihiro Murakami,
Charles Semple
Abstract:
This paper studies the relationship between undirected (unrooted) and directed (rooted) phylogenetic networks. We describe a polynomial-time algorithm for deciding whether an undirected nonbinary phylogenetic network, given the locations of the root and reticulation vertices, can be oriented as a directed nonbinary phylogenetic network. Moreover, we characterize when this is possible and show that, in such instances, the resulting directed nonbinary phylogenetic network is unique. In addition, without being given the location of the root and the reticulation vertices, we describe an algorithm for deciding whether an undirected binary phylogenetic network $N$ can be oriented as a directed binary phylogenetic network of a certain class. The algorithm is fixed-parameter tractable (FPT) when the parameter is the level of $N$ and is applicable to classes of directed phylogenetic networks that satisfy certain conditions. As an example, we show that the well-studied class of binary tree-child networks satisfies these conditions.
Submitted 29 September, 2023; v1 submitted 18 June, 2019;
originally announced June 2019.
-
Reconstructing Tree-Child Networks from Reticulate-Edge-Deleted Subnetworks
Authors:
Yukihiro Murakami,
Leo van Iersel,
Remie Janssen,
Mark Jones,
Vincent Moulton
Abstract:
Network reconstruction lies at the heart of phylogenetic research. Two well studied classes of phylogenetic networks include tree-child networks and level-$k$ networks. In a tree-child network, every non-leaf node has a child that is a tree node or a leaf. In a level-$k$ network, the maximum number of reticulations contained in a biconnected component is $k$. Here, we show that level-$k$ tree-child networks are encoded by their reticulate-edge-deleted subnetworks, which are subnetworks obtained by deleting a single reticulation edge, if $k\geq 2$. Following this, we provide a polynomial-time algorithm for uniquely reconstructing such networks from their reticulate-edge-deleted subnetworks. Moreover, we show that this can even be done when considering subnetworks obtained by deleting one reticulation edge from each biconnected component with $k$ reticulations.
Submitted 22 July, 2019; v1 submitted 16 November, 2018;
originally announced November 2018.
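The tree-child condition stated in the abstract above (every non-leaf node has a child that is a tree node or a leaf) can be checked directly on an edge list. The following is a minimal Python sketch, not the authors' reconstruction algorithm; the function name `is_tree_child` and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict


def is_tree_child(edges):
    """Check the tree-child condition on a rooted network.

    `edges` is a list of (parent, child) pairs (an assumed input format).
    A child with indegree 1 is a tree node or a leaf; indegree >= 2 means
    the child is a reticulation.
    """
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))

    for node in nodes:
        kids = children.get(node, [])
        if not kids:
            continue  # leaves impose no condition
        # At least one child must be a non-reticulation.
        if not any(indegree[kid] <= 1 for kid in kids):
            return False
    return True


# One reticulation h; its parents a and b each also have a leaf child.
tree_child_edges = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"),
                    ("a", "x"), ("b", "y"), ("h", "z")]

# Both children of a (and of b) are reticulations, so the condition fails.
not_tree_child_edges = [("r", "a"), ("r", "b"), ("a", "h1"), ("a", "h2"),
                        ("b", "h1"), ("b", "h2"), ("h1", "x"), ("h2", "y")]
```

Calling `is_tree_child` on the first edge list returns `True` and on the second returns `False`.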
-
Parallel Machine Scheduling with a Single Resource per Job
Authors:
T. Janssen,
C. Swennenhuis,
A. Bitar,
T. Bosman,
D. Gijswijt,
L. van Iersel,
S. Dauzère-Pérès,
C. Yugma
Abstract:
We study the problem of scheduling jobs on parallel machines minimizing the total completion time, with each job using exactly one resource. First, we derive fundamental properties of the problem and show that the problem is polynomially solvable if $p_j = 1$. Then we look at a variant of the shortest processing time rule as an approximation algorithm for the problem and show that it gives at least a $(2-\frac{1}{m})$-approximation. Subsequently, we show that, although the complexity of the problem remains open, three related problems are $\mathcal{NP}$-hard. In the first problem, every resource also has a subset of machines on which it can be used. In the second problem, once a resource has been used on a machine it cannot be used on any other machine, hence all jobs using the same resource need to be scheduled on the same machine. In the third problem, every job needs exactly two resources instead of just one.
Submitted 16 November, 2018; v1 submitted 13 September, 2018;
originally announced September 2018.
-
A third strike against perfect phylogeny
Authors:
Leo van Iersel,
Mark Jones,
Steven Kelk
Abstract:
Perfect phylogenies are fundamental in the study of evolutionary trees because they capture the situation when each evolutionary trait emerges only once in history; if such events are believed to be rare, then by Occam's Razor such parsimonious trees are preferable as a hypothesis of evolution. A classical result states that 2-state characters permit a perfect phylogeny precisely if each subset of 2 characters permits one. More recently, it was shown that for 3-state characters the same property holds but for size-3 subsets. A long-standing open problem asked whether such a constant exists for each number of states. More precisely, it has been conjectured that for any fixed integer $r$, there exists a constant $f(r)$ such that a set of $r$-state characters $C$ has a perfect phylogeny if and only if every subset of at most $f(r)$ characters has a perfect phylogeny. In this paper, we show that this conjecture is false. In particular, we show that for any constant $t$, there exists a set $C$ of $8$-state characters such that $C$ has no perfect phylogeny, but there exists a perfect phylogeny for every subset of $t$ characters. This negative result complements the two negative results ("strikes") of Bodlaender et al. We reflect on the consequences of this third strike, pointing out that while it does close off some routes for efficient algorithm development, many others remain open.
Submitted 14 January, 2019; v1 submitted 19 April, 2018;
originally announced April 2018.
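The classical 2-state result mentioned in the abstract above is commonly applied as the four-gamete test: two binary characters are compatible unless all four state combinations occur among the taxa. The following is a minimal Python sketch of that pairwise check. It covers only the 2-state case, not the paper's 8-state construction, and the function names and the taxa-by-characters matrix layout are assumptions made for illustration.

```python
from itertools import combinations


def pairwise_compatible(col_a, col_b):
    """Four-gamete test for two binary characters (one state per taxon)."""
    gametes = set(zip(col_a, col_b))
    # With two states per character there are at most four combinations;
    # the pair is incompatible exactly when all four appear.
    return len(gametes) < 4


def has_perfect_phylogeny_binary(matrix):
    """Rows are taxa, columns are 2-state characters (assumed layout)."""
    columns = list(zip(*matrix))
    return all(pairwise_compatible(a, b)
               for a, b in combinations(columns, 2))


compatible_matrix = [[0, 0], [0, 1], [1, 0]]
conflicting_matrix = [[0, 0], [0, 1], [1, 0], [1, 1]]
```

Here `has_perfect_phylogeny_binary(compatible_matrix)` is `True`, while the second matrix contains all four combinations and yields `False`.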
-
Not all phylogenetic networks are leaf-reconstructible
Authors:
Péter L. Erdős,
Leo van Iersel,
Mark Jones
Abstract:
Unrooted phylogenetic networks are graphs used to represent evolutionary relationships. Accurately reconstructing such networks is of great relevance for evolutionary biology. It has recently been conjectured that all phylogenetic networks with at least five leaves can be uniquely reconstructed from their subnetworks obtained by deleting a single leaf and suppressing degree-2 vertices. Here, we show that this conjecture is false, by presenting a counterexample for each possible number of leaves that is at least 4. Moreover, we show that the conjecture is still false when restricted to binary networks.
Submitted 8 March, 2018;
originally announced March 2018.
-
Polynomial-Time Algorithms for Phylogenetic Inference Problems involving duplication and reticulation
Authors:
Leo van Iersel,
Remie Janssen,
Mark Jones,
Yukihiro Murakami,
Norbert Zeh
Abstract:
A common problem in phylogenetics is to try to infer a species phylogeny from gene trees. We consider different variants of this problem. The first variant, called Unrestricted Minimal Episodes Inference, aims at inferring a species tree based on a model with speciation and duplication where duplications are clustered in duplication episodes. The goal is to minimize the number of such episodes. The second variant, Parental Hybridization, aims at inferring a species \emph{network} based on a model with speciation and reticulation. The goal is to minimize the number of reticulation events. It is a variant of the well-studied Hybridization Number problem with a more generous view on which gene trees are consistent with a given species network. We show that these seemingly different problems are in fact closely related and can, surprisingly, both be solved in polynomial time, using a structure we call "beaded trees". However, we also show that methods based on these problems have to be used with care because the optimal species phylogenies always have a restricted form. To mitigate this problem, we introduce a new variant of Unrestricted Minimal Episodes Inference that minimizes the duplication episode depth. We prove that this new variant of the problem can also be solved in polynomial time.
Submitted 9 August, 2019; v1 submitted 1 February, 2018;
originally announced February 2018.
-
Deciding the existence of a cherry-picking sequence is hard on two trees
Authors:
Janosch Döcker,
Leo van Iersel,
Steven Kelk,
Simone Linz
Abstract:
Here we show that deciding whether two rooted binary phylogenetic trees on the same set of taxa permit a cherry-picking sequence, a special type of elimination order on the taxa, is NP-complete. This improves on an earlier result which proved hardness for eight or more trees. Via a known equivalence between cherry-picking sequences and temporal phylogenetic networks, our result proves that it is NP-complete to determine the existence of a temporal phylogenetic network that contains topological embeddings of both trees. The hardness result also greatly strengthens previous inapproximability results for the minimum temporal-hybridization number problem. This is the optimization version of the problem where we wish to construct a temporal phylogenetic network that topologically embeds two given rooted binary phylogenetic trees and that has a minimum number of indegree-2 nodes, which represent events such as hybridization and horizontal gene transfer. We end on a positive note, pointing out that fixed parameter tractability results in this area are likely to ensure the continued relevance of the temporal phylogenetic network model.
Submitted 25 January, 2019; v1 submitted 8 December, 2017;
originally announced December 2017.
-
Exploring the tiers of rooted phylogenetic network space using tail moves
Authors:
Remie Janssen,
Mark Jones,
Péter L. Erdős,
Leo van Iersel,
Celine Scornavacca
Abstract:
Popular methods for exploring the space of rooted phylogenetic trees use rearrangement moves such as rNNI (rooted Nearest Neighbour Interchange) and rSPR (rooted Subtree Prune and Regraft). Recently, these moves were generalized to rooted phylogenetic networks, which are a more suitable representation of reticulate evolutionary histories, and it was shown that any two rooted phylogenetic networks of the same complexity are connected by a sequence of either rSPR or rNNI moves. Here, we show that this is possible using only tail moves, which are a restricted version of rSPR moves on networks that are more closely related to rSPR moves on trees. The connectedness still holds even when we restrict to distance-1 tail moves (a localized version of tail-moves). Moreover, we give bounds on the number of (distance-1) tail moves necessary to turn one network into another, which in turn yield new bounds for rSPR, rNNI and SPR (i.e. the equivalent of rSPR on unrooted networks). The upper bounds are constructive, meaning that we can actually find a sequence with at most this length for any pair of networks. Finally, we show that finding a shortest sequence of tail or rSPR moves is NP-hard.
Submitted 25 August, 2017;
originally announced August 2017.
-
Finding the most parsimonious or likely tree in a network with respect to an alignment
Authors:
Steven Kelk,
Fabio Pardi,
Celine Scornavacca,
Leo van Iersel
Abstract:
Phylogenetic networks are often constructed by merging multiple conflicting phylogenetic signals into a directed acyclic graph. It is interesting to explore whether a network constructed in this way induces biologically-relevant phylogenetic signals that were not present in the input. Here we show that, given a multiple alignment A for a set of taxa X and a rooted phylogenetic network N whose leaves are labelled by X, it is NP-hard to locate the most parsimonious phylogenetic tree displayed by N (with respect to A) even when the level of N - the maximum number of reticulation nodes within a biconnected component - is 1 and A contains only 2 distinct states. (If, additionally, gaps are allowed the problem becomes APX-hard.) We also show that under the same conditions, and assuming a simple binary symmetric model of character evolution, finding the most likely tree displayed by the network is NP-hard. These negative results contrast with earlier work on parsimony in which it is shown that if A consists of a single column the problem is fixed parameter tractable in the level. We conclude with a discussion of why, despite the NP-hardness, both the parsimony and likelihood problem can likely be well-solved in practice.
Submitted 12 July, 2017;
originally announced July 2017.
-
Binets: fundamental building blocks for phylogenetic networks
Authors:
Leo van Iersel,
Vincent Moulton,
Eveline de Swart,
Taoyang Wu
Abstract:
Phylogenetic networks are a generalization of evolutionary trees that are used by biologists to represent the evolution of organisms which have undergone reticulate evolution. Essentially, a phylogenetic network is a directed acyclic graph having a unique root in which the leaves are labelled by a given set of species. Recently, some approaches have been developed to construct phylogenetic networks from collections of 2- and 3-leaved networks, which are known as binets and trinets, respectively. Here we study in more depth properties of collections of binets, one of the simplest possible types of networks into which a phylogenetic network can be decomposed. More specifically, we show that if a collection of level-1 binets is compatible with some binary network, then it is also compatible with a binary level-1 network. Our proofs are based on useful structural results concerning lowest stable ancestors in networks. In addition, we show that, although the binets do not determine the topology of the network, they do determine the number of reticulations in the network, which is one of its most important parameters. We also consider algorithmic questions concerning binets. We show that deciding whether an arbitrary set of binets is compatible with some network is at least as hard as the well-known Graph Isomorphism problem. However, if we restrict to level-1 binets, it is possible to decide in polynomial time whether there exists a binary network that displays all the binets. We also show that to find a network that displays a maximum number of the binets is NP-hard, but that there exists a simple polynomial-time 1/3-approximation algorithm for this problem. It is hoped that these results will eventually assist in the development of new methods for constructing phylogenetic networks from collections of smaller networks.
Submitted 31 January, 2017;
originally announced January 2017.
-
Leaf-reconstructibility of phylogenetic networks
Authors:
Leo van Iersel,
Vincent Moulton
Abstract:
An important problem in evolutionary biology is to reconstruct the evolutionary history of a set $X$ of species. This history is often represented as a phylogenetic network, that is, a connected graph with leaves labelled by elements in $X$ (for example, an evolutionary tree), which is usually also binary, i.e. all vertices have degree 1 or 3. A common approach used in phylogenetics to build a phylogenetic network on $X$ involves constructing it from networks on subsets of $X$. Here we consider the question of which (unrooted) phylogenetic networks are leaf-reconstructible, i.e. which networks can be uniquely reconstructed from the set of networks obtained from it by deleting a single leaf (its $X$-deck). This problem is closely related to the (in)famous reconstruction conjecture in graph theory but, as we shall show, presents distinct challenges. We show that some large classes of phylogenetic networks are reconstructible from their $X$-deck. This includes phylogenetic trees, binary networks containing at least one non-trivial cut-edge, and binary level-4 networks (the level of a network measures how far it is from being a tree). We also show that for fixed $k$, almost all binary level-$k$ phylogenetic networks are leaf-reconstructible. As an application of our results, we show that a level-3 network $N$ can be reconstructed from its quarnets, that is, 4-leaved networks that are induced by $N$ in a certain recursive fashion. Our results lead to several interesting open problems which we discuss, including the conjecture that all phylogenetic networks with at least five leaves are leaf-reconstructible.
Submitted 31 January, 2017;
originally announced January 2017.
-
On unrooted and root-uncertain variants of several well-known phylogenetic network problems
Authors:
Leo van Iersel,
Steven Kelk,
Georgios Stamoulis,
Leen Stougie,
Olivier Boes
Abstract:
The hybridization number problem requires us to embed a set of binary rooted phylogenetic trees into a binary rooted phylogenetic network such that the number of nodes with indegree two is minimized. However, from a biological point of view accurately inferring the root location in a phylogenetic tree is notoriously difficult and poor root placement can artificially inflate the hybridization number. To this end we study a number of relaxed variants of this problem. We start by showing that the fundamental problem of determining whether an \emph{unrooted} phylogenetic network displays (i.e. embeds) an \emph{unrooted} phylogenetic tree, is NP-hard. On the positive side we show that this problem is FPT in reticulation number. In the rooted case the corresponding FPT result is trivial, but here we require more subtle argumentation. Next we show that the hybridization number problem for unrooted networks (when given two unrooted trees) is equivalent to the problem of computing the Tree Bisection and Reconnect (TBR) distance of the two unrooted trees. In the third part of the paper we consider the "root uncertain" variant of hybridization number. Here we are free to choose the root location in each of a set of unrooted input trees such that the hybridization number of the resulting rooted trees is minimized. On the negative side we show that this problem is APX-hard. On the positive side, we show that the problem is FPT in the hybridization number, via kernelization, for any number of input trees.
Submitted 2 September, 2016;
originally announced September 2016.
-
Do branch lengths help to locate a tree in a phylogenetic network?
Authors:
Philippe Gambette,
Leo van Iersel,
Steven Kelk,
Fabio Pardi,
Celine Scornavacca
Abstract:
Phylogenetic networks are increasingly used in evolutionary biology to represent the history of species that have undergone reticulate events such as horizontal gene transfer, hybrid speciation and recombination. One of the most fundamental questions that arise in this context is whether the evolution of a gene with one copy in all species can be explained by a given network. In mathematical terms, this is often translated in the following way: is a given phylogenetic tree contained in a given phylogenetic network? Recently this tree containment problem has been widely investigated from a computational perspective, but most studies have only focused on the topology of the phylogenies, ignoring a piece of information that, in the case of phylogenetic trees, is routinely inferred by evolutionary analyses: branch lengths. These measure the amount of change (e.g., nucleotide substitutions) that has occurred along each branch of the phylogeny. Here, we study a number of versions of the tree containment problem that explicitly account for branch lengths. We show that, although length information has the potential to locate more precisely a tree within a network, the problem is computationally hard in its most general form. On a positive note, for a number of special cases of biological relevance, we provide algorithms that solve this problem efficiently. This includes the case of networks of limited complexity, for which it is possible to recover, among the trees contained by the network with the same topology as the input tree, the closest one in terms of branch lengths.
Submitted 21 July, 2016;
originally announced July 2016.
-
Nonbinary tree-based phylogenetic networks
Authors:
Laura Jetten,
Leo van Iersel
Abstract:
Rooted phylogenetic networks are used to describe evolutionary histories that contain non-treelike evolutionary events such as hybridization and horizontal gene transfer. In some cases, such histories can be described by a phylogenetic base-tree with additional linking arcs, which can for example represent gene transfer events. Such phylogenetic networks are called tree-based. Here, we consider two possible generalizations of this concept to nonbinary networks, which we call tree-based and strictly-tree-based nonbinary phylogenetic networks. We give simple graph-theoretic characterizations of tree-based and strictly-tree-based nonbinary phylogenetic networks. Moreover, we show for each of these two classes that it can be decided in polynomial time whether a given network is contained in the class. Our approach also provides a new view on tree-based binary phylogenetic networks. Finally, we discuss two examples of nonbinary phylogenetic networks in biology and show how our results can be applied to them.
Submitted 30 September, 2016; v1 submitted 19 January, 2016;
originally announced January 2016.
-
Graph realizations constrained by skeleton graphs
Authors:
Péter L. Erdős,
Stephen G. Hartke,
Leo van Iersel,
István Miklós
Abstract:
In 2008 Amanatidis, Green and Mihail introduced the Joint Degree Matrix (JDM) model to capture the fundamental difference in assortativity of networks in nature studied by the physical and life sciences and social networks studied in the social sciences. In 2014 Czabarka proposed a direct generalization of the JDM model, the Partition Adjacency Matrix (PAM) model. In the PAM model the vertices have specified degrees, and the vertex set itself is partitioned into classes. For each pair of vertex classes the number of edges between the classes in a graph realization is prescribed. In this paper we apply the new {\em skeleton graph} model to describe the same information as the PAM model. Our model is more convenient for handling problems with low number of partition classes or with special topological restrictions among the classes. We investigate two particular cases in detail: (i) when there are only two vertex classes and (ii) when the skeleton graph contains at most one cycle.
Submitted 7 February, 2017; v1 submitted 3 August, 2015;
originally announced August 2015.
-
Phylogenetic incongruence through the lens of Monadic Second Order logic
Authors:
Steven Kelk,
Leo van Iersel,
Celine Scornavacca
Abstract:
Within the field of phylogenetics there is growing interest in measures for summarising the dissimilarity, or 'incongruence', of two or more phylogenetic trees. Many of these measures are NP-hard to compute and this has stimulated a considerable volume of research into fixed parameter tractable algorithms. In this article we use Monadic Second Order logic (MSOL) to give alternative, compact proofs of fixed parameter tractability for several well-known incongruency measures. In doing so we wish to demonstrate the considerable potential of MSOL - machinery still largely unknown outside the algorithmic graph theory community - within phylogenetics. A crucial component of this work is the observation that many of these measures, when bounded, imply the existence of an 'agreement forest' of bounded size, which in turn implies that an auxiliary graph structure, the display graph, has bounded treewidth. It is this bound on treewidth that makes the machinery of MSOL available for proving fixed parameter tractability. We give a variety of different MSOL formulations. Some are based on explicitly encoding agreement forests, while some only use them implicitly to generate the treewidth bound. Our formulations introduce a number of "phylogenetics MSOL primitives" which will hopefully be of use to other researchers.
Submitted 1 March, 2015;
originally announced March 2015.
-
Reconstructing phylogenetic level-1 networks from nondense binet and trinet sets
Authors:
Katharina Huber,
Leo van Iersel,
Vincent Moulton,
Celine Scornavacca,
Taoyang Wu
Abstract:
Binets and trinets are phylogenetic networks with two and three leaves, respectively. Here we consider the problem of deciding if there exists a binary level-1 phylogenetic network displaying a given set $\mathcal{T}$ of binary binets or trinets over a set $X$ of taxa, and constructing such a network whenever it exists. We show that this is NP-hard for trinets but polynomial-time solvable for binets. Moreover, we show that the problem is still polynomial-time solvable for inputs consisting of binets and trinets as long as the cycles in the trinets have size three. Finally, we present an $O(3^{|X|} poly(|X|))$ time algorithm for general sets of binets and trinets. The latter two algorithms generalise to instances containing level-1 networks with arbitrarily many leaves, and thus provide some of the first supernetwork algorithms for computing networks from a set of rooted phylogenetic networks.
Submitted 25 November, 2014;
originally announced November 2014.
-
Exact reconciliation of undated trees
Authors:
Leo van Iersel,
Celine Scornavacca,
Steven Kelk
Abstract:
Reconciliation methods aim at recovering macro evolutionary events and at localizing them in the species history, by observing discrepancies between gene family trees and species trees. In this article we introduce an Integer Linear Programming (ILP) approach for the NP-hard problem of computing a most parsimonious time-consistent reconciliation of a gene tree with a species tree when dating information on speciations is not available. The ILP formulation, which builds upon the DTL model, returns a most parsimonious reconciliation ranging over all possible datings of the nodes of the species tree. By studying its performance on plausible simulated data we conclude that the ILP approach is significantly faster than a brute force search through the space of all possible species tree datings. Although the ILP formulation is currently limited to small trees, we believe that it is an important proof-of-concept which opens the door to the possibility of developing an exact, parsimony based approach to dating species trees. The software (ILPEACE) is freely available for download.
Submitted 26 October, 2014;
originally announced October 2014.
-
Satisfying ternary permutation constraints by multiple linear orders or phylogenetic trees
Authors:
Leo van Iersel,
Steven Kelk,
Nela Lekic,
Simone Linz
Abstract:
A ternary permutation constraint satisfaction problem (CSP) is specified by a subset Pi of the symmetric group S_3. An instance of such a problem consists of a set of variables V and a set of constraints C, where each constraint is an ordered triple of distinct elements from V. The goal is to construct a linear order alpha on V such that, for each constraint (a,b,c) in C, the ordering of a,b,c induced by alpha is in Pi. Excluding symmetries and trivial cases there are 11 such problems, and their complexity is well known. Here we consider the variant of the problem, denoted 2-Pi, where we are allowed to construct two linear orders alpha and beta and each constraint needs to be satisfied by at least one of the two. We give a full complexity classification of all 11 2-Pi problems, observing that in the switch from one to two linear orders the complexity landscape changes quite abruptly and that hardness proofs become rather intricate. We then focus on one of the 11 problems in particular, which is closely related to the '2-Caterpillar Compatibility' problem in the phylogenetics literature. We show that this particular CSP remains hard on three linear orders, and also in the biologically relevant case when we swap three linear orders for three phylogenetic trees, yielding the '3-Tree Compatibility' problem. Due to the biological relevance of this problem we also give extremal results concerning the minimum number of trees required, in the worst case, to satisfy a set of rooted triplet constraints on n leaf labels.
Submitted 9 October, 2014;
originally announced October 2014.
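As a small illustration of the 2-Pi setting described in the abstract above, the following sketch checks whether a set of linear orders satisfies every constraint in at least one order, and searches tiny instances by brute force. The encoding of Pi as index patterns, the `BETWEENNESS` example and all function names are assumptions made for this illustration; they are not taken from the paper, and the exhaustive search is only meant for very small variable sets.

```python
from itertools import permutations, product

# Hypothetical encoding: each element of Pi lists the indices (0, 1, 2) of a
# constraint triple in the order they appear in the linear order.
# Betweenness: the second element of the triple lies between the other two.
BETWEENNESS = {(0, 1, 2), (2, 1, 0)}


def pattern(order, triple):
    """Return the arrangement of the triple's indices induced by the order."""
    pos = {v: i for i, v in enumerate(order)}
    return tuple(sorted(range(3), key=lambda i: pos[triple[i]]))


def satisfies(orders, constraints, pi):
    """True if every constraint is satisfied by at least one of the orders."""
    return all(any(pattern(o, t) in pi for o in orders) for t in constraints)


def brute_force(variables, constraints, pi, num_orders=2):
    """Exhaustively search for num_orders linear orders (tiny instances only)."""
    all_orders = list(permutations(variables))
    for orders in product(all_orders, repeat=num_orders):
        if satisfies(orders, constraints, pi):
            return orders
    return None
```

For example, the betweenness constraints (a,b,c) and (b,a,c) cannot both hold in a single linear order, because each demands a different middle element, but two orders suffice; adding (a,c,b) makes two orders insufficient as well.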
-
Hybridization Number on Three Rooted Binary Trees is EPT
Authors:
Leo van Iersel,
Steven Kelk,
Nela Lekić,
Chris Whidden,
Norbert Zeh
Abstract:
Phylogenetic networks are leaf-labelled directed acyclic graphs that are used to describe non-treelike evolutionary histories and are thus a generalization of phylogenetic trees. The hybridization number of a phylogenetic network is the sum of all indegrees minus the number of nodes plus one. The Hybridization Number problem takes as input a collection of phylogenetic trees and asks to construct a phylogenetic network that contains an embedding of each of the input trees and has a smallest possible hybridization number. We present an algorithm for the Hybridization Number problem on three binary trees on $n$ leaves, which runs in time $O(c^k poly(n))$, with $k$ the hybridization number of an optimal network and $c$ a constant. For two trees, an algorithm with running time $O(3.18^k n)$ was proposed before whereas an algorithm with running time $O(c^k poly(n))$ had prior to this article remained elusive for more than two trees. The algorithm for two trees uses the close connection to acyclic agreement forests to achieve a linear exponent in the running time, while previous algorithms for more than two trees (explicitly or implicitly) relied on a brute force search through all possible underlying network topologies, leading to running times that are not $O(c^k poly(n))$ for any $c$. The connection to acyclic agreement forests is much weaker for more than two trees, so even given the right agreement forest, reconstructing the network poses major challenges. We prove novel structural results that allow us to reconstruct a network without having to guess the underlying topology. Our techniques generalize to more than three input trees with the exception of one key lemma that maps nodes in the network to tree nodes and, thus, minimizes the amount of guessing involved in constructing the network. The main open problem therefore is to establish a similar mapping for more than three trees.
Submitted 31 May, 2016; v1 submitted 10 February, 2014;
originally announced February 2014.
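The definition of the hybridization number quoted in the abstract above (sum of all indegrees minus the number of nodes plus one) can be evaluated directly from an edge list. The sketch below assumes a network given as directed (parent, child) pairs with a single root; the function names and the example network are hypothetical and serve only to illustrate the formula.

```python
from collections import defaultdict


def hybridization_number(edges):
    """Sum of all indegrees minus the number of nodes plus one.

    Assumes `edges` is a list of (parent, child) pairs describing a
    connected directed acyclic graph with a single root.
    """
    nodes = set()
    indegree = defaultdict(int)
    for parent, child in edges:
        nodes.update((parent, child))
        indegree[child] += 1
    return sum(indegree.values()) - len(nodes) + 1


def excess_indegree(edges):
    """Equivalent count: each node with indegree d >= 2 contributes d - 1."""
    indegree = defaultdict(int)
    for _, child in edges:
        indegree[child] += 1
    return sum(d - 1 for d in indegree.values() if d >= 2)


# Hypothetical network on leaves a, b, c with one reticulation node h.
example = [("r", "u"), ("r", "v"), ("u", "a"), ("u", "h"),
           ("v", "h"), ("v", "b"), ("h", "c")]
```

On the example, there are seven edges and seven nodes, so the formula gives 7 - 7 + 1 = 1, matching the single node `h` with two parents. For a tree the value is 0, since every non-root node has indegree one.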
-
A short note on exponential-time algorithms for hybridization number
Authors:
Leo van Iersel,
Steven Kelk,
Nela Lekic,
Leen Stougie
Abstract:
In this short note we prove that, given two (not necessarily binary) rooted phylogenetic trees T_1, T_2 on the same set of taxa X, where |X|=n, the hybridization number of T_1 and T_2 can be computed in time O^{*}(2^n) i.e. O(2^{n} poly(n)). The result also means that a Maximum Acyclic Agreement Forest (MAAF) can be computed within the same time bound.
Submitted 4 December, 2013;
originally announced December 2013.
-
Kernelizations for the hybridization number problem on multiple nonbinary trees
Authors:
Leo van Iersel,
Steven Kelk,
Celine Scornavacca
Abstract:
Given a finite set $X$, a collection $\mathcal{T}$ of rooted phylogenetic trees on $X$ and an integer $k$, the Hybridization Number problem asks if there exists a phylogenetic network on $X$ that displays all trees from $\mathcal{T}$ and has reticulation number at most $k$. We show two kernelization algorithms for Hybridization Number, with kernel sizes $4k(5k)^t$ and $20k^2(Δ^+-1)$ respectively, with $t$ the number of input trees and $Δ^+$ their maximum outdegree. Experiments on simulated data demonstrate the practical relevance of these kernelization algorithms. In addition, we present an $n^{f(k)}t$-time algorithm, with $n=|X|$ and $f$ some computable function of $k$.
Submitted 22 March, 2016; v1 submitted 16 November, 2013;
originally announced November 2013.
-
On Computing the Maximum Parsimony Score of a Phylogenetic Network
Authors:
Mareike Fischer,
Leo van Iersel,
Steven Kelk,
Celine Scornavacca
Abstract:
Phylogenetic networks are used to display the relationship of different species whose evolution is not treelike, which is the case, for instance, in the presence of hybridization events or horizontal gene transfers. Tree inference methods such as Maximum Parsimony need to be modified in order to be applicable to networks. In this paper, we discuss two different definitions of Maximum Parsimony on networks, "hardwired" and "softwired", and examine the complexity of computing them given a network topology and a character. By exploiting a link with the problem Multicut, we show that computing the hardwired parsimony score for 2-state characters is polynomial-time solvable, while for characters with more states this problem becomes NP-hard but is still approximable and fixed parameter tractable in the parsimony score. On the other hand we show that, for the softwired definition, obtaining even weak approximation guarantees is already difficult for binary characters and restricted network topologies, and fixed-parameter tractable algorithms in the parsimony score are unlikely. On the positive side we show that computing the softwired parsimony score is fixed-parameter tractable in the level of the network, a natural parameter describing how tangled reticulate activity is in the network. Finally, we show that both the hardwired and softwired parsimony score can be computed efficiently using Integer Linear Programming. The software has been made freely available.
Submitted 1 May, 2014; v1 submitted 11 February, 2013;
originally announced February 2013.
-
Approximation algorithms for nonbinary agreement forests
Authors:
Leo van Iersel,
Steven Kelk,
Nela Lekić,
Leen Stougie
Abstract:
Given two rooted phylogenetic trees on the same set of taxa X, the Maximum Agreement Forest problem (MAF) asks to find a forest that is, in a certain sense, common to both trees and has a minimum number of components. The Maximum Acyclic Agreement Forest problem (MAAF) has the additional restriction that the components of the forest cannot have conflicting ancestral relations in the input trees. There has been considerable interest in the special cases of these problems in which the input trees are required to be binary. However, in practice, phylogenetic trees are rarely binary, due to uncertainty about the precise order of speciation events. Here, we show that the general, nonbinary version of MAF has a polynomial-time 4-approximation and a fixed-parameter tractable (exact) algorithm that runs in O(4^k poly(n)) time, where n = |X| and k is the number of components of the agreement forest minus one. Moreover, we show that a c-approximation algorithm for nonbinary MAF and a d-approximation algorithm for the classical problem Directed Feedback Vertex Set (DFVS) can be combined to yield a d(c+3)-approximation for nonbinary MAAF. The algorithms for MAF have been implemented and made publicly available.
Submitted 23 December, 2012; v1 submitted 11 October, 2012;
originally announced October 2012.
-
Trinets encode tree-child and level-2 phylogenetic networks
Authors:
Leo van Iersel,
Vincent Moulton
Abstract:
Phylogenetic networks generalize evolutionary trees, and are commonly used to represent evolutionary histories of species that undergo reticulate evolutionary processes such as hybridization, recombination and lateral gene transfer. Recently, there has been great interest in trying to develop methods to construct rooted phylogenetic networks from triplets, that is rooted trees on three species. However, although triplets determine or encode rooted phylogenetic trees, they do not in general encode rooted phylogenetic networks, which is a potential issue for any such method. Motivated by this fact, Huber and Moulton recently introduced trinets as a natural extension of rooted triplets to networks. In particular, they showed that level-1 phylogenetic networks are encoded by their trinets, and also conjectured that all "recoverable" rooted phylogenetic networks are encoded by their trinets. Here we prove that recoverable binary level-2 networks and binary tree-child networks are also encoded by their trinets. To do this we prove two decomposition theorems based on trinets which hold for all recoverable binary rooted phylogenetic networks. Our results provide some additional evidence in support of the conjecture that trinets encode all recoverable rooted phylogenetic networks, and could also lead to new approaches to construct phylogenetic networks from trinets.
Submitted 1 October, 2012;
originally announced October 2012.
-
A practical approximation algorithm for solving massive instances of hybridization number for binary and nonbinary trees
Authors:
Leo van Iersel,
Steven Kelk,
Nela Lekić,
Celine Scornavacca
Abstract:
Reticulate events play an important role in determining evolutionary relationships. The problem of computing the minimum number of such events to explain discordance between two phylogenetic trees is a hard computational problem. Even for binary trees, exact solvers struggle to solve instances with reticulation number larger than 40-50. Here we present CycleKiller and NonbinaryCycleKiller, the first methods to produce solutions verifiably close to optimality for instances with hundreds or even thousands of reticulations. Using simulations, we demonstrate that these algorithms run quickly for large and difficult instances, producing solutions that are very close to optimality. As a spin-off from our simulations we also present TerminusEst, which is the fastest exact method currently available that can handle nonbinary trees: this is used to measure the accuracy of the NonbinaryCycleKiller algorithm. All three methods are based on extensions of previous theoretical work and are publicly available. We also apply our methods to real data.
Submitted 1 May, 2014; v1 submitted 15 May, 2012;
originally announced May 2012.
-
A quadratic kernel for computing the hybridization number of multiple trees
Authors:
Leo van Iersel,
Simone Linz
Abstract:
It has recently been shown that the NP-hard problem of calculating the minimum number of hybridization events that is needed to explain a set of rooted binary phylogenetic trees by means of a hybridization network is fixed-parameter tractable if an instance of the problem consists of precisely two such trees. In this paper, we show that this problem remains fixed-parameter tractable for an arbitrarily large set of rooted binary phylogenetic trees. In particular, we present a quadratic kernel.
Submitted 19 March, 2012;
originally announced March 2012.
-
Cycle killer... qu'est-ce que c'est? On the comparative approximability of hybridization number and directed feedback vertex set
Authors:
Steven Kelk,
Leo van Iersel,
Nela Lekic,
Simone Linz,
Celine Scornavacca,
Leen Stougie
Abstract:
We show that the problem of computing the hybridization number of two rooted binary phylogenetic trees on the same set of taxa X has a constant factor polynomial-time approximation if and only if the problem of computing a minimum-size feedback vertex set in a directed graph (DFVS) has a constant factor polynomial-time approximation. The latter problem, which asks for a minimum number of vertices to be removed from a directed graph to transform it into a directed acyclic graph, is one of the problems in Karp's seminal 1972 list of 21 NP-complete problems. However, despite considerable attention from the combinatorial optimization community it remains to this day unknown whether a constant factor polynomial-time approximation exists for DFVS. Our result thus places the (in)approximability of hybridization number in a much broader complexity context, and as a consequence we obtain that hybridization number inherits inapproximability results from the problem Vertex Cover. On the positive side, we use results from the DFVS literature to give an O(log r log log r) approximation for hybridization number, where r is the value of an optimal solution to the hybridization number problem.
Submitted 22 December, 2011;
originally announced December 2011.
-
On the elusiveness of clusters
Authors:
Steven Kelk,
Celine Scornavacca,
Leo van Iersel
Abstract:
Rooted phylogenetic networks are often used to represent conflicting phylogenetic signals. Given a set of clusters, a network is said to represent these clusters in the "softwired" sense if, for each cluster in the input set, at least one tree embedded in the network contains that cluster. Motivated by parsimony we might wish to construct such a network using as few reticulations as possible, or minimizing the "level" of the network, i.e. the maximum number of reticulations used in any "tangled" region of the network. Although these are NP-hard problems, here we prove that, for every fixed k >= 0, it is polynomial-time solvable to construct a phylogenetic network with level equal to k representing a cluster set, or to determine that no such network exists. However, this algorithm does not lend itself to a practical implementation. We also prove that the comparatively efficient Cass algorithm correctly solves this problem (and also minimizes the reticulation number) when input clusters are obtained from two not necessarily binary gene trees on the same set of taxa but does not always minimize level for general cluster sets. Finally, we describe a new algorithm which generates in polynomial-time all binary phylogenetic networks with exactly r reticulations representing a set of input clusters (for every fixed r >= 0).
Submitted 9 March, 2011;
originally announced March 2011.
-
Locating a tree in a phylogenetic network
Authors:
Leo van Iersel,
Charles Semple,
Mike Steel
Abstract:
Phylogenetic trees and networks are leaf-labelled graphs that are used to describe evolutionary histories of species. The Tree Containment problem asks whether a given phylogenetic tree is embedded in a given phylogenetic network. Given a phylogenetic network and a cluster of species, the Cluster Containment problem asks whether the given cluster is a cluster of some phylogenetic tree embedded in the network. Both problems are known to be NP-complete in general. In this article, we consider the restriction of these problems to several well-studied classes of phylogenetic networks. We show that Tree Containment is polynomial-time solvable for normal networks, for binary tree-child networks, and for level-$k$ networks. On the other hand, we show that, even for tree-sibling, time-consistent, regular networks, both Tree Containment and Cluster Containment remain NP-complete.
Submitted 15 June, 2010;
originally announced June 2010.
-
When two trees go to war
Authors:
Leo van Iersel,
Steven Kelk
Abstract:
Rooted phylogenetic networks are often constructed by combining trees, clusters, triplets or characters into a single network that in some well-defined sense simultaneously represents them all. We review these four models and investigate how they are related. In general, the model chosen influences the minimum number of reticulation events required. However, when one obtains the input data from two binary trees, we show that the minimum number of reticulations is independent of the model. The number of reticulations necessary to represent the trees, triplets, clusters (in the softwired sense) and characters (with unrestricted multiple crossover recombination) are all equal. Furthermore, we show that these results also hold when not the number of reticulations but the level of the constructed network is minimised. We use these unification results to settle several complexity questions that have been open in the field for some time. We also give explicit examples to show that already for data obtained from three binary trees the models begin to diverge.
Submitted 29 April, 2010;
originally announced April 2010.
-
All Ternary Permutation Constraint Satisfaction Problems Parameterized Above Average Have Kernels with Quadratic Numbers of Variables
Authors:
Gregory Gutin,
Leo van Iersel,
Matthias Mnich,
Anders Yeo
Abstract:
A ternary Permutation-CSP is specified by a subset $Π$ of the symmetric group $\mathcal S_3$. An instance of such a problem consists of a set of variables $V$ and a multiset of constraints, which are ordered triples of distinct variables of $V.$ The objective is to find a linear ordering $α$ of $V$ that maximizes the number of triples whose ordering (under $α$) follows a permutation in $Π$. We prove that all ternary Permutation-CSPs parameterized above average have kernels with quadratic numbers of variables.
Submitted 8 July, 2011; v1 submitted 12 April, 2010;
originally announced April 2010.