-
$k$-Leaf Powers Cannot be Characterized by a Finite Set of Forbidden Induced Subgraphs for $k \geq 5$
Authors:
Max Dupré la Tour,
Manuel Lafond,
Ndiamé Ndiaye,
Adrian Vetta
Abstract:
A graph $G=(V,E)$ is a $k$-leaf power if there is a tree $T$ whose leaves are the vertices of $G$ with the property that a pair of leaves $u$ and $v$ induce an edge in $G$ if and only if they are distance at most $k$ apart in $T$. For $k\le 4$, it is known that there exists a finite set $F_k$ of graphs such that the class $L(k)$ of $k$-leaf power graphs is characterized as the set of strongly chordal graphs that do not contain any graph in $F_k$ as an induced subgraph. We prove no such characterization holds for $k\ge 5$. That is, for any $k\ge 5$, there is no finite set $F_k$ of graphs such that $L(k)$ is equivalent to the set of strongly chordal graphs that do not contain as an induced subgraph any graph in $F_k$.
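The definition above can be made concrete with a small sketch. It is only an illustration of the definition, not the paper's method: the function name `leaf_power` and the adjacency-dict tree encoding are assumptions made here. Given a tree $T$ and an integer $k$, it returns the edges of the graph whose vertices are the leaves of $T$ and whose edges join leaves at distance at most $k$.

```python
from collections import deque
from itertools import combinations


def leaf_power(tree, k):
    """Return the edge set of the k-leaf power defined by `tree`.

    `tree` is an adjacency dict {node: set_of_neighbours} (an assumed
    encoding); its leaves are the nodes with exactly one neighbour.
    """
    leaves = [v for v, nbrs in tree.items() if len(nbrs) == 1]

    def distances_from(src):
        # Breadth-first search gives unweighted distances in the tree.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in tree[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return dist

    dist = {u: distances_from(u) for u in leaves}
    # Two leaves are adjacent iff their tree distance is at most k.
    return {
        frozenset((u, v))
        for u, v in combinations(leaves, 2)
        if dist[u][v] <= k
    }


# Two internal nodes x and y, each carrying two leaves.
example_tree = {
    "a": {"x"}, "b": {"x"}, "c": {"y"}, "d": {"y"},
    "x": {"a", "b", "y"}, "y": {"c", "d", "x"},
}
two_power = leaf_power(example_tree, 2)    # only sibling leaves are adjacent
three_power = leaf_power(example_tree, 3)  # every pair of leaves is adjacent
```

Recognition, the problem the abstract is concerned with, goes the other way: given $G$, decide whether some tree $T$ produces it. The sketch only covers the easy direction.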
Submitted 2 July, 2024;
originally announced July 2024.
-
Finding Maximum Common Contractions Between Phylogenetic Networks
Authors:
Bertrand Marchand,
Nadia Tahiri,
Olivier Tremblay-Savard,
Manuel Lafond
Abstract:
In this paper, we lay the groundwork on the comparison of phylogenetic networks based on edge contractions and expansions as edit operations, as originally proposed by Robinson and Foulds to compare trees. We prove that these operations connect the space of all phylogenetic networks on the same set of leaves, even if we forbid contractions that create cycles. This allows us to define an operational distance on this space, as the minimum number of contractions and expansions required to transform one network into another. We highlight the difference between this distance and the computation of the maximum common contraction between two networks. Given its ability to outline a common structure between them, which can provide valuable biological insights, we study the algorithmic aspects of the latter. We first prove that computing a maximum common contraction between two networks is NP-hard, even when the maximum degree, the size of the common contraction, or the number of leaves is bounded. We also provide lower bounds to the problem based on the Exponential-Time Hypothesis. Nonetheless, we do provide a polynomial-time algorithm for weakly-galled networks, a generalization of galled trees.
Submitted 26 May, 2024;
originally announced May 2024.
-
Median and Small Parsimony Problems on RNA trees
Authors:
Bertrand Marchand,
Yoann Anselmetti,
Manuel Lafond,
Aïda Ouangraoua
Abstract:
Motivation: Non-coding RNAs (ncRNAs) express their functions by adopting molecular structures. Specifically, RNA secondary structures serve as a relatively stable intermediate step before tertiary structures, offering a reliable signature of molecular function. Consequently, within an RNA functional family, secondary structures are generally more evolutionarily conserved than sequences. Conversely, homologous RNA families grouped within an RNA clan share ancestors but typically exhibit structural differences. Inferring the evolution of RNA structures within RNA families and clans is crucial for gaining insights into functional adaptations over time and providing clues about the Ancient RNA World Hypothesis. Results: We introduce the median problem and the small parsimony problem for ncRNA families, where secondary structures are represented as leaf-labelled trees. We utilize the Robinson-Foulds (RF) tree distance, which corresponds to a specific edit distance between RNA trees, and a new metric called the Internal-Leafset (IL) distance. While the RF tree distance compares sets of leaves descending from internal nodes of two RNA trees, the IL distance compares the collection of leaf-children of internal nodes. The latter is better at capturing differences in structural elements of RNAs than the RF distance, which is more focused on base pairs. We also consider a more general tree edit distance that allows the mapping of base pairs that are not perfectly aligned. We study the theoretical complexity of the median problem and the small parsimony problem under the three distance metrics and various biologically-relevant constraints, and we present polynomial-time maximum parsimony algorithms for solving some versions of the problems. Our algorithms are applied to ncRNA families from the RFAM database, illustrating their practical utility.
Submitted 5 February, 2024;
originally announced February 2024.
-
Predicting Horizontal Gene Transfers with Perfect Transfer Networks
Authors:
Alitzel López Sánchez,
Manuel Lafond
Abstract:
Horizontal gene transfer inference approaches are usually based on gene sequences: parametric methods search for patterns that deviate from a particular genomic signature, while phylogenetic methods use sequences to reconstruct the gene and species trees. However, it is well-known that sequences have difficulty identifying ancient transfers since mutations have enough time to erase all evidence of such events. In this work, we ask whether character-based methods can predict gene transfers. Their advantage over sequences is that homologous genes can have low DNA similarity, but still have retained enough important common motifs that allow them to have common character traits, for instance the same functional or expression profile. A phylogeny that has two separate clades that acquired the same character independently might indicate the presence of a transfer even in the absence of sequence similarity. We introduce perfect transfer networks, which are phylogenetic networks that can explain the character diversity of a set of taxa under the assumption that characters have unique births, and that once a character is gained it is rarely lost. Examples of such traits include transposable elements, biochemical markers and emergence of organelles, just to name a few. We study the differences between our model and two similar models: perfect phylogenetic networks and ancestral recombination networks. Our goals are to initiate a study on the structural and algorithmic properties of perfect transfer networks. We then show that in polynomial time, one can decide whether a given network is a valid explanation for a set of taxa, and show how, for a given tree, one can add transfer edges to it so that it explains a set of taxa. We finally provide lower and upper bounds on the number of transfers required to explain a set of taxa, in the worst case.
Submitted 5 December, 2023;
originally announced December 2023.
-
Parameterized Complexity of Domination Problems Using Restricted Modular Partitions
Authors:
Manuel Lafond,
Weidong Luo
Abstract:
For a graph class $\mathcal{G}$, we define the $\mathcal{G}$-modular cardinality of a graph $G$ as the minimum size of a vertex partition of $G$ into modules that each induces a graph in $\mathcal{G}$. This generalizes other module-based graph parameters such as neighborhood diversity and iterated type partition. Moreover, if $\mathcal{G}$ has bounded modular-width, the W[1]-hardness of a problem in $\mathcal{G}$-modular cardinality implies hardness on modular-width, clique-width, and other related parameters. On the other hand, fixed-parameter tractable (FPT) algorithms in $\mathcal{G}$-modular cardinality may provide new ideas for algorithms using such parameters.
Several FPT algorithms based on modular partitions compute a solution table in each module, then combine each table into a global solution. This works well when each table has a succinct representation, but as we argue, when no such representation exists, the problem is typically W[1]-hard. We illustrate these ideas on the generic $(α, β)$-domination problem, which asks for a set of vertices that contains at least a fraction $α$ of the adjacent vertices of each unchosen vertex, plus some (possibly negative) amount $β$. This generalizes known domination problems such as Bounded Degree Deletion, $k$-Domination, and $α$-Domination. We show that for graph classes $\mathcal{G}$ that require arbitrarily large solution tables, these problems are W[1]-hard in the $\mathcal{G}$-modular cardinality, whereas they are fixed-parameter tractable when they admit succinct solution tables. This leads to several new positive and negative results for many domination problems parameterized by known and novel structural graph parameters such as clique-width, modular-width, and $cluster$-modular cardinality.
Submitted 5 July, 2023;
originally announced July 2023.
-
An FPT Algorithm for Temporal Graph Untangling
Authors:
Riccardo Dondi,
Manuel Lafond
Abstract:
Several classical combinatorial problems have been considered and analysed on temporal graphs. Recently, a variant of Vertex Cover on temporal graphs, called MinTimelineCover, has been introduced to summarize timeline activities in social networks. The problem asks to cover every temporal edge while minimizing the total span of the vertices (where the span of a vertex is the length of the timestamp interval it must remain active in, minus one). While the problem has been shown to be NP-hard even in very restricted cases, its parameterized complexity has not been fully understood. The problem is known to be in FPT under the span parameter only for graphs with two timestamps, but the parameterized complexity for the general case is open. We settle this open problem by giving an FPT algorithm that is based on a combination of iterative compression and a reduction to the Digraph Pair Cut problem, a powerful problem that has received significant attention recently.
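To illustrate the objective described above, here is a minimal sketch of how a candidate solution could be checked and scored. It is not the FPT algorithm from the paper. The encoding is an assumption made for illustration: temporal edges are triples `(u, v, t)`, and each chosen vertex is active during a single closed interval `(start, end)` of timestamps, so its span is `end - start`.

```python
def total_span(activity):
    """Sum of spans, where `activity` maps a vertex to its (start, end) interval.

    The span of one vertex is the number of timestamps in its interval
    minus one, which is simply end - start.
    """
    return sum(end - start for start, end in activity.values())


def covers_all(temporal_edges, activity):
    """True if every temporal edge (u, v, t) has an endpoint active at time t."""
    def active(vertex, t):
        if vertex not in activity:
            return False
        start, end = activity[vertex]
        return start <= t <= end

    return all(active(u, t) or active(v, t) for u, v, t in temporal_edges)


# Hypothetical instance: three temporal edges over timestamps 1..3.
edges = [("a", "b", 1), ("a", "c", 2), ("b", "c", 3)]

# "a" stays active over timestamps 1..2 (span 1), "b" only at 3 (span 0).
good = {"a": (1, 2), "b": (3, 3)}
# Shrinking the interval of "a" lowers the span but leaves ("a", "c", 2) uncovered.
bad = {"a": (1, 1), "b": (3, 3)}

good_is_cover, good_cost = covers_all(edges, good), total_span(good)
bad_is_cover, bad_cost = covers_all(edges, bad), total_span(bad)
```

The optimization problem asks for the feasible `activity` of minimum total span; the checker above only evaluates one candidate.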
Submitted 3 July, 2023;
originally announced July 2023.
-
Preprocessing Complexity for Some Graph Problems Parameterized by Structural Parameters
Authors:
Manuel Lafond,
Weidong Luo
Abstract:
Structural graph parameters play an important role in parameterized complexity, including in kernelization. Notably, vertex cover, neighborhood diversity, twin-cover, and modular-width have been studied extensively in the last few years. However, there are many fundamental problems whose preprocessing complexity is not fully understood under these parameters. Indeed, the existence of polynomial kernels or polynomial Turing kernels for famous problems such as Clique, Chromatic Number, and Steiner Tree has only been established for a subset of structural parameters. In this work, we use several techniques to obtain a complete preprocessing complexity landscape for over a dozen fundamental algorithmic problems.
Submitted 21 June, 2023;
originally announced June 2023.
-
A lightweight semi-centralized strategy for the massive parallelization of branching algorithms
Authors:
Andres Pastrana-Cruz,
Manuel Lafond
Abstract:
Several NP-hard problems are solved exactly using exponential-time branching strategies, whether it be branch-and-bound algorithms, or bounded search trees in fixed-parameter algorithms. The number of tractable instances that can be handled by sequential algorithms is usually small, whereas massive parallelization has been shown to significantly increase the space of instances that can be solved exactly. However, previous centralized approaches require too much communication to be efficient, whereas decentralized approaches are more efficient but have difficulty keeping track of the global state of the exploration.
In this work, we propose to revisit the centralized paradigm while avoiding previous bottlenecks. In our strategy, the center has lightweight responsibilities, requires only a few bits for every communication, but is still able to keep track of the progress of every worker. In particular, the center never holds any task but is able to guarantee that a process with no work always receives the highest priority task globally.
Our strategy was implemented in a generic C++ library called GemPBA, which allows a programmer to convert a sequential branching algorithm into a parallel version by changing only a few lines of code. An experimental case study on the vertex cover problem demonstrates that some of the toughest instances from the DIMACS challenge graphs that would take months to solve sequentially can be handled within two hours with our approach.
Submitted 15 May, 2023;
originally announced May 2023.
-
Finding agreement cherry-reduced subnetworks in level-1 networks
Authors:
Kaari Landry,
Olivier Tremblay-Savard,
Manuel Lafond
Abstract:
Phylogenetic networks are increasingly being considered as better suited to represent the complexity of the evolutionary relationships between species. One class of phylogenetic networks that has received a lot of attention recently is the class of orchard networks, which is composed of networks that can be reduced to a single leaf using cherry reductions. Cherry reductions, also called cherry-picking operations, remove either a leaf of a simple cherry (sibling leaves sharing a parent) or a reticulate edge of a reticulate cherry (two leaves whose parents are connected by a reticulate edge). In this paper, we present a fixed-parameter tractable algorithm to solve the problem of finding a maximum agreement cherry-reduced subnetwork (MACRS) between two rooted binary level-1 networks. This is the first exact algorithm proposed to solve the MACRS problem. As proven in earlier work, there is a direct relationship between finding an MACRS and calculating a distance based on cherry operations. As a result, the proposed algorithm also provides a distance that can be used for the comparison of level-1 networks.
Submitted 28 April, 2023;
originally announced May 2023.
-
The Longest Subsequence-Repeated Subsequence Problem
Authors:
Manuel Lafond,
Wenfeng Lai,
Adiesha Liyanage,
Binhai Zhu
Abstract:
Motivated by computing duplication patterns in sequences, a new fundamental problem called the longest subsequence-repeated subsequence (LSRS) is proposed. Given a sequence $S$ of length $n$, a letter-repeated subsequence is a subsequence of $S$ in the form of $x_1^{d_1}x_2^{d_2}\cdots x_k^{d_k}$ with $x_i$ a subsequence of $S$, $x_j\neq x_{j+1}$ and $d_i\geq 2$ for all $i$ in $[k]$ and $j$ in $[k-1]$. We first present an $O(n^6)$ time algorithm to compute the longest cubic subsequences of all the $O(n^2)$ substrings of $S$, improving the trivial $O(n^7)$ bound. Then, an $O(n^6)$ time algorithm for computing the longest subsequence-repeated subsequence (LSRS) of $S$ is obtained. Finally we focus on two variants of this problem. We first consider the constrained version when $Σ$ is unbounded, each letter appears in $S$ at most $d$ times and all the letters in $Σ$ must appear in the solution. We show that the problem is NP-hard for $d=4$, via a reduction from a special version of SAT (which is obtained from 3-COLORING). We then show that when each letter appears in $S$ at most $d=3$ times, then the problem is solvable in $O(n^5)$ time.
Submitted 31 August, 2023; v1 submitted 13 April, 2023;
originally announced April 2023.
-
The Two-Squirrel Problem and Its Relatives
Authors:
Sergey Bereg,
Yuya Higashikawa,
Naoki Katoh,
Manuel Lafond,
Yuki Tokuni,
Binhai Zhu
Abstract:
In this paper, we start with a variation of the star cover problem called the Two-Squirrel problem. Given a set $P$ of $2n$ points in the plane, and two sites $c_1$ and $c_2$, compute two $n$-stars $S_1$ and $S_2$ centered at $c_1$ and $c_2$ respectively such that the maximum weight of $S_1$ and $S_2$ is minimized. This problem is strongly NP-hard by a reduction from Equal-size Set-Partition with Rationals. Then we consider two variations of the Two-Squirrel problem, namely the Two-MST and Two-TSP problem, which are both NP-hard. The NP-hardness for the latter is obvious while the former needs a non-trivial reduction from Equal-size Set-Partition with Rationals. In terms of approximation algorithms, for Two-MST and Two-TSP we give factor 3.6402 and $4+\varepsilon$ approximations respectively. Finally, we also show some interesting polynomial-time solvable cases for Two-MST.
Submitted 12 February, 2023;
originally announced February 2023.
-
Relative Timing Information and Orthology in Evolutionary Scenarios
Authors:
David Schaller,
Tom Hartmann,
Manuel Lafond,
Nicolas Wieseke,
Peter F. Stadler,
Marc Hellmuth
Abstract:
Evolutionary scenarios describing the evolution of a family of genes within a collection of species comprise the mapping of the vertices of a gene tree $T$ to vertices and edges of a species tree $S$. The relative timing of the last common ancestors of two extant genes (leaves of $T$) and the last common ancestors of the two species (leaves of $S$) in which they reside is indicative of horizontal gene transfers (HGT) and ancient duplications. Orthologous gene pairs, on the other hand, require that their last common ancestors coincide with a corresponding speciation event. The relative timing information of gene and species divergences is captured by three colored graphs that have the extant genes as vertices and the species in which the genes are found as vertex colors: the equal-divergence-time (EDT) graph, the later-divergence-time (LDT) graph and the prior-divergence-time (PDT) graph, which together form an edge partition of the complete graph.
Here we give a complete characterization in terms of informative and forbidden triples that can be read off the three graphs and provide a polynomial time algorithm for constructing an evolutionary scenario that explains the graphs, provided such a scenario exists. We show that every EDT graph is perfect. While the information about LDT and PDT graphs is necessary to recognize EDT graphs in polynomial-time for general scenarios, this extra information can be dropped in the HGT-free case. However, recognition of EDT graphs without knowledge of putative LDT and PDT graphs is NP-complete for general scenarios. In contrast, PDT graphs can be recognized in polynomial-time. We finally connect the EDT graph to the alternative definitions of orthology that have been proposed for scenarios with horizontal gene transfer. With one exception, the corresponding graphs are shown to be colored cographs.
Submitted 2 August, 2023; v1 submitted 5 December, 2022;
originally announced December 2022.
-
On Generalizations of Pairwise Compatibility Graphs
Authors:
Tiziana Calamoneri,
Manuel Lafond,
Angelo Monti,
Blerina Sinaimeri
Abstract:
A graph $G$ is a PCG if there exists an edge-weighted tree such that each leaf of the tree is a vertex of the graph, and there is an edge $\{ x, y \}$ in $G$ if and only if the weight of the path in the tree connecting $x$ and $y$ lies within a given interval. PCGs have different applications in phylogenetics and have been lately generalized to multi-interval-PCGs. In this paper we define two new generalizations of the PCG class, namely k-OR-PCGs and k-AND-PCGs, that are the classes of graphs that can be expressed as union and intersection, respectively, of $k$ PCGs. The problems we consider can be also described in terms of the \emph{covering number} and the \emph{intersection dimension} of a graph with respect to the PCG class. In this paper we investigate how the classes of PCG, multi-interval-PCG, OR-PCG and AND-PCG are related to each other and to other graph classes known in the literature. In particular, we provide upper bounds on the minimum $k$ for which an arbitrary graph $G$ belongs to k-interval-PCG, k-OR-PCG and k-AND-PCG classes. Furthermore, for particular graph classes, we improve these general bounds. Moreover, we show that, for every integer $k$, there exists a bipartite graph that is not in the k-interval-PCG class, proving that there is no finite $k$ for which the k-interval-PCG class contains all the graphs. Finally, we use a Ramsey theory argument to show that for any $k$, there exist graphs that are not in k-AND-PCG, and graphs that are not in k-OR-PCG.
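As a hedged illustration of the definitions above (not taken from the paper), the sketch below builds the PCG determined by an edge-weighted tree and an interval, then forms the union and intersection of two such graphs in the spirit of 2-OR-PCG and 2-AND-PCG. The function name `pcg_edges` and the nested-dict tree encoding are assumptions made for this example.

```python
from itertools import combinations


def pcg_edges(tree, d_min, d_max):
    """Edges of the PCG defined by an edge-weighted tree and [d_min, d_max].

    `tree` maps each node to {neighbour: edge_weight} (an assumed encoding).
    Two leaves are adjacent iff the weight of the tree path between them
    lies within the closed interval [d_min, d_max].
    """
    leaves = [v for v, nbrs in tree.items() if len(nbrs) == 1]

    def path_weights_from(src):
        # Depth-first traversal; in a tree each node is reached by one path.
        weight = {src: 0}
        stack = [src]
        while stack:
            u = stack.pop()
            for w, cost in tree[u].items():
                if w not in weight:
                    weight[w] = weight[u] + cost
                    stack.append(w)
        return weight

    weights = {u: path_weights_from(u) for u in leaves}
    return {
        frozenset((u, v))
        for u, v in combinations(leaves, 2)
        if d_min <= weights[u][v] <= d_max
    }


# A weighted star: path weights are x-y = 3, x-z = 5, y-z = 6.
star = {
    "c": {"x": 1, "y": 2, "z": 4},
    "x": {"c": 1}, "y": {"c": 2}, "z": {"c": 4},
}
g1 = pcg_edges(star, 3, 5)   # contains x-y and x-z
g2 = pcg_edges(star, 6, 6)   # contains only y-z
or_graph = g1 | g2           # union of the two PCGs
and_graph = g1 & g2          # intersection of the two PCGs
```

Here both PCGs happen to come from the same tree with different intervals; in general the $k$ graphs being combined only need to share a vertex set.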
Submitted 13 April, 2024; v1 submitted 15 December, 2021;
originally announced December 2021.
-
Recognizing k-leaf powers in polynomial time, for constant k
Authors:
Manuel Lafond
Abstract:
A graph $G$ is a $k$-leaf power if there exists a tree $T$ whose leaf set is $V(G)$, and such that $uv \in E(G)$ if and only if the distance between $u$ and $v$ in $T$ is at most $k$. The graph classes of $k$-leaf powers have several applications in computational biology, but recognizing them has remained a challenging algorithmic problem for the past two decades. The best known result is that $6$-leaf powers can be recognized in polynomial time. In this paper, we present an algorithm that decides whether a graph $G$ is a $k$-leaf power in time $O(n^{f(k)})$ for some function $f$ that depends only on $k$ (but has the growth rate of a power tower function).
Our techniques are based on the fact that either a $k$-leaf power has a corresponding tree of low maximum degree, in which case finding it is easy, or every corresponding tree has large maximum degree. In the latter case, large degree vertices in the tree imply that $G$ has redundant substructures which can be pruned from the graph. In addition to solving a longstanding open problem, we hope that the structural results presented in this work can lead to further results on $k$-leaf powers.
Submitted 28 October, 2021;
originally announced October 2021.
-
Indirect Identification of Horizontal Gene Transfer
Authors:
David Schaller,
Manuel Lafond,
Peter F. Stadler,
Nicolas Wieseke,
Marc Hellmuth
Abstract:
Several implicit methods to infer Horizontal Gene Transfer (HGT) focus on pairs of genes that have diverged only after the divergence of the two species in which the genes reside. This situation defines the edge set of a graph, the later-divergence-time (LDT) graph, whose vertices correspond to genes colored by their species. We investigate these graphs in the setting of relaxed scenarios, i.e., evolutionary scenarios that encompass all commonly used variants of duplication-transfer-loss scenarios in the literature. We characterize LDT graphs as a subclass of properly vertex-colored cographs, and provide a polynomial-time recognition algorithm as well as an algorithm to construct a relaxed scenario that explains a given LDT. An edge in an LDT graph implies that the two corresponding genes are separated by at least one HGT event. The converse is not true, however. We show that the complete xenology relation is described by an rs-Fitch graph, i.e., a complete multipartite graph satisfying constraints on the vertex coloring. This class of vertex-colored graphs is also recognizable in polynomial time. We finally address the question "how much information about all HGT events is contained in LDT graphs" with the help of simulations of evolutionary scenarios with a wide range of duplication, loss, and HGT events. In particular, we show that a simple greedy graph editing scheme can be used to efficiently detect HGT events that are implicitly contained in LDT graphs.
Submitted 6 April, 2021; v1 submitted 16 December, 2020;
originally announced December 2020.
-
Further results on Hendry's Conjecture
Authors:
Manuel Lafond,
Ben Seamone,
Rezvan Sherkati
Abstract:
Recently, a conjecture due to Hendry was disproved which stated that every Hamiltonian chordal graph is cycle extendible. Here we further explore the conjecture, showing that it fails to hold even when a number of extra conditions are imposed. In particular, we show that Hendry's Conjecture fails for strongly chordal graphs, graphs with high connectivity, and if we relax the definition of "cycle extendible" considerably. We also consider the original conjecture from a subtree intersection model point of view, showing that a result of Abuieda et al is nearly best possible.
Submitted 16 August, 2022; v1 submitted 14 July, 2020;
originally announced July 2020.
-
Comparing copy-number profiles under multi-copy amplifications and deletions
Authors:
Garance Cordonnier,
Manuel Lafond
Abstract:
During cancer progression, malignant cells accumulate somatic mutations that can lead to genetic aberrations. In particular, evolutionary events akin to segmental duplications or deletions can alter the copy-number profile (CNP) of a set of genes in a genome. Our aim is to compute the evolutionary distance between two cells for which only CNPs are known. This asks for the minimum number of segmental amplifications and deletions to turn one CNP into another. This was recently formalized into a model where each event is assumed to alter a copy-number by $1$ or $-1$, even though these events can affect large portions of a chromosome. We propose a general cost framework where an event can modify the copy-number of a gene by larger amounts. We show that any cost scheme that allows segmental deletions of arbitrary length makes computing the distance strongly NP-hard. We then devise a factor $2$ approximation algorithm for the problem when copy-numbers are non-zero and provide an implementation called \textsf{cnp2cnp}. We evaluate our approach experimentally by reconstructing simulated cancer phylogenies from the pairwise distances inferred by \textsf{cnp2cnp} and compare it against two other alternatives, namely the \textsf{MEDICC} distance and the Euclidean distance. The experimental results show that our distance yields more accurate phylogenies on average than these alternatives if the given CNPs are error-free, but that the \textsf{MEDICC} distance is slightly more robust against error in the data. In all cases, our experiments show that either our approach or the \textsf{MEDICC} approach should be preferred over the Euclidean distance.
Submitted 25 February, 2020;
originally announced February 2020.
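To make the notion of a segmental event on a copy-number profile concrete, here is a minimal Python sketch. It is not the \textsf{cnp2cnp} implementation; the function name `apply_event`, the half-open interval convention, and the rule that a copy-number of zero stays at zero are assumptions made for illustration.

```python
def apply_event(cnp, start, end, delta):
    """Apply one segmental event to a copy-number profile.

    cnp   : list of non-negative integers (one entry per gene)
    start : first affected position (inclusive)
    end   : last affected position (exclusive)
    delta : signed change in copy-number (e.g. +1 or -1; larger
            amounts correspond to the generalized cost framework)

    Assumption: a gene whose copy-number is 0 has been lost and is
    not affected by later events; results are clipped at 0.
    """
    if delta == 0:
        raise ValueError("an event must change the copy-number")
    out = list(cnp)
    for i in range(start, end):
        if out[i] > 0:
            out[i] = max(0, out[i] + delta)
    return out


if __name__ == "__main__":
    profile = [1, 2, 0, 3]
    print(apply_event(profile, 0, 4, +1))   # amplification of the whole segment
    print(apply_event(profile, 1, 4, -2))   # deletion removing two copies
```

The distance studied in the abstract would then be the minimum total cost of a sequence of such events turning one profile into another, which this sketch does not attempt to compute.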
-
Genomic Problems Involving Copy Number Profiles: Complexity and Algorithms
Authors:
Manuel Lafond,
Binhai Zhu,
Peng Zou
Abstract:
Recently, due to the genomic sequence analysis in several types of cancer, the genomic data based on {\em copy number profiles} ({\em CNP} for short) are getting more and more popular. A CNP is a vector where each component is a non-negative integer representing the number of copies of a specific gene or segment of interest.
In this paper, we present two streams of results. The first is the negative results on two open problems regarding the computational complexity of the Minimum Copy Number Generation (MCNG) problem posed by Qingge et al. in 2018. It was shown by Qingge et al. that the problem is NP-hard if the duplications are tandem and they left the open question of whether the problem remains NP-hard if arbitrary duplications are used. We answer this question affirmatively in this paper; in fact, we prove that it is NP-hard to even obtain a constant factor approximation. We also prove that the parameterized version is W[1]-hard, answering another open question by Qingge et al.
The other result is positive and is based on a new (and more general) problem regarding CNPs. The \emph{Copy Number Profile Conforming (CNPC)} problem is formally defined as follows: given two CNPs $C_1$ and $C_2$, compute two strings $S_1$ and $S_2$ with $cnp(S_1)=C_1$ and $cnp(S_2)=C_2$ such that the distance between $S_1$ and $S_2$, $d(S_1,S_2)$, is minimized. Here, $d(S_1,S_2)$ is a very general term: it could be any genome rearrangement distance (such as reversal, transposition, or tandem duplication). We make the first step by showing that if $d(S_1,S_2)$ is measured by the breakpoint distance, then the problem is polynomially solvable.
Submitted 11 February, 2020;
originally announced February 2020.
-
Reconstruction of time-consistent species trees
Authors:
Manuel Lafond,
Marc Hellmuth
Abstract:
The history of gene families -- which are equivalent to event-labeled gene trees -- can to some extent be reconstructed from empirically estimated evolutionary event-relations containing pairs of orthologous, paralogous or xenologous genes. The question then arises as to whether inferred event-labeled gene trees are "biologically feasible", which is the case if one can find a species tree with which the gene tree can be reconciled in a time-consistent way.
In this contribution, we consider event-labeled gene trees that contain speciation, duplication as well as horizontal gene transfer and we assume that the species tree is unknown. We provide a cubic-time algorithm to decide whether a "time-consistent" binary species tree for a given event-labeled gene tree exists and, in the affirmative case, to construct the species tree within the same time-complexity.
Submitted 29 October, 2019;
originally announced October 2019.
-
The Tandem Duplication Distance is NP-hard
Authors:
Manuel Lafond,
Binhai Zhu,
Peng Zou
Abstract:
In computational biology, tandem duplication is an important biological phenomenon which can occur either at the genome or at the DNA level. A tandem duplication takes a copy of a genome segment and inserts it right after the segment - this can be represented as the string operation $AXB \Rightarrow AXXB$. For example, tandem exon duplications have been found in many species such as human, fly or worm, and have been largely studied in computational biology. The Tandem Duplication (TD) distance problem we investigate in this paper is defined as follows: given two strings $S$ and $T$ over the same alphabet, compute the smallest sequence of tandem duplications required to convert $S$ to $T$. The natural question of whether the TD distance can be computed in polynomial time was posed in 2004 by Leupold et al. and had remained open, despite the fact that tandem duplications have received much attention ever since. In this paper, we prove that this problem is NP-hard. We further show that this hardness holds even if all characters of $S$ are distinct. This is known as the exemplar TD distance, which is of special relevance in bioinformatics. One of the tools we develop for the reduction is a new problem called the Cost-Effective Subgraph, for which we obtain W[1]-hardness results that might be of independent interest. We finally show that computing the exemplar TD distance between $S$ and $T$ is fixed-parameter tractable. Our results open the door to many other questions, and we conclude with several open problems.
Submitted 12 June, 2019;
originally announced June 2019.
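The string operation $AXB \Rightarrow AXXB$ from the abstract above can be illustrated with a short Python sketch. The brute-force search below is only meant to make the definition of the TD distance concrete on tiny inputs; it takes exponential time and is not the authors' fixed-parameter algorithm. The function names are hypothetical.

```python
from collections import deque


def tandem_duplications(s):
    """Yield every string obtained from s by one tandem duplication.

    Writing s = A X B with X = s[i:j] non-empty, the result is A X X B.
    """
    n = len(s)
    for i in range(n):
        for j in range(i + 1, n + 1):
            yield s[:j] + s[i:j] + s[j:]


def td_distance(source, target):
    """Smallest number of tandem duplications turning source into target.

    Breadth-first search over strings; returns None if target is
    unreachable. Strings only grow, so anything longer than target
    can be pruned, which guarantees termination.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        current, dist = queue.popleft()
        for nxt in tandem_duplications(current):
            if len(nxt) > len(target) or nxt in seen:
                continue
            if nxt == target:
                return dist + 1
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None


if __name__ == "__main__":
    print(td_distance("ab", "abab"))  # duplicate the whole string once
    print(td_distance("ab", "aabb"))  # duplicate "a", then "b"
    print(td_distance("ab", "ba"))    # unreachable
```

Taking a source string with all characters distinct, as in `"ab"` above, corresponds to the exemplar setting mentioned in the abstract.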
-
Distributed Pattern Formation in a Ring
Authors:
Anne-Laure Ehresmann,
Manuel Lafond,
Lata Narayanan,
Jaroslav Opatrny
Abstract:
Motivated by concerns about diversity in social networks, we consider the following pattern formation problems in rings. Assume $n$ mobile agents are located at the nodes of an $n$-node ring network. Each agent is assigned a colour from the set $\{c_1, c_2, \ldots, c_q \}$. The ring is divided into $k$ contiguous {\em blocks} or neighbourhoods of length $p$. The agents are required to rearrange themselves in a distributed manner to satisfy given diversity requirements: in each block $j$ and for each colour $c_i$, there must be exactly $n_i(j) >0$ agents of colour $c_i$ in block $j$. Agents are assumed to be able to see agents in adjacent blocks, and move to any position in adjacent blocks in one time step. When the number of colours $q=2$, we give an algorithm that terminates in time $N_1/n^*_1 + k + 4$ where $N_1$ is the total number of agents of colour $c_1$ and $n^*_1$ is the minimum number of agents of colour $c_1$ required in any block. When the diversity requirements are the same in every block, our algorithm requires $3k+4$ steps, and is asymptotically optimal. Our algorithm generalizes for an arbitrary number of colours, and terminates in $O(nk)$ steps. We also show how to extend it to achieve arbitrary specific final patterns, provided there is at least one agent of every colour in every pattern.
Submitted 21 May, 2019;
originally announced May 2019.
-
Time-Energy Tradeoffs for Evacuation by Two Robots in the Wireless Model
Authors:
Jurek Czyzowicz,
Konstantinos Georgiou,
Ryan Killick,
Evangelos Kranakis,
Danny Krizanc,
Manuel Lafond,
Lata Narayanan,
Jaroslav Opatrny,
Sunil Shende
Abstract:
Two robots stand at the origin of the infinite line and are tasked with searching collaboratively for an exit at an unknown location on the line. They can travel at maximum speed $b$ and can change speed or direction at any time. The two robots can communicate with each other at any distance and at any time. The task is completed when the last robot arrives at the exit and evacuates. We study time-energy tradeoffs for the above evacuation problem. The evacuation time is the time it takes the last robot to reach the exit. The energy it takes for a robot to travel a distance $x$ at speed $s$ is measured as $xs^2$. The total and makespan evacuation energies are respectively the sum and maximum of the energy consumption of the two robots while executing the evacuation algorithm.
Assuming that the maximum speed is $b$, and the evacuation time is at most $cd$, where $d$ is the distance of the exit from the origin, we study the problem of minimizing the total energy consumption of the robots. We prove that the problem is solvable only for $bc \geq 3$. For the case $bc=3$, we give an optimal algorithm, and give upper bounds on the energy for the case $bc>3$.
We also consider the problem of minimizing the evacuation time when the available energy is bounded by $Δ$. Surprisingly, when $Δ$ is a constant, independent of the distance $d$ of the exit from the origin, we prove that evacuation is possible in time $O(d^{3/2}\log d)$, and this is optimal up to a logarithmic factor. When $Δ$ is linear in $d$, we give upper bounds on the evacuation time.
Submitted 16 May, 2019;
originally announced May 2019.
-
Energy Consumption of Group Search on a Line
Authors:
Jurek Czyzowicz,
Konstantinos Georgiou,
Ryan Killick,
Evangelos Kranakis,
Danny Krizanc,
Manuel Lafond,
Lata Narayanan,
Jaroslav Opatrny,
Sunil Shende
Abstract:
Consider two robots that start at the origin of the infinite line in search of an exit at an unknown location on the line. The robots can only communicate if they arrive at the same location at exactly the same time, i.e. they use the so-called face-to-face communication model. The group search time is defined as the worst-case time as a function of $d$, the distance of the exit from the origin, when both robots can reach the exit. It has long been known that for a single robot traveling at unit speed, the search time is at least $9d-o(d)$. It was shown recently that $k\geq2$ robots traveling at unit speed also require at least $9d$ group search time.
We investigate energy-time trade-offs in group search by two robots, where the energy loss experienced by a robot traveling a distance $x$ at constant speed $s$ is given by $s^2 x$. Specifically, we consider the problem of minimizing the total energy used by the robots, under the constraints that the search time is at most a multiple $c$ of the distance $d$ and the speed of the robots is bounded by $b$. Motivation for this study is that for the case when robots must complete the search in $9d$ time with maximum speed one, a single robot requires at least $9d$ energy, while for two robots, all previously proposed algorithms consume at least $28d/3$ energy.
When the robots have bounded memory, we generalize existing algorithms to obtain a family of optimal (and in some cases nearly optimal) algorithms parametrized by pairs of $b,c$ values that can solve the problem for the entire spectrum of these pairs for which the problem is solvable. We also propose a novel search algorithm, with unbounded memory, that simultaneously achieves search time $9d$ and consumes energy $8.42588d$. Our result shows that two robots can search on the line in optimal time $9d$ while consuming less total energy than a single robot within the same search time.
Submitted 21 April, 2019;
originally announced April 2019.
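The $9d$ search time and the energy measure $s^2 x$ quoted in the abstract above can be explored numerically. The sketch below simulates the classical single-robot doubling (zig-zag) strategy, which is a standard way to achieve search time below $9d$; it is not one of the two-robot algorithms from the paper. The function names and the convention that the exit lies at distance at least 1 from the origin are assumptions for illustration.

```python
def doubling_search_time(exit_pos, speed=1.0):
    """Time for one robot using the doubling strategy to reach the exit.

    The robot walks to +1, back through the origin to -2, then to +4,
    and so on, doubling its reach at each turn. exit_pos is the signed
    position of the exit, with |exit_pos| >= 1 assumed.
    """
    if exit_pos == 0:
        raise ValueError("exit must not be at the origin")
    travelled = 0.0
    reach = 1.0
    direction = 1  # +1: currently exploring the positive side
    while True:
        same_side = (exit_pos > 0) == (direction > 0)
        if same_side and abs(exit_pos) <= reach:
            travelled += abs(exit_pos)
            return travelled / speed
        travelled += 2 * reach  # go out to the turning point and come back
        reach *= 2
        direction = -direction


def energy(distance, speed):
    """Energy for travelling `distance` at constant `speed`: s^2 * x."""
    return speed ** 2 * distance


if __name__ == "__main__":
    for d in (1.5, 4.001, 16.001, 64.001):
        t = doubling_search_time(d)
        print(d, t, t / d, energy(t, 1.0))
```

At unit speed the distance travelled equals the elapsed time, so the energy of this single robot is also below $9d$, which is the baseline the abstract compares the two-robot algorithms against.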
-
Reconciling Multiple Genes Trees via Segmental Duplications and Losses
Authors:
Riccardo Dondi,
Manuel Lafond,
Celine Scornavacca
Abstract:
Reconciling gene trees with a species tree is a fundamental problem to understand the evolution of gene families. Many existing approaches reconcile each gene tree independently. However, it is well-known that the evolution of gene families is interconnected. In this paper, we extend a previous approach to reconcile a set of gene trees with a species tree based on segmental macro-evolutionary events, where segmental duplication events and losses are associated with cost $δ$ and $λ$, respectively. We show that the problem is polynomial-time solvable when $δ\leq λ$ (via LCA-mapping), while if $δ> λ$ the problem is NP-hard, even when $λ= 0$ and a single gene tree is given, solving a long standing open problem on the complexity of the reconciliation problem. On the positive side, we give a fixed-parameter algorithm for the problem, where the parameters are $δ/λ$ and the number $d$ of segmental duplications, of time complexity $O(\lceil \frac{δ}{λ} \rceil^{d} \cdot n \cdot \frac{δ}{λ})$. Finally, we demonstrate the usefulness of this algorithm on two previously studied real datasets: we first show that our method can be used to confirm or refute hypothetical segmental duplications on a set of 16 eukaryotes, then show how we can detect whole genome duplications in yeast genomes.
Submitted 11 June, 2018;
originally announced June 2018.
-
The complexity of comparing multiply-labelled trees by extending phylogenetic-tree metrics
Authors:
Manuel Lafond,
Nadia El-Mabrouk,
Katharina T. Huber,
Vincent Moulton
Abstract:
A multilabelled tree (or MUL-tree) is a rooted tree in which every leaf is labelled by an element from some set, but in which more than one leaf may be labelled by the same element of that set. In phylogenetics, such trees are used in biogeographical studies, to study the evolution of gene families, and also within approaches to construct phylogenetic networks. A multilabelled tree in which no leaf-labels are repeated is called a phylogenetic tree, and one in which every label is the same is also known as a tree-shape. In this paper, we consider the complexity of computing metrics on MUL-trees that are obtained by extending metrics on phylogenetic trees. In particular, by restricting our attention to tree shapes, we show that computing the metric extension on MUL-trees is NP-complete for two well-known metrics on phylogenetic trees, namely, the path-difference and Robinson-Foulds distances. We also show that the extension of the Robinson-Foulds distance is fixed-parameter tractable with respect to the distance parameter. The path distance complexity result allows us to also answer an open problem concerning the complexity of solving the quadratic assignment problem for two matrices that are a Robinson similarity and a Robinson dissimilarity, which we show to be NP-complete. We conclude by considering the extension of the maximum agreement subtree (MAST) distance on phylogenetic trees to MUL-trees. Although its extension to MUL-trees can be computed in polynomial time, we show that computing its natural generalization to more than two MUL-trees is NP-complete, although fixed-parameter tractable in the maximum degree when the number of given trees is bounded.
Submitted 15 March, 2018;
originally announced March 2018.
-
Constructing a Consensus Phylogeny from a Leaf-Removal Distance
Authors:
Cedric Chauve,
Mark Jones,
Manuel Lafond,
Céline Scornavacca,
Mathias Weller
Abstract:
Understanding the evolution of a set of genes or species is a fundamental problem in evolutionary biology. The problem we study here takes as input a set of trees describing possibly discordant evolutionary scenarios for a given set of genes or species, and aims at finding a single tree that minimizes the leaf-removal distance to the input trees. This problem is a specific instance of the general consensus/supertree problem, widely used to combine or summarize discordant evolutionary trees. The problem we introduce is specifically tailored to address the case of discrepancies between the input trees due to the misplacement of individual taxa. Most supertree or consensus tree problems are computationally intractable, and we show that the problem we introduce is also NP-hard. We provide tractability results in the form of a 2-approximation algorithm. We also introduce a variant that minimizes the maximum number $d$ of leaves that are removed from any input tree, and provide a parameterized algorithm for this problem with parameter $d$.
Submitted 8 July, 2019; v1 submitted 15 May, 2017;
originally announced May 2017.
-
Consistency of orthology and paralogy constraints in the presence of gene transfers
Authors:
Mark Jones,
Manuel Lafond,
Celine Scornavacca
Abstract:
Orthology and paralogy relations are often inferred by methods based on gene similarity, which usually yield a graph depicting the relationships between gene pairs. Such relation graphs are known to frequently contain errors, as they cannot be explained via a gene tree that both contains the depicted orthologs/paralogs, and that is consistent with a species tree $S$. This idea of detecting errors through inconsistency with a species tree has mostly been studied in the presence of speciation and duplication events only. In this work, we ask: could the given set of relations be consistent if we allow lateral gene transfers in the evolutionary model? We formalize this question and provide a variety of algorithmic results regarding the underlying problems. Namely, we show that deciding if a relation graph $R$ is consistent with a given species network $N$ is NP-hard, and that it is W[1]-hard under the parameter "minimum number of transfers". However, we present an FPT algorithm based on the degree of the $DS$-tree associated with $R$. We also study analogous problems in the case that the transfer highways on a species tree are unknown.
Submitted 15 February, 2022; v1 submitted 2 May, 2017;
originally announced May 2017.
-
On strongly chordal graphs that are not leaf powers
Authors:
Manuel Lafond
Abstract:
A common task in phylogenetics is to find an evolutionary tree representing proximity relationships between species. This motivates the notion of leaf powers: a graph G = (V, E) is a leaf power if there exists a tree T on leafset V and a threshold k such that uv is an edge if and only if the distance between u and v in T is at most k. Characterizing leaf powers is a challenging open problem, along with determining the complexity of their recognition. This is in part due to the fact that few graphs are known to not be leaf powers, as such graphs are difficult to construct. Recently, Nevries and Rosenke asked if leaf powers could be characterized by strong chordality and a finite set of forbidden subgraphs.
In this paper, we provide a negative answer to this question, by exhibiting an infinite family $\mathcal{G}$ of (minimal) strongly chordal graphs that are not leaf powers. During the process, we establish a connection between leaf powers, alternating cycles and quartet compatibility. We also show that deciding if a chordal graph is $\mathcal{G}$-free is NP-complete, which may provide insight on the complexity of the leaf power recognition problem.
Submitted 2 July, 2017; v1 submitted 23 March, 2017;
originally announced March 2017.
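The definition of a leaf power in the abstract above can be made concrete by going in the easy direction: given a tree T and a threshold k, build the graph on the leaves of T. The Python sketch below does this with breadth-first search; the adjacency-dictionary representation and the function name `leaf_power` are assumptions for illustration. The hard direction, deciding whether a given graph arises this way, is the recognition problem the abstract discusses and is not attempted here.

```python
from collections import deque
from itertools import combinations


def leaf_power(tree, k):
    """Return the edge set of the k-leaf power of `tree`.

    tree : dict mapping each node to a list of its neighbours
           (an undirected tree; leaves are the nodes of degree 1)
    k    : distance threshold

    Two leaves u, v are adjacent in the result if and only if their
    distance in the tree is at most k.
    """
    leaves = [v for v, nbrs in tree.items() if len(nbrs) == 1]

    def distances_from(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in tree[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return dist

    dist = {u: distances_from(u) for u in leaves}
    return {
        frozenset((u, v))
        for u, v in combinations(leaves, 2)
        if dist[u][v] <= k
    }


if __name__ == "__main__":
    # Leaves a, b hang off internal node x; leaves c, d hang off y.
    tree = {
        "a": ["x"], "b": ["x"], "c": ["y"], "d": ["y"],
        "x": ["a", "b", "y"], "y": ["c", "d", "x"],
    }
    print(sorted(sorted(e) for e in leaf_power(tree, 2)))
    print(len(leaf_power(tree, 3)))
```

With k = 2 only sibling leaves are joined, while k = 3 already joins every pair of leaves in this small tree.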
-
Weak Coverage of a Rectangular Barrier
Authors:
Stefan Dobrev,
Evangelos Kranakis,
Danny Krizanc,
Manuel Lafond,
Jan Manuch,
Lata Narayanan,
Jaroslav Opatrny,
Ladislav Stacho
Abstract:
Assume n wireless mobile sensors are initially dispersed in an ad hoc manner in a rectangular region. They are required to move to final locations so that they can detect any intruder crossing the region in a direction parallel to the sides of the rectangle, and thus provide weak barrier coverage of the region. We study three optimization problems related to the movement of sensors to achieve weak barrier coverage: minimizing the number of sensors moved (MinNum), minimizing the average distance moved by the sensors (MinSum), and minimizing the maximum distance moved by the sensors (MinMax). We give an O(n^{3/2}) time algorithm for the MinNum problem for sensors of diameter 1 that are initially placed at integer positions; in contrast we show that the problem is NP-hard even for sensors of diameter 2 that are initially placed at integer positions. We show that the MinSum problem is solvable in O(n log n) time for homogeneous range sensors in arbitrary initial positions, while it is NP-hard for heterogeneous sensor ranges. Finally, we prove that even very restricted homogeneous versions of the MinMax problem are NP-hard.
Submitted 25 January, 2017;
originally announced January 2017.
-
Whom to befriend to influence people
Authors:
Gennaro Cordasco,
Luisa Gargano,
Manuel Lafond,
Lata Narayanan,
Adele A. Rescigno,
Ugo Vaccaro,
Kangkang Wu
Abstract:
Alice wants to join a new social network, and influence its members to adopt a new product or idea. Each person $v$ in the network has a certain threshold $t(v)$ for {\em activation}, i.e., adoption of the product or idea. If $v$ has at least $t(v)$ activated neighbors, then $v$ will also become activated. If Alice wants to activate the entire social network, whom should she befriend? More generally, we study the problem of finding the minimum number of links that a set of external influencers should form to people in the network, in order to activate the entire social network. This {\em Minimum Links} Problem has applications in viral marketing and the study of epidemics. Its solution can be quite different from the related and widely studied Target Set Selection problem. We prove that the Minimum Links problem cannot be approximated to within a ratio of $O(2^{\log^{1-ε} n})$, for any fixed $ε>0$, unless $NP\subseteq DTIME(n^{polylog(n)})$, where $n$ is the number of nodes in the network. On the positive side, we give linear time algorithms to solve the problem for trees, cycles, and cliques, for any given set of external influencers, and give precise bounds on the number of links needed. For general graphs, we design a polynomial time algorithm to compute size-efficient link sets that can activate the entire graph.
Submitted 29 November, 2016; v1 submitted 26 November, 2016;
originally announced November 2016.
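The threshold activation process in the abstract above is easy to simulate. The Python sketch below checks which nodes end up activated for a given choice of links; it does not solve the Minimum Links optimization problem. The function name `activated_nodes`, the dictionary-based inputs, and the convention that each link from an external influencer counts as one permanently activated neighbour are assumptions for illustration.

```python
def activated_nodes(graph, threshold, links):
    """Simulate threshold activation and return the set of active nodes.

    graph     : dict mapping each node to a set of its neighbours
    threshold : dict mapping each node v to its threshold t(v)
    links     : dict mapping a node to the number of external
                influencers linked to it (missing nodes have 0)

    A node becomes active once its active neighbours plus its external
    links reach its threshold; the process repeats until nothing changes.
    """
    active = set()
    changed = True
    while changed:
        changed = False
        for v in graph:
            if v in active:
                continue
            support = links.get(v, 0) + sum(1 for u in graph[v] if u in active)
            if support >= threshold[v]:
                active.add(v)
                changed = True
    return active


if __name__ == "__main__":
    # A path a - b - c where the middle node needs two active neighbours.
    path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
    t = {"a": 1, "b": 2, "c": 1}
    print(sorted(activated_nodes(path, t, {"a": 1})))
    print(sorted(activated_nodes(path, t, {"a": 1, "c": 1})))
```

In this small example one link is not enough, since the middle node never reaches its threshold, while two links placed at the endpoints activate the whole path.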
-
Reconstructing protein and gene phylogenies by extending the framework of reconciliation
Authors:
Esaie Kuitche,
Manuel Lafond,
Aïda Ouangraoua
Abstract:
The architecture of eukaryotic coding genes allows the production of several different protein isoforms by genes. Current gene phylogeny reconstruction methods make use of a single protein product per gene, ignoring information on alternative protein isoforms. These methods often lead to inaccurate gene tree reconstructions that need to be corrected before being used in phylogenetic tree reconciliation analyses or gene products phylogeny reconstructions. Here, we propose a new approach for the reconstruction of accurate gene trees and protein trees accounting for the production of alternative protein isoforms by the genes of a gene family. We extend the concept of reconciliation to protein trees, and we define a new reconciliation problem called MinDRGT that consists in finding a gene tree that minimizes a double reconciliation cost with a given protein tree and a given species tree. We define a second problem called MinDRPGT that consists in finding a protein tree and a gene tree minimizing a double reconciliation cost, given a species tree and a set of protein subtrees. We provide algorithmic exact and heuristic solutions for some versions of the problems, and we present the results of an application to the correction of gene trees from the Ensembl database. An implementation of the heuristic method is available at https://github.com/UdeS-CoBIUS/Protein2GeneTree.
Submitted 3 July, 2017; v1 submitted 30 October, 2016;
originally announced October 2016.
-
Gene Tree Construction and Correction using SuperTree and Reconciliation
Authors:
Manuel Lafond,
Cédric Chauve,
Nadia El-Mabrouk,
Aïda Ouangraoua
Abstract:
The supertree problem asking for a tree displaying a set of consistent input trees has been largely considered for the reconstruction of species trees. Here, we rather explore this framework for the sake of reconstructing a gene tree from a set of input gene trees on partial data. In this perspective, the phylogenetic tree for the species containing the genes of interest can be used to choose among the many possible compatible "supergenetrees", the most natural criteria being to minimize a reconciliation cost. We develop a variety of algorithmic solutions for the construction and correction of gene trees using the supertree framework. A dynamic programming supertree algorithm for constructing or correcting gene trees, exponential in the number of input trees, is first developed for the less constrained version of the problem. It is then adapted to gene trees with nodes labeled as duplication or speciation, the additional constraint being to preserve the orthology and paralogy relations between genes. Then, a quadratic time algorithm is developed for efficiently correcting an initial gene tree while preserving a set of "trusted" subtrees, as well as the relative phylogenetic distance between them, in both cases of labeled or unlabeled input trees. By applying these algorithms to the set of Ensembl gene trees, we show that this new correction framework is particularly useful to correct weakly supported duplication nodes. The C++ source code for the algorithms and simulations described in the paper is available at https://github.com/UdeM-LBIT/SuGeT.
Submitted 21 October, 2016; v1 submitted 17 October, 2016;
originally announced October 2016.
-
On the Weighted Quartet Consensus problem
Authors:
Manuel Lafond,
Céline Scornavacca
Abstract:
In phylogenetics, the consensus problem consists in summarizing a set of phylogenetic trees that all classify the same set of species into a single tree. Several definitions of consensus exist in the literature; in this paper we focus on the Weighted Quartet Consensus problem, a problem with unknown complexity status so far. Here we prove that the Weighted Quartet Consensus problem is NP-hard and we give a 1/2-factor approximation for this problem. During the process, we propose a derandomization procedure of a previously known randomized 1/3-factor approximation. We also investigate the fixed-parameter tractability of this problem.
Submitted 10 May, 2017; v1 submitted 3 October, 2016;
originally announced October 2016.
-
The SCJ small parsimony problem for weighted gene adjacencies (Extended version)
Authors:
Nina Luhmann,
Manuel Lafond,
Annelyse Thévenin,
Aïda Ouangraoua,
Roland Wittler,
Cedric Chauve
Abstract:
Reconstructing ancestral gene orders in a given phylogeny is a classical problem in comparative genomics. Most existing methods compare conserved features in extant genomes in the phylogeny to define potential ancestral gene adjacencies, and either try to reconstruct all ancestral genomes under a global evolutionary parsimony criterion, or, focusing on a single ancestral genome, use a scaffolding approach to select a subset of ancestral gene adjacencies, generally aiming at reducing the fragmentation of the reconstructed ancestral genome. In this paper, we describe an exact algorithm for the Small Parsimony Problem that combines both approaches. We consider that gene adjacencies at internal nodes of the species phylogeny are weighted, and we introduce an objective function defined as a convex combination of these weights and the evolutionary cost under the Single-Cut-or-Join (SCJ) model. The weights of ancestral gene adjacencies can be obtained, for example, from recently available ancient DNA sequencing data, which provide a direct hint at the genome structure of the considered ancestor, or through probabilistic analysis of gene adjacency evolution. We show the NP-hardness of our problem variant and propose a Fixed-Parameter Tractable algorithm based on the Sankoff-Rousseau dynamic programming algorithm that also allows sampling of co-optimal solutions. We apply our approach to mammalian and bacterial data providing different degrees of complexity. We show that including adjacency weights in the objective has a significant impact in reducing the fragmentation of the reconstructed ancestral gene orders.
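As a rough illustration of the evolutionary cost mentioned in the abstract, the sketch below computes the SCJ distance between two genomes on the same gene set, using the usual definition of that distance as the number of adjacencies present in exactly one of the two genomes. The function name `scj_distance` and the extremity labels such as `"1h"` and `"2t"` are hypothetical choices for this example. The weighted objective of the paper is not reproduced here, since the abstract does not give its exact form.

```python
def scj_distance(adjacencies_a, adjacencies_b):
    """Count the adjacencies found in exactly one of the two genomes.

    Each adjacency is an unordered pair of gene extremities, so
    ("1h", "2t") and ("2t", "1h") are treated as the same adjacency.
    Assumes both genomes are over the same set of genes.
    """
    a = {frozenset(pair) for pair in adjacencies_a}
    b = {frozenset(pair) for pair in adjacencies_b}
    # One cut per adjacency only in A, one join per adjacency only in B.
    return len(a ^ b)


# Hypothetical toy genomes over genes 1, 2, 3 (h = head, t = tail).
genome_a = [("1h", "2t"), ("2h", "3t")]
genome_b = [("2t", "1h"), ("2h", "3h")]
print(scj_distance(genome_a, genome_b))
```

In this toy input the two genomes share one adjacency and differ on one each, so one cut and one join are counted.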
Submitted 26 September, 2016; v1 submitted 29 March, 2016;
originally announced March 2016.
-
Hamiltonian chordal graphs are not cycle extendible
Authors:
Manuel Lafond,
Ben Seamone
Abstract:
In 1990, Hendry conjectured that every Hamiltonian chordal graph is cycle extendible; that is, the vertices of any non-Hamiltonian cycle are contained in a cycle of length one greater. We disprove this conjecture by constructing counterexamples on $n$ vertices for any $n \geq 15$. Furthermore, we show that there exist counterexamples where the ratio of the length of a non-extendible cycle to the total number of vertices can be made arbitrarily small. We then consider cycle extendibility in Hamiltonian chordal graphs where certain induced subgraphs are forbidden, notably $P_n$ and the bull.
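The notion of an extendible cycle used in the abstract can be checked by brute force on small graphs. The following is a minimal sketch, not the construction from the paper: the helper names `is_cycle` and `is_extendible` and the adjacency-dictionary representation are assumptions made for this example, and the search is exponential, so it is only suitable for toy inputs.

```python
from itertools import permutations


def is_cycle(adj, order):
    """Return True if consecutive vertices of `order` (cyclically) are adjacent."""
    n = len(order)
    return n >= 3 and all(order[(i + 1) % n] in adj[order[i]] for i in range(n))


def is_extendible(adj, cycle):
    """Check whether some cycle spans the vertices of `cycle` plus one more vertex.

    `adj` maps each vertex to the set of its neighbours (undirected graph).
    """
    base = set(cycle)
    for extra in set(adj) - base:
        verts = list(base | {extra})
        first, rest = verts[0], verts[1:]
        # Fix the first vertex so rotations of the same cycle are not retested.
        if any(is_cycle(adj, (first, *perm)) for perm in permutations(rest)):
            return True
    return False


# Complete graph on 4 vertices: every triangle extends to a 4-cycle.
k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
print(is_extendible(k4, [0, 1, 2]))
```

On a triangle with a pendant vertex attached, the same call returns `False`, because the only vertex outside the triangle has degree one and cannot lie on any cycle.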
Submitted 3 December, 2014; v1 submitted 22 November, 2013;
originally announced November 2013.