-
Locating a Tree in a Phylogenetic Network in Quadratic Time
Authors:
Philippe Gambette,
Andreas D. M. Gunawan,
Anthony Labarre,
Stéphane Vialette,
Louxin Zhang
Abstract:
A fundamental problem in the study of phylogenetic networks is to determine whether or not a given phylogenetic network contains a given phylogenetic tree. We develop a quadratic-time algorithm for this problem for binary nearly-stable phylogenetic networks. We also show that the number of reticulations in a reticulation visible or nearly stable phylogenetic network is bounded from above by a func…
▽ More
A fundamental problem in the study of phylogenetic networks is to determine whether or not a given phylogenetic network contains a given phylogenetic tree. We develop a quadratic-time algorithm for this problem for binary nearly-stable phylogenetic networks. We also show that the number of reticulations in a reticulation visible or nearly stable phylogenetic network is bounded from above by a function linear in the number of taxa.
△ Less
Submitted 11 February, 2015;
originally announced February 2015.
-
Flexible RNA design under structure and sequence constraints using formal languages
Authors:
Yu Zhou,
Yann Ponty,
Stéphane Vialette,
Jérôme Waldispühl,
Yi Zhang,
Alain Denise
Abstract:
The problem of RNA secondary structure design (also called inverse folding) is the following: given a target secondary structure, one aims to create a sequence that folds into, or is compatible with, a given structure. In several practical applications in biology, additional constraints must be taken into account, such as the presence/absence of regulatory motifs, either at a specific location or…
▽ More
The problem of RNA secondary structure design (also called inverse folding) is the following: given a target secondary structure, one aims to create a sequence that folds into, or is compatible with, a given structure. In several practical applications in biology, additional constraints must be taken into account, such as the presence/absence of regulatory motifs, either at a specific location or anywhere in the sequence. In this study, we investigate the design of RNA sequences from their targeted secondary structure, given these additional sequence constraints. To this purpose, we develop a general framework based on concepts of language theory, namely context-free grammars and finite automata. We efficiently combine a comprehensive set of constraints into a unifying context-free grammar of moderate size. From there, we use generic generic algorithms to perform a (weighted) random generation, or an exhaustive enumeration, of candidate sequences. The resulting method, whose complexity scales linearly with the length of the RNA, was implemented as a standalone program. The resulting software was embedded into a publicly available dedicated web server. The applicability demonstrated of the method on a concrete case study dedicated to Exon Splicing Enhancers, in which our approach was successfully used in the design of \emph{in vitro} experiments.
△ Less
Submitted 1 August, 2013; v1 submitted 16 May, 2013;
originally announced May 2013.
-
Comparing RNA structures using a full set of biologically relevant edit operations is intractable
Authors:
Guillaume Blin,
Sylvie Hamel,
Stéphane Vialette
Abstract:
Arc-annotated sequences are useful for representing structural information of RNAs and have been extensively used for comparing RNA structures in both terms of sequence and structural similarities. Among the many paradigms referring to arc-annotated sequences and RNA structures comparison (see \cite{IGMA_BliDenDul08} for more details), the most important one is the general edit distance. The pro…
▽ More
Arc-annotated sequences are useful for representing structural information of RNAs and have been extensively used for comparing RNA structures in both terms of sequence and structural similarities. Among the many paradigms referring to arc-annotated sequences and RNA structures comparison (see \cite{IGMA_BliDenDul08} for more details), the most important one is the general edit distance. The problem of computing an edit distance between two non-crossing arc-annotated sequences was introduced in \cite{Evans99}. The introduced model uses edit operations that involve either single letters or pairs of letters (never considered separately) and is solvable in polynomial-time \cite{ZhangShasha:1989}. To account for other possible RNA structural evolutionary events, new edit operations, allowing to consider either silmutaneously or separately letters of a pair were introduced in \cite{jiangli}; unfortunately at the cost of computational tractability. It has been proved that comparing two RNA secondary structures using a full set of biologically relevant edit operations is {\sf\bf NP}-complete. Nevertheless, in \cite{DBLP:conf/spire/GuignonCH05}, the authors have used a strong combinatorial restriction in order to compare two RNA stem-loops with a full set of biologically relevant edit operations; which have allowed them to design a polynomial-time and space algorithm for comparing general secondary RNA structures. In this paper we will prove theoretically that comparing two RNA structures using a full set of biologically relevant edit operations cannot be done without strong combinatorial restrictions.
△ Less
Submitted 20 December, 2008;
originally announced December 2008.
-
On the Approximability of Comparing Genomes with Duplicates
Authors:
Sébastien Angibaud,
Guillaume Fertin,
Irena Rusu,
Annelyse Thevenin,
Stéphane Vialette
Abstract:
A central problem in comparative genomics consists in computing a (dis-)similarity measure between two genomes, e.g. in order to construct a phylogeny. All the existing measures are defined on genomes without duplicates. However, we know that genes can be duplicated within the same genome. One possible approach to overcome this difficulty is to establish a one-to-one correspondence (i.e. a match…
▽ More
A central problem in comparative genomics consists in computing a (dis-)similarity measure between two genomes, e.g. in order to construct a phylogeny. All the existing measures are defined on genomes without duplicates. However, we know that genes can be duplicated within the same genome. One possible approach to overcome this difficulty is to establish a one-to-one correspondence (i.e. a matching) between genes of both genomes, where the correspondence is chosen in order to optimize the studied measure. In this paper, we are interested in three measures (number of breakpoints, number of common intervals and number of conserved intervals) and three models of matching (exemplar, intermediate and maximum matching models). We prove that, for each model and each measure M, computing a matching between two genomes that optimizes M is APX-hard. We also study the complexity of the following problem: is there an exemplarization (resp. an intermediate/maximum matching) that induces no breakpoint? We prove the problem to be NP-Complete in the exemplar model for a new class of instances, and we show that the problem is in P in the maximum matching model. We also focus on a fourth measure: the number of adjacencies, for which we give several approximation algorithms in the maximum matching model, in the case where genomes contain the same number of duplications of each gene.
△ Less
Submitted 6 June, 2008;
originally announced June 2008.
-
On restrictions of balanced 2-interval graphs
Authors:
Philippe Gambette,
Stéphane Vialette
Abstract:
The class of 2-interval graphs has been introduced for modelling scheduling and allocation problems, and more recently for specific bioinformatic problems. Some of those applications imply restrictions on the 2-interval graphs, and justify the introduction of a hierarchy of subclasses of 2-interval graphs that generalize line graphs: balanced 2-interval graphs, unit 2-interval graphs, and (x,x)-…
▽ More
The class of 2-interval graphs has been introduced for modelling scheduling and allocation problems, and more recently for specific bioinformatic problems. Some of those applications imply restrictions on the 2-interval graphs, and justify the introduction of a hierarchy of subclasses of 2-interval graphs that generalize line graphs: balanced 2-interval graphs, unit 2-interval graphs, and (x,x)-interval graphs. We provide instances that show that all the inclusions are strict. We extend the NP-completeness proof of recognizing 2-interval graphs to the recognition of balanced 2-interval graphs. Finally we give hints on the complexity of unit 2-interval graphs recognition, by studying relationships with other graph classes: proper circular-arc, quasi-line graphs, K_{1,5}-free graphs, ...
△ Less
Submitted 11 June, 2007; v1 submitted 12 April, 2007;
originally announced April 2007.