-
An FPT Algorithm for Temporal Graph Untangling
Authors:
Riccardo Dondi,
Manuel Lafond
Abstract:
Several classical combinatorial problems have been considered and analysed on temporal graphs. Recently, a variant of Vertex Cover on temporal graphs, called MinTimelineCover, has been introduced to summarize timeline activities in social networks. The problem asks to cover every temporal edge while minimizing the total span of the vertices (where the span of a vertex is the length of the timestamp interval it must remain active in, minus one). While the problem has been shown to be NP-hard even in very restricted cases, its parameterized complexity has not been fully understood. The problem is known to be in FPT under the span parameter only for graphs with two timestamps, but the parameterized complexity for the general case is open. We settle this open problem by giving an FPT algorithm that is based on a combination of iterative compression and a reduction to the Digraph Pair Cut problem, a powerful problem that has received significant attention recently.
Submitted 3 July, 2023;
originally announced July 2023.
-
Finding Colorful Paths in Temporal Graphs
Authors:
Riccardo Dondi,
Mohammad Mehdi Hosseinzadeh
Abstract:
The problem of finding paths in temporal graphs has been recently considered due to its many applications. In this paper we consider a variant of the problem that, given a vertex-colored temporal graph, asks for a path whose vertices have distinct colors and include the maximum number of colors. We study the approximation complexity of the problem and we provide an inapproximability lower bound. Then we present a heuristic for the problem and an experimental evaluation of our heuristic, both on synthetic and real-world graphs.
Submitted 3 September, 2021;
originally announced September 2021.
-
The Longest Run Subsequence Problem: Further Complexity Results
Authors:
Riccardo Dondi,
Florian Sikora
Abstract:
Longest Run Subsequence is a problem introduced recently in the context of the scaffolding phase of genome assembly (Schrinner et al., WABI 2020). The problem asks for a maximum length subsequence of a given string that contains at most one run for each symbol (a run is a maximum substring of consecutive identical symbols). The problem has been shown to be NP-hard and to be fixed-parameter tractable when the parameter is the size of the alphabet on which the input string is defined. In this paper we further investigate the complexity of the problem and we show that it is fixed-parameter tractable when it is parameterized by the number of runs in a solution, a smaller parameter. Moreover, we investigate the kernelization complexity of Longest Run Subsequence and we prove that it does not admit a polynomial kernel when parameterized by the size of the alphabet or by the number of runs. Finally, we consider the restriction of Longest Run Subsequence when each symbol has at most two occurrences in the input string and we show that it is APX-hard.
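The run constraint in this abstract can be made concrete with a small sketch. The code below is only an illustration of the problem definition, not the authors' algorithm: it checks the "at most one run per symbol" condition and finds an optimal subsequence by exhaustive search, which is exponential and suitable for tiny strings only. The function names are hypothetical.

```python
from itertools import combinations


def has_single_runs(s: str) -> bool:
    """Return True if every symbol of s occurs in at most one run."""
    seen = set()   # symbols whose run has already started
    prev = None    # symbol of the run currently being read
    for ch in s:
        if ch != prev:
            if ch in seen:
                return False  # ch starts a second run
            seen.add(ch)
            prev = ch
    return True


def longest_run_subsequence_bruteforce(s: str) -> str:
    """Try subsequences from longest to shortest (illustration only)."""
    n = len(s)
    for size in range(n, -1, -1):
        for idx in combinations(range(n), size):
            cand = "".join(s[i] for i in idx)
            if has_single_runs(cand):
                return cand
    return ""
```

For instance, `longest_run_subsequence_bruteforce("abab")` has to drop one character, since `"abab"` itself contains two runs of each symbol.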
Submitted 22 June, 2021; v1 submitted 16 November, 2020;
originally announced November 2020.
-
Top-k Connected Overlapping Densest Subgraphs in Dual Networks
Authors:
Riccardo Dondi,
Pietro Hiram Guzzi,
Mohammad Mehdi Hosseinzadeh
Abstract:
Networks are largely used for modelling and analysing data and relations among them. Recently, it has been shown that the use of a single network may not be the optimal choice, since a single network may miss some aspects. Consequently, it has been proposed to use a pair of networks to better model all the aspects, and the main approach is referred to as dual networks (DNs). DNs are two related graphs (one weighted, the other unweighted) that share the same set of vertices and have two different edge sets. In DNs it is often interesting to extract common subgraphs among the two networks that are maximally dense in the conceptual network and connected in the physical one. The simplest instance of this problem is finding a common densest connected subgraph (DCS), while here we focus on the detection of the Top-k Densest Connected subgraphs, i.e. a set of k subgraphs having the largest density in the conceptual network which are also connected in the physical network. We formalise the problem and then propose a heuristic to find a solution, since the problem is computationally hard. A set of experiments on synthetic and real networks is also presented to support our approach.
Submitted 4 August, 2020;
originally announced August 2020.
-
Computing the k Densest Subgraphs of a Graph
Authors:
Riccardo Dondi,
Danny Hermelin
Abstract:
Computing cohesive subgraphs is a central problem in graph theory. While many formulations of cohesive subgraphs lead to NP-hard problems, finding a densest subgraph can be done in polynomial time. As such, the densest subgraph model has emerged as the most popular notion of cohesiveness. Recently, the data mining community has started looking into the problem of computing k densest subgraphs in a given graph, rather than one, with various restrictions on the possible overlap between the subgraphs. However, there seems to be very little known on this important and natural generalization from a theoretical perspective. In this paper we hope to remedy this situation by analyzing three natural variants of the k densest subgraphs problem. Each variant differs depending on the amount of overlap that is allowed between the subgraphs. In one extreme, when no overlap is allowed, we prove that the problem is NP-hard for k >= 3. On the other extreme, when overlap is allowed without any restrictions and the solution subgraphs only have to be distinct, we show that the problem is fixed-parameter tractable with respect to k, and admits a PTAS for constant k. Finally, when a limited amount of overlap is allowed between the subgraphs, we prove that the problem is NP-hard for k = 2.
Submitted 23 November, 2021; v1 submitted 18 February, 2020;
originally announced February 2020.
-
Complexity Issues of String to Graph Approximate Matching
Authors:
Riccardo Dondi,
Giancarlo Mauri,
Italo Zoppis
Abstract:
The problem of matching a query string to a directed graph, whose vertices are labeled by strings, has applications in different fields, from data mining to computational biology. Several variants of the problem have been considered, depending on whether the match is exact or approximate and, in the latter case, which edit operations are considered and where they are allowed. In this paper we present results on the complexity of the approximate matching problem, where edit operations are symbol substitutions and are allowed only on the graph labels or both on the graph labels and the query string. We introduce a variant of the problem that asks whether there exists a path in a graph that represents a query string with any number of edit operations and we show that it is NP-complete, even when labels have length one and the alphabet is binary. Moreover, when it is parameterized by the length of the input string and graph labels have length one, we show that the problem is fixed-parameter tractable and it is unlikely to admit a polynomial kernel. The NP-completeness of this problem leads to the inapproximability (within any factor) of the approximate matching when edit operations are allowed only on the graph labels. Moreover, we show that the variants of approximate string matching to graph we consider are not fixed-parameter tractable, when the parameter is the number of edit operations, even for graphs that have distance one from a DAG. The reduction for this latter result allows us to prove the inapproximability of the variant where edit operations can be applied both on the query string and on graph labels.
Submitted 7 January, 2020;
originally announced January 2020.
-
Top-k Overlapping Densest Subgraphs: Approximation and Complexity
Authors:
Riccardo Dondi,
Mohammad Mehdi Hosseinzadeh,
Giancarlo Mauri,
Italo Zoppis
Abstract:
A central problem in graph mining is finding dense subgraphs, with several applications in different fields, a notable example being identifying communities. While a lot of effort has been put into the problem of finding a single dense subgraph, only recently has the focus shifted to the problem of finding a set of densest subgraphs. Some approaches aim at finding disjoint subgraphs, while in many real-world networks communities are often overlapping. An approach introduced to find possibly overlapping subgraphs is the Top-k Overlapping Densest Subgraphs problem. For a given integer k >= 1, the goal of this problem is to find a set of k densest subgraphs that may share some vertices. The objective function to be maximized takes into account both the density of the subgraphs and the distance between subgraphs in the solution.
The Top-k Overlapping Densest Subgraphs problem has been shown to admit a 1/10-factor approximation algorithm. Furthermore, the computational complexity of the problem has been left open. In this paper, we present contributions concerning the approximability and the computational complexity of the problem. For the approximability, we present approximation algorithms that improve the approximation factor to 1/2, when k is bounded by the size of the vertex set, and to 2/3 when k is a constant. For the computational complexity, we show that the problem is NP-hard even when k = 3.
Submitted 30 January, 2019; v1 submitted 7 September, 2018;
originally announced September 2018.
-
Reconciling Multiple Genes Trees via Segmental Duplications and Losses
Authors:
Riccardo Dondi,
Manuel Lafond,
Celine Scornavacca
Abstract:
Reconciling gene trees with a species tree is a fundamental problem to understand the evolution of gene families. Many existing approaches reconcile each gene tree independently. However, it is well-known that the evolution of gene families is interconnected. In this paper, we extend a previous approach to reconcile a set of gene trees with a species tree based on segmental macro-evolutionary events, where segmental duplication events and losses are associated with cost $δ$ and $λ$, respectively. We show that the problem is polynomial-time solvable when $δ \leq λ$ (via LCA-mapping), while if $δ > λ$ the problem is NP-hard, even when $λ = 0$ and a single gene tree is given, solving a long standing open problem on the complexity of the reconciliation problem. On the positive side, we give a fixed-parameter algorithm for the problem, where the parameters are $δ/λ$ and the number $d$ of segmental duplications, of time complexity $O(\lceil \frac{δ}{λ} \rceil^{d} \cdot n \cdot \frac{δ}{λ})$. Finally, we demonstrate the usefulness of this algorithm on two previously studied real datasets: we first show that our method can be used to confirm or refute hypothetical segmental duplications on a set of 16 eukaryotes, then show how we can detect whole genome duplications in yeast genomes.
Submitted 11 June, 2018;
originally announced June 2018.
-
Covering with Clubs: Complexity and Approximability
Authors:
Riccardo Dondi,
Giancarlo Mauri,
Florian Sikora,
Italo Zoppis
Abstract:
Finding cohesive subgraphs in a network is a well-known problem in graph theory. Several alternative formulations of cohesive subgraph have been proposed, a notable example being $s$-club, which is a subgraph where each vertex is at distance at most $s$ to the others. Here we consider the problem of covering a given graph with the minimum number of $s$-clubs. We study the computational and approximation complexity of this problem, when $s$ is equal to 2 or 3. First, we show that deciding if there exists a cover of a graph with three $2$-clubs is NP-complete, and that deciding if there exists a cover of a graph with two $3$-clubs is NP-complete. Then, we consider the approximation complexity of covering a graph with the minimum number of $2$-clubs and $3$-clubs. We show that, given a graph $G=(V,E)$ to be covered, covering $G$ with the minimum number of $2$-clubs is not approximable within factor $O(|V|^{1/2 -\varepsilon})$, for any $\varepsilon>0$, and covering $G$ with the minimum number of $3$-clubs is not approximable within factor $O(|V|^{1 -\varepsilon})$, for any $\varepsilon>0$. On the positive side, we give an approximation algorithm of factor $2|V|^{1/2}\log^{3/2} |V|$ for covering a graph with the minimum number of $2$-clubs.
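The $s$-club notion used in this abstract can be illustrated with a short check. The sketch below assumes the usual convention that distances are measured inside the subgraph induced by the chosen vertices, and it assumes the graph is given as a plain adjacency dictionary; the function name `is_s_club` is hypothetical and the code only verifies a candidate club, it does not compute a minimum cover.

```python
from collections import deque


def is_s_club(adj, vertices, s):
    """Check that `vertices` induces a subgraph of diameter at most s.

    adj: dict mapping each vertex to an iterable of its neighbours.
    Distances are computed inside the induced subgraph (assumed convention).
    """
    vs = set(vertices)
    for src in vs:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            if dist[u] == s:
                continue  # do not explore beyond distance s
            for w in adj[u]:
                if w in vs and w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) != len(vs):
            return False  # some vertex is farther than s from src
    return True
```

On the path 0-1-2-3, the set {0, 1, 2} passes the check for s = 2, while the whole path only passes for s = 3.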
Submitted 4 June, 2018;
originally announced June 2018.
-
Finding Disjoint Paths on Edge-Colored Graphs: More Tractability Results
Authors:
Riccardo Dondi,
Florian Sikora
Abstract:
The problem of finding the maximum number of vertex-disjoint uni-color paths in an edge-colored graph (called MaxCDP) has been recently introduced in the literature, motivated by applications in social network analysis. In this paper we investigate how the complexity of the problem depends on graph parameters (namely the number of vertices to remove to make the graph a collection of disjoint paths and the size of the vertex cover of the graph), which makes sense since graphs in social networks are not random and have structure. The problem was known to be hard to approximate in polynomial time and not fixed-parameter tractable (FPT) for the natural parameter. Here, we show that it is still hard to approximate, even in FPT-time. Finally, we introduce a new variant of the problem, called MaxCDDP, whose goal is to find the maximum number of vertex-disjoint and color-disjoint uni-color paths. We extend some of the results of MaxCDP to this new variant, and we prove that unlike MaxCDP, MaxCDDP is already hard on graphs at distance two from disjoint paths.
Submitted 29 November, 2017; v1 submitted 16 September, 2016;
originally announced September 2016.
-
Parameterized Complexity and Approximation Issues for the Colorful Components Problems
Authors:
Riccardo Dondi,
Florian Sikora
Abstract:
The quest for colorful components (connected components where each color is associated with at most one vertex) inside a vertex-colored graph has been widely considered in the last ten years. Here we consider two variants, Minimum Colorful Components (MCC) and Maximum Edges in transitive Closure (MEC), introduced in 2011 in the context of orthology gene identification in bioinformatics. The input of both MCC and MEC is a vertex-colored graph. MCC asks for the removal of a subset of edges, so that the resulting graph is partitioned in the minimum number of colorful connected components; MEC asks for the removal of a subset of edges, so that the resulting graph is partitioned in colorful connected components and the number of edges in the transitive closure of such a graph is maximized. We study the parameterized and approximation complexity of MCC and MEC, for general and restricted instances.
For MCC on trees we show that the problem is basically equivalent to Minimum Cut on Trees, thus MCC is not approximable within factor $1.36 - \varepsilon$, it is fixed-parameter tractable and it admits a poly-kernel (when the parameter is the number of colorful components). Moreover, we show that MCC, while polynomial-time solvable on paths, is NP-hard even for graphs with constant distance to disjoint paths number. Then we consider the parameterized complexity of MEC when parameterized by the number $k$ of edges in the transitive closure of a solution (the graph obtained by removing edges so that it is partitioned in colorful connected components). We give a fixed-parameter algorithm for MEC parameterized by $k$ and, when the input graph is a tree, we give a poly-kernel.
Submitted 19 June, 2018; v1 submitted 10 May, 2016;
originally announced May 2016.
-
Parameterized Tractability of the Maximum-Duo Preservation String Mapping Problem
Authors:
Stefano Beretta,
Mauro Castelli,
Riccardo Dondi
Abstract:
In this paper we investigate the parameterized complexity of the Maximum-Duo Preservation String Mapping Problem, the complementary of the Minimum Common String Partition Problem. We show that this problem is fixed-parameter tractable when parameterized by the number k of conserved duos, by first giving a parameterized algorithm based on the color-coding technique and then presenting a reduction to a kernel of size O(k^6).
Submitted 10 December, 2015;
originally announced December 2015.
-
Covering Pairs in Directed Acyclic Graphs
Authors:
Niko Beerenwinkel,
Stefano Beretta,
Paola Bonizzoni,
Riccardo Dondi,
Yuri Pirola
Abstract:
The Minimum Path Cover problem on directed acyclic graphs (DAGs) is a classical problem that provides a clear and simple mathematical formulation for several applications in different areas and that has an efficient algorithmic solution. In this paper, we study the computational complexity of two constrained variants of Minimum Path Cover motivated by the recent introduction of next-generation sequencing technologies in bioinformatics. The first problem (MinPCRP), given a DAG and a set of pairs of vertices, asks for a minimum cardinality set of paths "covering" all the vertices such that both vertices of each pair belong to the same path. For this problem, we show that, while it is NP-hard to compute if there exists a solution consisting of at most three paths, it is possible to decide in polynomial time whether a solution consisting of at most two paths exists. The second problem (MaxRPSP), given a DAG and a set of pairs of vertices, asks for a path containing the maximum number of the given pairs of vertices. We show its NP-hardness and also its W[1]-hardness when parametrized by the number of covered pairs. On the positive side, we give a fixed-parameter algorithm when the parameter is the maximum overlapping degree, a natural parameter in the bioinformatics applications of the problem.
Submitted 18 October, 2013;
originally announced October 2013.
-
On the Complexity of Minimum Labeling Alignment of Two Genomes
Authors:
Riccardo Dondi,
Nadia El-Mabrouk
Abstract:
In this note we investigate the complexity of the Minimum Label Alignment problem and we show that such a problem is APX-hard.
Submitted 8 June, 2012;
originally announced June 2012.
-
The Binary Perfect Phylogeny with Persistent characters
Authors:
Paola Bonizzoni,
Chiara Braghin,
Riccardo Dondi,
Gabriella Trucco
Abstract:
The binary perfect phylogeny model is too restrictive to model biological events such as back mutations. In this paper we consider a natural generalization of the model that allows a special type of back mutation. We investigate the problem of reconstructing a near perfect phylogeny over a binary set of characters where characters are persistent: characters can be gained and lost at most once. Based on this notion, we define the problem of the Persistent Perfect Phylogeny (referred to as P-PP). We restate the P-PP problem as a special case of the Incomplete Directed Perfect Phylogeny, called Incomplete Perfect Phylogeny with Persistent Completion (referred to as IP-PP), where the instance is an incomplete binary matrix M having some missing entries, denoted by symbol ?, that must be determined (or completed) as 0 or 1 so that M admits a binary perfect phylogeny. We show that the IP-PP problem can be reduced to a problem over an edge colored graph since the completion of each column of the input matrix can be represented by a graph operation. Based on this graph formulation, we develop an exact algorithm for solving the P-PP problem that is exponential in the number of characters and polynomial in the number of species.
Submitted 28 June, 2012; v1 submitted 31 October, 2011;
originally announced October 2011.
-
Pure Parsimony Xor Haplotyping
Authors:
Paola Bonizzoni,
Gianluca Della Vedova,
Riccardo Dondi,
Yuri Pirola,
Romeo Rizzi
Abstract:
The haplotype resolution from xor-genotype data has been recently formulated as a new model for genetic studies. The xor-genotype data is a cheaply obtainable type of data distinguishing heterozygous from homozygous sites without identifying the homozygous alleles. In this paper we propose a formulation based on a well-known model used in haplotype inference: pure parsimony. We exhibit exact solutions of the problem by providing polynomial time algorithms for some restricted cases and a fixed-parameter algorithm for the general case. These results are based on some interesting combinatorial properties of a graph representation of the solutions. Furthermore, we show that the problem has a polynomial time k-approximation, where k is the maximum number of xor-genotypes containing a given SNP. Finally, we propose a heuristic and produce an experimental analysis showing that it scales to real-world large instances taken from the HapMap project.
Submitted 8 January, 2010;
originally announced January 2010.
-
Variants of Constrained Longest Common Subsequence
Authors:
Paola Bonizzoni,
Gianluca Della Vedova,
Riccardo Dondi,
Yuri Pirola
Abstract:
In this work, we consider a variant of the classical Longest Common Subsequence problem called Doubly-Constrained Longest Common Subsequence (DC-LCS). Given two strings s1 and s2 over an alphabet A, a set C_s of strings, and a function Co from A to N, the DC-LCS problem consists in finding the longest subsequence s of s1 and s2 such that s is a supersequence of all the strings in C_s and such that the number of occurrences in s of each symbol a in A is upper bounded by Co(a). The DC-LCS problem provides a clear mathematical formulation of a sequence comparison problem in Computational Biology and generalizes two other constrained variants of the LCS problem: the Constrained LCS and the Repetition-Free LCS. We present two results for the DC-LCS problem. First, we illustrate a fixed-parameter algorithm where the parameter is the length of the solution. Secondly, we prove a parameterized hardness result for the Constrained LCS problem when the parameter is the number of the constraint strings and the size of the alphabet A. This hardness result also implies the parameterized hardness of the DC-LCS problem (with the same parameters) and its NP-hardness when the size of the alphabet is constant.
Submitted 2 December, 2009;
originally announced December 2009.
-
Parameterized Complexity of the k-anonymity Problem
Authors:
Stefano Beretta,
Paola Bonizzoni,
Gianluca Della Vedova,
Riccardo Dondi,
Yuri Pirola
Abstract:
The problem of publishing personal data without giving up privacy is becoming increasingly important. An interesting formalization that has been recently proposed is $k$-anonymity. This approach requires that the rows of a table are partitioned in clusters of size at least $k$ and that all the rows in a cluster become the same tuple, after the suppression of some entries. The natural optimization problem, where the goal is to minimize the number of suppressed entries, is known to be APX-hard even when the record values are over a binary alphabet and $k=3$, and when the records have length at most 8 and $k=4$. In this paper we study how the complexity of the problem is influenced by different parameters. Following this direction of research, we first show that the problem is W[1]-hard when parameterized by the size of the solution (and the value $k$). Then we exhibit a fixed-parameter algorithm, when the problem is parameterized by the size of the alphabet and the number of columns. Finally, we investigate the computational (and approximation) complexity of the $k$-anonymity problem, when restricting the instance to records having length bounded by 3 and $k=3$. We show that such a restriction is APX-hard.
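The objective described in this abstract can be made concrete with a small cost function. The sketch below assumes the common suppression model in which, inside a cluster, every column where the rows disagree is suppressed for all rows of that cluster; this convention, the function name `suppression_cost`, and the input layout (rows as equal-length strings, clusters as lists of row indices) are assumptions for illustration and are not taken from the paper.

```python
def suppression_cost(rows, clusters, k):
    """Number of suppressed entries for a given clustering of the rows.

    rows: list of equal-length sequences (e.g. strings over the alphabet).
    clusters: list of lists of row indices forming a partition of the rows.
    k: minimum allowed cluster size.
    """
    covered = sorted(i for cluster in clusters for i in cluster)
    if covered != list(range(len(rows))):
        raise ValueError("clusters must partition the row indices")
    total = 0
    for cluster in clusters:
        if len(cluster) < k:
            raise ValueError("every cluster needs at least k rows")
        members = [rows[i] for i in cluster]
        # A column is suppressed when the cluster's rows disagree on it.
        differing = sum(1 for col in zip(*members) if len(set(col)) > 1)
        total += differing * len(members)
    return total
```

With six binary rows split into two clusters of three, the cost is the number of disagreeing columns in each cluster multiplied by three, summed over the two clusters.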
Submitted 17 May, 2010; v1 submitted 16 October, 2009;
originally announced October 2009.
-
A PTAS for the Minimum Consensus Clustering Problem with a Fixed Number of Clusters
Authors:
Paola Bonizzoni,
Gianluca Della Vedova,
Riccardo Dondi
Abstract:
The Consensus Clustering problem has been introduced as an effective way to analyze the results of different microarray experiments. The problem consists of looking for a partition that best summarizes a set of input partitions (each corresponding to a different microarray experiment) under a simple and intuitive cost function. The problem admits polynomial time algorithms on two input partitions, but is APX-hard on three input partitions. We investigate the restriction of Consensus Clustering when the output partition is required to contain at most k sets, giving a polynomial time approximation scheme (PTAS) while proving the NP-hardness of this restriction.
Submitted 10 July, 2009;
originally announced July 2009.
-
The $k$-anonymity Problem is Hard
Authors:
Paola Bonizzoni,
Gianluca Della Vedova,
Riccardo Dondi
Abstract:
The problem of publishing personal data without giving up privacy is becoming increasingly important. An interesting formalization recently proposed is k-anonymity. This approach requires that the rows in a table are clustered in sets of size at least k and that all the rows in a cluster become the same tuple, after the suppression of some records. The natural optimization problem, where the goal is to minimize the number of suppressed entries, is known to be NP-hard when the values are over a ternary alphabet, k = 3 and the row length is unbounded. In this paper we give a lower bound on the approximation factor that any polynomial-time algorithm can achieve on two restrictions of the problem, namely (i) when the record values are over a binary alphabet and k = 3, and (ii) when the records have length at most 8 and k = 4, showing that these restrictions of the problem are APX-hard.
Submitted 2 June, 2009; v1 submitted 3 July, 2007;
originally announced July 2007.
-
Approximating Clustering of Fingerprint Vectors with Missing Values
Authors:
Paola Bonizzoni,
Gianluca Della Vedova,
Riccardo Dondi
Abstract:
The problem of clustering fingerprint vectors is an interesting problem in Computational Biology that has been proposed in (Figueroa et al. 2004). In this paper we show some improvements in closing the gaps between the known lower bounds and upper bounds on the approximability of some variants of the biological problem. Namely, we are able to prove that the problem is APX-hard even when each fingerprint contains only two unknown positions. Moreover, we have studied some variants of the original problem, and we give two 2-approximation algorithms for the IECMV and OECMV problems when the number of unknown entries for each vector is at most a constant.
Submitted 23 November, 2005;
originally announced November 2005.