An FPT Algorithm for Temporal Graph Untangling

Abstract: Several classical combinatorial problems have been considered and analysed on temporal graphs. Recently, a variant of Vertex Cover on temporal graphs, called MinTimelineCover, has been introduced to summarize timeline activities in social networks. The problem asks to cover every temporal edge while minimizing the total span of the vertices (where the span of a vertex is the length of the timestamp interval it must remain active in, minus one). While the problem has been shown to be NP-hard even in very restricted cases, its parameterized complexity has not been fully understood. The problem is known to be in FPT under the span parameter only for graphs with two timestamps, but the parameterized complexity for the general case is open. We settle this open problem by giving an FPT algorithm that is based on a combination of iterative compression and a reduction to the Digraph Pair Cut problem, a powerful problem that has received significant attention recently.

Submitted 3 July, 2023; originally announced July 2023.

arXiv:2109.01392 [pdf, other]

Finding Colorful Paths in Temporal Graphs

Authors: Riccardo Dondi, Mohammad Mehdi Hosseinzadeh

Abstract: The problem of finding paths in temporal graphs has been recently considered due to its many applications. In this paper we consider a variant of the problem that, given a vertex-colored temporal graph, asks for a path whose vertices have distinct colors and include the maximum number of colors. We study the approximation complexity of the problem and we provide an inapproximability lower bound. Then we present a heuristic for the problem and an experimental evaluation of our heuristic, both on synthetic and real-world graphs.

Submitted 3 September, 2021; originally announced September 2021.

arXiv:2011.08119 [pdf, other]

The Longest Run Subsequence Problem: Further Complexity Results

Authors: Riccardo Dondi, Florian Sikora

Abstract: Longest Run Subsequence is a problem introduced recently in the context of the scaffolding phase of genome assembly (Schrinner et al., WABI 2020). The problem asks for a maximum length subsequence of a given string that contains at most one run for each symbol (a run is a maximal substring of consecutive identical symbols). The problem has been shown to be NP-hard and to be fixed-parameter tractable when the parameter is the size of the alphabet on which the input string is defined. In this paper we further investigate the complexity of the problem and we show that it is fixed-parameter tractable when it is parameterized by the number of runs in a solution, a smaller parameter. Moreover, we investigate the kernelization complexity of Longest Run Subsequence and we prove that it does not admit a polynomial kernel when parameterized by the size of the alphabet or by the number of runs. Finally, we consider the restriction of Longest Run Subsequence when each symbol has at most two occurrences in the input string and we show that it is APX-hard.

Submitted 22 June, 2021; v1 submitted 16 November, 2020; originally announced November 2020.

Comments: Accepted in CPM 2021
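
Illustration (not part of the listing): the definition in the abstract above can be made concrete with a small sketch. The function names `has_one_run_per_symbol` and `longest_run_subsequence_bruteforce` are hypothetical, and the exhaustive search shown here is only a reference check for tiny strings; it is not the fixed-parameter algorithm described in the paper.

```python
from itertools import combinations


def has_one_run_per_symbol(s: str) -> bool:
    """Return True if every symbol of s occurs in exactly one run."""
    seen = set()   # symbols whose run has already started
    prev = None    # symbol of the run currently being scanned
    for ch in s:
        if ch != prev:
            if ch in seen:
                # ch starts a second run, which is not allowed
                return False
            seen.add(ch)
            prev = ch
    return True


def longest_run_subsequence_bruteforce(s: str) -> str:
    """Exhaustive search over subsequences, longest first.

    Exponential in len(s); intended only to illustrate the definition.
    """
    n = len(s)
    for length in range(n, -1, -1):
        for idx in combinations(range(n), length):
            candidate = "".join(s[i] for i in idx)
            if has_one_run_per_symbol(candidate):
                return candidate
    return ""
```

For the string "abab", no subsequence of length 4 is valid (the only one is "abab" itself, where both symbols have two runs), so the search falls back to length 3.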

arXiv:2008.01573 [pdf, other]

Top-k Connected Overlapping Densest Subgraphs in Dual Networks

Authors: Riccardo Dondi, Pietro Hiram Guzzi, Mohammad Mehdi Hosseinzadeh

Abstract: Networks are largely used for modelling and analysing data and relations among them. Recently, it has been shown that the use of a single network may not be the optimal choice, since a single network may miss some aspects. Consequently, it has been proposed to use a pair of networks to better model all the aspects, and the main approach is referred to as dual networks (DNs). DNs are two related graphs (one weighted, the other unweighted) that share the same set of vertices and two different edge sets. In DNs it is often interesting to extract common subgraphs among the two networks that are maximally dense in the conceptual network and connected in the physical one. The simplest instance of this problem is finding a common densest connected subgraph (DCS), while we here focus on the detection of the Top-k Densest Connected subgraphs, i.e. a set of k subgraphs having the largest density in the conceptual network which are also connected in the physical network. We formalise the problem and then we propose a heuristic to find a solution, since the problem is computationally hard. A set of experiments on synthetic and real networks is also presented to support our approach.

Submitted 4 August, 2020; originally announced August 2020.

arXiv:2002.07695 [pdf, other]

Computing the k Densest Subgraphs of a Graph

Authors: Riccardo Dondi, Danny Hermelin

Abstract: Computing cohesive subgraphs is a central problem in graph theory. While many formulations of cohesive subgraphs lead to NP-hard problems, finding a densest subgraph can be done in polynomial time. As such, the densest subgraph model has emerged as the most popular notion of cohesiveness. Recently, the data mining community has started looking into the problem of computing k densest subgraphs in a given graph, rather than one, with various restrictions on the possible overlap between the subgraphs. However, there seems to be very little known on this important and natural generalization from a theoretical perspective. In this paper we hope to remedy this situation by analyzing three natural variants of the k densest subgraphs problem. Each variant differs depending on the amount of overlap that is allowed between the subgraphs. In one extreme, when no overlap is allowed, we prove that the problem is NP-hard for k >= 3. On the other extreme, when overlap is allowed without any restrictions and the solution subgraphs only have to be distinct, we show that the problem is fixed-parameter tractable with respect to k, and admits a PTAS for constant k. Finally, when a limited amount of overlap is allowed between the subgraphs, we prove that the problem is NP-hard for k = 2.

Submitted 23 November, 2021; v1 submitted 18 February, 2020; originally announced February 2020.

arXiv:2001.01961 [pdf, other]

Complexity Issues of String to Graph Approximate Matching

Authors: Riccardo Dondi, Giancarlo Mauri, Italo Zoppis

Abstract: The problem of matching a query string to a directed graph, whose vertices are labeled by strings, has application in different fields, from data mining to computational biology. Several variants of the problem have been considered, depending on whether the match is exact or approximate and, in this latter case, which edit operations are considered and where they are allowed. In this paper we present results on the complexity of the approximate matching problem, where edit operations are symbol substitutions and are allowed only on the graph labels or both on the graph labels and the query string. We introduce a variant of the problem that asks whether there exists a path in a graph that represents a query string with any number of edit operations and we show that it is NP-complete, even when labels have length one and in the case the alphabet is binary. Moreover, when it is parameterized by the length of the input string and graph labels have length one, we show that the problem is fixed-parameter tractable and it is unlikely to admit a polynomial kernel. The NP-completeness of this problem leads to the inapproximability (within any factor) of the approximate matching when edit operations are allowed only on the graph labels. Moreover, we show that the variants of approximate string matching to graph we consider are not fixed-parameter tractable, when the parameter is the number of edit operations, even for graphs that have distance one from a DAG. The reduction for this latter result allows us to prove the inapproximability of the variant where edit operations can be applied both on the query string and on graph labels.

Submitted 7 January, 2020; originally announced January 2020.

Comments: Extended version of a paper accepted to LATA 2020

arXiv:1809.02434 [pdf, other]

Top-k Overlapping Densest Subgraphs: Approximation and Complexity

Authors: Riccardo Dondi, Mohammad Mehdi Hosseinzadeh, Giancarlo Mauri, Italo Zoppis

Abstract: A central problem in graph mining is finding dense subgraphs, with several applications in different fields, a notable example being identifying communities. While a lot of effort has been put on the problem of finding a single dense subgraph, only recently the focus has been shifted to the problem of finding a set of densest subgraphs. Some approaches aim at finding disjoint subgraphs, while in many real-world networks communities are often overlapping. An approach introduced to find possible overlapping subgraphs is the Top-k Overlapping Densest Subgraphs problem. For a given integer k >= 1, the goal of this problem is to find a set of k densest subgraphs that may share some vertices. The objective function to be maximized takes into account both the density of the subgraphs and the distance between subgraphs in the solution. The Top-k Overlapping Densest Subgraphs problem has been shown to admit a 1/10-factor approximation algorithm. Furthermore, the computational complexity of the problem has been left open. In this paper, we present contributions concerning the approximability and the computational complexity of the problem. For the approximability, we present approximation algorithms that improve the approximation factor to 1/2, when k is bounded by the vertex set, and to 2/3 when k is a constant. For the computational complexity, we show that the problem is NP-hard even when k = 3.

Submitted 30 January, 2019; v1 submitted 7 September, 2018; originally announced September 2018.

arXiv:1806.03988 [pdf, other]

Reconciling Multiple Genes Trees via Segmental Duplications and Losses

Authors: Riccardo Dondi, Manuel Lafond, Celine Scornavacca

Abstract: Reconciling gene trees with a species tree is a fundamental problem to understand the evolution of gene families. Many existing approaches reconcile each gene tree independently. However, it is well-known that the evolution of gene families is interconnected. In this paper, we extend a previous approach to reconcile a set of gene trees with a species tree based on segmental macro-evolutionary events, where segmental duplication events and losses are associated with cost $δ$ and $λ$, respectively. We show that the problem is polynomial-time solvable when $δ \leq λ$ (via LCA-mapping), while if $δ > λ$ the problem is NP-hard, even when $λ = 0$ and a single gene tree is given, solving a long standing open problem on the complexity of the reconciliation problem. On the positive side, we give a fixed-parameter algorithm for the problem, where the parameters are $δ/λ$ and the number $d$ of segmental duplications, of time complexity $O(\lceil \frac{δ}{λ} \rceil^{d} \cdot n \cdot \frac{δ}{λ})$. Finally, we demonstrate the usefulness of this algorithm on two previously studied real datasets: we first show that our method can be used to confirm or refute hypothetical segmental duplications on a set of 16 eukaryotes, then show how we can detect whole genome duplications in yeast genomes.

Submitted 11 June, 2018; originally announced June 2018.

Comments: 23 pages, 7 figures, WABI 2018

arXiv:1806.01119 [pdf, other]

Covering with Clubs: Complexity and Approximability

Authors: Riccardo Dondi, Giancarlo Mauri, Florian Sikora, Italo Zoppis

Abstract: Finding cohesive subgraphs in a network is a well-known problem in graph theory. Several alternative formulations of cohesive subgraph have been proposed, a notable example being $s$-club, which is a subgraph where each vertex is at distance at most $s$ to the others. Here we consider the problem of covering a given graph with the minimum number of $s$-clubs. We study the computational and approximation complexity of this problem, when $s$ is equal to 2 or 3. First, we show that deciding if there exists a cover of a graph with three $2$-clubs is NP-complete, and that deciding if there exists a cover of a graph with two $3$-clubs is NP-complete. Then, we consider the approximation complexity of covering a graph with the minimum number of $2$-clubs and $3$-clubs. We show that, given a graph $G=(V,E)$ to be covered, covering $G$ with the minimum number of $2$-clubs is not approximable within factor $O(|V|^{1/2 -\varepsilon})$, for any $\varepsilon>0$, and covering $G$ with the minimum number of $3$-clubs is not approximable within factor $O(|V|^{1 -\varepsilon})$, for any $\varepsilon>0$. On the positive side, we give an approximation algorithm of factor $2|V|^{1/2}\log^{3/2} |V|$ for covering a graph with the minimum number of $2$-clubs.

Submitted 4 June, 2018; originally announced June 2018.

Comments: Accepted in IWOCA 2018
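
Illustration (not part of the listing): a minimal sketch of the $s$-club condition stated in the abstract above. It assumes the usual reading that distances are measured inside the subgraph induced by the chosen vertices, and that the graph is undirected and given as an adjacency dictionary. The name `is_s_club` and the input format are hypothetical, and this is only a membership check, not any of the paper's algorithms.

```python
from collections import deque


def is_s_club(adj, vertices, s):
    """Check whether `vertices` induce an s-club in the graph `adj`.

    adj: dict mapping each vertex to an iterable of its neighbours
         (assumed undirected, i.e. symmetric).
    Distances are measured inside the induced subgraph only (assumption).
    """
    members = set(vertices)
    for src in members:
        # Depth-limited BFS from src, restricted to the chosen vertices.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            if dist[u] == s:
                continue  # do not explore beyond distance s
            for v in adj.get(u, ()):
                if v in members and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(members):
            return False  # some member is farther than s from src
    return True
```

On a star with centre x and leaves a, b, c, the leaves alone do not form a 2-club under this reading, because their induced subgraph has no edges, while the leaves together with x do.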

arXiv:1609.04951 [pdf, ps, other]

Finding Disjoint Paths on Edge-Colored Graphs: More Tractability Results

Authors: Riccardo Dondi, Florian Sikora

Abstract: The problem of finding the maximum number of vertex-disjoint uni-color paths in an edge-colored graph (called MaxCDP) has been recently introduced in the literature, motivated by applications in social network analysis. In this paper we investigate how the complexity of the problem depends on graph parameters (namely the number of vertices to remove to make the graph a collection of disjoint paths and the size of the vertex cover of the graph), which makes sense since graphs in social networks are not random and have structure. The problem was known to be hard to approximate in polynomial time and not fixed-parameter tractable (FPT) for the natural parameter. Here, we show that it is still hard to approximate, even in FPT-time. Finally, we introduce a new variant of the problem, called MaxCDDP, whose goal is to find the maximum number of vertex-disjoint and color-disjoint uni-color paths. We extend some of the results of MaxCDP to this new variant, and we prove that unlike MaxCDP, MaxCDDP is already hard on graphs at distance two from disjoint paths.

Submitted 29 November, 2017; v1 submitted 16 September, 2016; originally announced September 2016.

Comments: Journal version in JOCO

arXiv:1605.03071 [pdf, ps, other]

doi 10.1016/j.tcs.2018.04.044

Parameterized Complexity and Approximation Issues for the Colorful Components Problems

Authors: Riccardo Dondi, Florian Sikora

Abstract: The quest for colorful components (connected components where each color is associated with at most one vertex) inside a vertex-colored graph has been widely considered in the last ten years. Here we consider two variants, Minimum Colorful Components (MCC) and Maximum Edges in transitive Closure (MEC), introduced in 2011 in the context of orthology gene identification in bioinformatics. The input of both MCC and MEC is a vertex-colored graph. MCC asks for the removal of a subset of edges, so that the resulting graph is partitioned in the minimum number of colorful connected components; MEC asks for the removal of a subset of edges, so that the resulting graph is partitioned in colorful connected components and the number of edges in the transitive closure of such a graph is maximized. We study the parameterized and approximation complexity of MCC and MEC, for general and restricted instances. For MCC on trees we show that the problem is basically equivalent to Minimum Cut on Trees, thus MCC is not approximable within factor $1.36 - \varepsilon$, it is fixed-parameter tractable and it admits a poly-kernel (when the parameter is the number of colorful components). Moreover, we show that MCC, while polynomial-time solvable on paths, is NP-hard even for graphs with constant distance to disjoint paths number. Then we consider the parameterized complexity of MEC when parameterized by the number $k$ of edges in the transitive closure of a solution (the graph obtained by removing edges so that it is partitioned in colorful connected components). We give a fixed-parameter algorithm for MEC parameterized by $k$ and, when the input graph is a tree, we give a poly-kernel.

Submitted 19 June, 2018; v1 submitted 10 May, 2016; originally announced May 2016.

Journal ref: Theoretical Computer Science Volume 739, 29 August 2018, Pages 1-12

arXiv:1512.03220 [pdf, ps, other]

Parameterized Tractability of the Maximum-Duo Preservation String Mapping Problem

Authors: Stefano Beretta, Mauro Castelli, Riccardo Dondi

Abstract: In this paper we investigate the parameterized complexity of the Maximum-Duo Preservation String Mapping Problem, the complement of the Minimum Common String Partition Problem. We show that this problem is fixed-parameter tractable when parameterized by the number k of conserved duos, by first giving a parameterized algorithm based on the color-coding technique and then presenting a reduction to a kernel of size O(k^6).

Submitted 10 December, 2015; originally announced December 2015.

arXiv:1310.5037 [pdf, other]

doi 10.1007/978-3-319-04921-2_10

Covering Pairs in Directed Acyclic Graphs

Authors: Niko Beerenwinkel, Stefano Beretta, Paola Bonizzoni, Riccardo Dondi, Yuri Pirola

Abstract: The Minimum Path Cover problem on directed acyclic graphs (DAGs) is a classical problem that provides a clear and simple mathematical formulation for several applications in different areas and that has an efficient algorithmic solution. In this paper, we study the computational complexity of two constrained variants of Minimum Path Cover motivated by the recent introduction of next-generation sequencing technologies in bioinformatics. The first problem (MinPCRP), given a DAG and a set of pairs of vertices, asks for a minimum cardinality set of paths "covering" all the vertices such that both vertices of each pair belong to the same path. For this problem, we show that, while it is NP-hard to compute if there exists a solution consisting of at most three paths, it is possible to decide in polynomial time whether a solution consisting of at most two paths exists. The second problem (MaxRPSP), given a DAG and a set of pairs of vertices, asks for a path containing the maximum number of the given pairs of vertices. We show its NP-hardness and also its W[1]-hardness when parametrized by the number of covered pairs. On the positive side, we give a fixed-parameter algorithm when the parameter is the maximum overlapping degree, a natural parameter in the bioinformatics applications of the problem.

Submitted 18 October, 2013; originally announced October 2013.

Journal ref: Proc. of Language and Automata Theory and Applications (LATA 2014), LNCS Vol. 8370, 2014, pp 126-137

arXiv:1206.1877 [pdf, ps, other]

On the Complexity of Minimum Labeling Alignment of Two Genomes

Authors: Riccardo Dondi, Nadia El-Mabrouk

Abstract: In this note we investigate the complexity of the Minimum Label Alignment problem and we show that such a problem is APX-hard.

Submitted 8 June, 2012; originally announced June 2012.

arXiv:1110.6739 [pdf, ps, other]

The Binary Perfect Phylogeny with Persistent characters

Authors: Paola Bonizzoni, Chiara Braghin, Riccardo Dondi, Gabriella Trucco

Abstract: The binary perfect phylogeny model is too restrictive to model biological events such as back mutations. In this paper we consider a natural generalization of the model that allows a special type of back mutation. We investigate the problem of reconstructing a near perfect phylogeny over a binary set of characters where characters are persistent: characters can be gained and lost at most once. Based on this notion, we define the problem of the Persistent Perfect Phylogeny (referred to as P-PP). We restate the P-PP problem as a special case of the Incomplete Directed Perfect Phylogeny, called Incomplete Perfect Phylogeny with Persistent Completion (referred to as IP-PP), where the instance is an incomplete binary matrix M having some missing entries, denoted by symbol ?, that must be determined (or completed) as 0 or 1 so that M admits a binary perfect phylogeny. We show that the IP-PP problem can be reduced to a problem over an edge colored graph since the completion of each column of the input matrix can be represented by a graph operation. Based on this graph formulation, we develop an exact algorithm for solving the P-PP problem that is exponential in the number of characters and polynomial in the number of species.

Submitted 28 June, 2012; v1 submitted 31 October, 2011; originally announced October 2011.

Comments: 13 pages, 3 figures

arXiv:1001.1210 [pdf, other]

doi 10.1109/TCBB.2010.52

Pure Parsimony Xor Haplotyping

Authors: Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi, Yuri Pirola, Romeo Rizzi

Abstract: The haplotype resolution from xor-genotype data has been recently formulated as a new model for genetic studies. The xor-genotype data is a cheaply obtainable type of data distinguishing heterozygous from homozygous sites without identifying the homozygous alleles. In this paper we propose a formulation based on a well-known model used in haplotype inference: pure parsimony. We exhibit exact solutions of the problem by providing polynomial time algorithms for some restricted cases and a fixed-parameter algorithm for the general case. These results are based on some interesting combinatorial properties of a graph representation of the solutions. Furthermore, we show that the problem has a polynomial time k-approximation, where k is the maximum number of xor-genotypes containing a given SNP. Finally, we propose a heuristic and produce an experimental analysis showing that it scales to real-world large instances taken from the HapMap project.

Submitted 8 January, 2010; originally announced January 2010.

Journal ref: IEEE/ACM Trans. on Computational Biology and Bioinformatics 7.4 (2010) 598-610

arXiv:0912.0368 [pdf, ps, other]

doi 10.1016/j.ipl.2010.07.015

Variants of Constrained Longest Common Subsequence

Authors: Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi, Yuri Pirola

Abstract: In this work, we consider a variant of the classical Longest Common Subsequence problem called Doubly-Constrained Longest Common Subsequence (DC-LCS). Given two strings s1 and s2 over an alphabet A, a set C_s of strings, and a function Co from A to N, the DC-LCS problem consists in finding the longest subsequence s of s1 and s2 such that s is a supersequence of all the strings in C_s and such that the number of occurrences in s of each symbol a in A is upper bounded by Co(a). The DC-LCS problem provides a clear mathematical formulation of a sequence comparison problem in Computational Biology and generalizes two other constrained variants of the LCS problem: the Constrained LCS and the Repetition-Free LCS. We present two results for the DC-LCS problem. First, we illustrate a fixed-parameter algorithm where the parameter is the length of the solution. Secondly, we prove a parameterized hardness result for the Constrained LCS problem when the parameter is the number of the constraint strings and the size of the alphabet A. This hardness result also implies the parameterized hardness of the DC-LCS problem (with the same parameters) and its NP-hardness when the size of the alphabet is constant.

Submitted 2 December, 2009; originally announced December 2009.

Journal ref: Information Processing Letters 110.20 (2010) 877-881

arXiv:0910.3148 [pdf, other]

doi 10.1007/s10878-011-9428-9

Parameterized Complexity of the k-anonymity Problem

Authors: Stefano Beretta, Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi, Yuri Pirola

Abstract: The problem of publishing personal data without giving up privacy is becoming increasingly important. An interesting formalization that has been recently proposed is the $k$-anonymity. This approach requires that the rows of a table are partitioned in clusters of size at least $k$ and that all the rows in a cluster become the same tuple, after the suppression of some entries. The natural optimization problem, where the goal is to minimize the number of suppressed entries, is known to be APX-hard even when the record values are over a binary alphabet and $k=3$, and when the records have length at most 8 and $k=4$. In this paper we study how the complexity of the problem is influenced by different parameters, first showing that the problem is W[1]-hard when parameterized by the size of the solution (and the value $k$). Then we exhibit a fixed parameter algorithm, when the problem is parameterized by the size of the alphabet and the number of columns. Finally, we investigate the computational (and approximation) complexity of the $k$-anonymity problem, when restricting the instance to records having length bounded by 3 and $k=3$. We show that such a restriction is APX-hard.

Submitted 17 May, 2010; v1 submitted 16 October, 2009; originally announced October 2009.

Comments: 22 pages, 2 figures

Journal ref: J. of Combinatorial Optimization 26.1 (2013) 19-43

arXiv:0907.1840 [pdf, ps, other]

A PTAS for the Minimum Consensus Clustering Problem with a Fixed Number of Clusters

Authors: Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi

Abstract: The Consensus Clustering problem has been introduced as an effective way to analyze the results of different microarray experiments. The problem consists of looking for a partition that best summarizes a set of input partitions (each corresponding to a different microarray experiment) under a simple and intuitive cost function. The problem admits polynomial time algorithms on two input partitions, but is APX-hard on three input partitions. We investigate the restriction of Consensus Clustering when the output partition is required to contain at most k sets, giving a polynomial time approximation scheme (PTAS) while proving the NP-hardness of this restriction.

Submitted 10 July, 2009; originally announced July 2009.

arXiv:0707.0421 [pdf, ps, other]

The $k$-anonymity Problem is Hard

Authors: Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi

Abstract: The problem of publishing personal data without giving up privacy is becoming increasingly important. An interesting formalization recently proposed is the k-anonymity. This approach requires that the rows in a table are clustered in sets of size at least k and that all the rows in a cluster become the same tuple, after the suppression of some records. The natural optimization problem, where the goal is to minimize the number of suppressed entries, is known to be NP-hard when the values are over a ternary alphabet, k = 3 and the row length is unbounded. In this paper we give a lower bound on the approximation factor that any polynomial-time algorithm can achieve on two restrictions of the problem, namely (i) when the records values are over a binary alphabet and k = 3, and (ii) when the records have length at most 8 and k = 4, showing that these restrictions of the problem are APX-hard.

Submitted 2 June, 2009; v1 submitted 3 July, 2007; originally announced July 2007.

Comments: 21 pages, A short version of this paper has been accepted in FCT 2009 - 17th International Symposium on Fundamentals of Computation Theory

arXiv:cs/0511082 [pdf, ps, other]

doi 10.1007/s00453-008-9265-0

Approximating Clustering of Fingerprint Vectors with Missing Values

Authors: Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi

Abstract: The problem of clustering fingerprint vectors is an interesting problem in Computational Biology that has been proposed in (Figueroa et al. 2004). In this paper we show some improvements in closing the gaps between the known lower bounds and upper bounds on the approximability of some variants of the biological problem. Namely we are able to prove that the problem is APX-hard even when each fingerprint contains only two unknown positions. Moreover we have studied some variants of the original problem, and we give two 2-approximation algorithms for the IECMV and OECMV problems when the number of unknown entries for each vector is at most a constant.

Submitted 23 November, 2005; originally announced November 2005.

Comments: 13 pages, 4 figures

Showing 1–21 of 21 results for author: Dondi, R