Showing 1–27 of 27 results for author: Bonizzoni, P

  1. arXiv:2406.18473  [pdf, ps, other]

    cs.FL

    Unveiling the connection between the Lyndon factorization and the Canonical Inverse Lyndon factorization via a border property

    Authors: Paola Bonizzoni, Clelia De Felice, Brian Riccardi, Rocco Zaccagnino, Rosalba Zizza

    Abstract: The notions of Lyndon word and Lyndon factorization have been shown to have unexpected applications in theory as well as in developing novel algorithms on words. The counterparts to these notions are those of inverse Lyndon word and inverse Lyndon factorization. Unlike Lyndon words, inverse Lyndon words may be bordered. The relationship between the two factorizations is related to the inverse…

    Submitted 28 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: 11 pages, version submitted to MFCS2024. arXiv admin note: text overlap with arXiv:2404.17969, arXiv:1911.01851

  2. arXiv:2404.17969  [pdf, ps, other]

    math.CO cs.FL

    From the Lyndon factorization to the Canonical Inverse Lyndon factorization: back and forth

    Authors: Paola Bonizzoni, Clelia De Felice, Rocco Zaccagnino, Rosalba Zizza

    Abstract: The notion of inverse Lyndon word is related to the classical notion of Lyndon word. More precisely, inverse Lyndon words are all and only the nonempty prefixes of the powers of the anti-Lyndon words, where an anti-Lyndon word with respect to a lexicographical order is a classical Lyndon word with respect to the inverse lexicographic order. Each word $w$ admits a factorization in inverse Lyndon wo…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:1911.01851

  3. arXiv:2202.13884  [pdf, other]

    q-bio.GN cs.FL cs.LG

    Numeric Lyndon-based feature embedding of sequencing reads for machine learning approaches

    Authors: Paola Bonizzoni, Matteo Costantini, Clelia De Felice, Alessia Petescia, Yuri Pirola, Marco Previtali, Raffaella Rizzi, Jens Stoye, Rocco Zaccagnino, Rosalba Zizza

    Abstract: Feature embedding methods have been proposed in the literature to represent sequences as numeric vectors to be used in some bioinformatics investigations, such as family classification and protein structure prediction. Recent theoretical results showed that the well-known Lyndon factorization preserves common factors in overlapping strings. Surprisingly, the fingerprint of a sequencing read, which is…

    Submitted 2 June, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

    ACM Class: I.2.6; F.4.3

    Journal ref: Information Sciences 607 (2022) 458-476

  4. arXiv:2010.05644  [pdf, other]

    cs.DS

    Incomplete Directed Perfect Phylogeny in Linear Time

    Authors: Giulia Bernardini, Paola Bonizzoni, Paweł Gawrychowski

    Abstract: Reconstructing the evolutionary history of a set of species is a central task in computational biology. In real data, it is often the case that some information is missing: the Incomplete Directed Perfect Phylogeny (IDPP) problem asks, given a collection of species described by a set of binary characters with some unknown states, to complete the missing states in such a way that the result can be…

    Submitted 12 October, 2020; originally announced October 2020.

    Comments: 12 pages, 3 figures

  5. arXiv:2002.05600  [pdf, other]

    cs.DS

    On Two Measures of Distance between Fully-Labelled Trees

    Authors: Giulia Bernardini, Paola Bonizzoni, Paweł Gawrychowski

    Abstract: The last decade brought a significant increase in the amount of data and a variety of new inference methods for reconstructing the detailed evolutionary history of various cancers. This creates a need for efficient procedures for comparing rooted trees representing the evolution of mutations in tumor phylogenies. Bernardini et al. [CPM 2019] recently introduced a notion of the rearrangem…

    Submitted 29 April, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

    Comments: 17 pages, 15 figures. To be published in the proceedings of CPM 2020

  6. Lyndon words versus inverse Lyndon words: queries on suffixes and bordered words

    Authors: Paola Bonizzoni, Clelia De Felice, Rocco Zaccagnino, Rosalba Zizza

    Abstract: Lyndon words have been widely investigated and shown to be a useful tool to prove interesting combinatorial properties of words. In this paper we state new properties of both Lyndon and inverse Lyndon factorizations of a word $w$, with the aim of exploring their use in some classical queries on $w$. The main property we prove is related to a classical query on words. We prove that there are r…

    Submitted 2 November, 2019; originally announced November 2019.

    Comments: arXiv admin note: text overlap with arXiv:1705.10277

    Journal ref: Theoretical Computer Science, 2020

  7. arXiv:1904.01321  [pdf, other]

    cs.DS

    A rearrangement distance for fully-labelled trees

    Authors: Giulia Bernardini, Paola Bonizzoni, Gianluca Della Vedova, Murray Patterson

    Abstract: The problem of comparing trees representing the evolutionary histories of cancerous tumors has turned out to be crucial, since there is a variety of different methods which typically infer multiple possible trees. A departure from the widely studied setting of classical phylogenetics, where trees are leaf-labelled, tumoral trees are fully labelled, i.e., every vertex has a label. In this…

    Submitted 2 April, 2019; originally announced April 2019.

    Comments: Conference paper

  8. arXiv:1705.10277  [pdf, ps, other]

    cs.FL cs.DM cs.DS

    Inverse Lyndon words and Inverse Lyndon factorizations of words

    Authors: Paola Bonizzoni, Clelia De Felice, Rocco Zaccagnino, Rosalba Zizza

    Abstract: Motivated by applications to string processing, we introduce variants of the Lyndon factorization called inverse Lyndon factorizations. Their factors, named inverse Lyndon words, are in a class that strictly contains anti-Lyndon words, that is Lyndon words with respect to the inverse lexicographic order. The Lyndon factorization of a nonempty word w is unique but w may have several inverse Lyndon…

    Submitted 17 December, 2017; v1 submitted 29 May, 2017; originally announced May 2017.

    MSC Class: 68R15 ACM Class: G.2.1; F.4.3

    Journal ref: Advances in Applied Mathematics, Vol. 101, pp. 281-319, 2018

  9. Computing the BWT and LCP array of a Set of Strings in External Memory

    Authors: Paola Bonizzoni, Gianluca Della Vedova, Yuri Pirola, Marco Previtali, Raffaella Rizzi

    Abstract: Indexing very large collections of strings, such as those produced by the widespread next generation sequencing technologies, heavily relies on multistring generalizations of the Burrows-Wheeler Transform (BWT): the large memory requirements of in-memory approaches have stimulated recent developments on external memory algorithms. The related problem of computing the Longest Common Prefix (LCP) array of a set…

    Submitted 4 December, 2020; v1 submitted 19 May, 2017; originally announced May 2017.

    Comments: Theoretical Computer Science (2020). arXiv admin note: text overlap with arXiv:1607.08342

  10. arXiv:1611.01017  [pdf, other]

    cs.DS

    Solving the Persistent Phylogeny Problem in polynomial time

    Authors: Paola Bonizzoni, Gianluca Della Vedova, Gabriella Trucco

    Abstract: The notion of a Persistent Phylogeny generalizes the well-known Perfect phylogeny model that has been thoroughly investigated and is used to explain a wide range of evolutionary phenomena. More precisely, while the Perfect Phylogeny model allows each character to be acquired once in the entire evolutionary history while character losses are not allowed, the Persistent Phylogeny model allows each c…

    Submitted 3 November, 2016; originally announced November 2016.

  11. arXiv:1607.08342  [pdf, other]

    cs.DS

    A New Lightweight Algorithm to compute the BWT and the LCP array of a Set of Strings

    Authors: Paola Bonizzoni, Gianluca Della Vedova, Serena Nicosia, Marco Previtali, Raffaella Rizzi

    Abstract: Indexing of very large collections of strings, such as those produced by the widespread sequencing technologies, heavily relies on multi-string generalizations of the Burrows-Wheeler Transform (BWT), and for this problem various in-memory algorithms have been proposed. The rapid growth of data that are processed routinely, such as in bioinformatics, requires a large amount of main memory, and this…

    Submitted 28 July, 2016; originally announced July 2016.

  12. arXiv:1604.03587  [pdf, ps, other]

    cs.DS q-bio.GN

    FSG: Fast String Graph Construction for De Novo Assembly of Reads Data

    Authors: Paola Bonizzoni, Gianluca Della Vedova, Yuri Pirola, Marco Previtali, Raffaella Rizzi

    Abstract: The string graph for a collection of next-generation reads is a lossless data representation that is fundamental for de novo assemblers based on the overlap-layout-consensus paradigm. In this paper, we explore a novel approach to compute the string graph, based on the FM-index and Burrows-Wheeler Transform. We describe a simple algorithm that uses only the FM-index representation of the collection…

    Submitted 29 May, 2017; v1 submitted 12 April, 2016; originally announced April 2016.

    Comments: Accepted to Journal of Computational Biology

  13. arXiv:1510.01574  [pdf, ps, other]

    cs.FL cs.DM

    Splicing Systems from Past to Future: Old and New Challenges

    Authors: Luc Boasson, Paola Bonizzoni, Clelia De Felice, Isabelle Fagnot, Gabriele Fici, Rocco Zaccagnino, Rosalba Zizza

    Abstract: A splicing system is a formal model of the recombinant behaviour of sets of double stranded DNA molecules when acted on by restriction enzymes and ligase. In this survey we will concentrate on a specific behaviour of a type of splicing system, introduced by Păun and subsequently developed by many researchers in both the linear and circular cases of the splicing definition. In particular, we will present rec…

    Submitted 6 October, 2015; originally announced October 2015.

    Comments: Appeared in: Discrete Mathematics and Computer Science. Papers in Memoriam Alexandru Mateescu (1952-2005). The Publishing House of the Romanian Academy, 2014. arXiv admin note: text overlap with arXiv:1112.4897 by other authors

  14. arXiv:1405.7520  [pdf, other]

    cs.DS q-bio.GN

    An External-Memory Algorithm for String Graph Construction

    Authors: Paola Bonizzoni, Gianluca Della Vedova, Yuri Pirola, Marco Previtali, Raffaella Rizzi

    Abstract: Some recent results have introduced external-memory algorithms to compute self-indexes of a set of strings, mainly via computing the Burrows-Wheeler Transform (BWT) of the input strings. The motivations for those results stem from Bioinformatics, where a large number of short strings (called reads) are routinely produced and analyzed. In that field, a fundamental problem is to assemble a genome fr…

    Submitted 11 June, 2015; v1 submitted 29 May, 2014; originally announced May 2014.

  15. arXiv:1405.7497  [pdf, other]

    cs.DS q-bio.QM

    Algorithms for the Constrained Perfect Phylogeny with Persistent Characters

    Authors: Paola Bonizzoni, Anna Paola Carrieri, Gianluca Della Vedova, Gabriella Trucco

    Abstract: The perfect phylogeny is one of the most used models in different areas of computational biology. In this paper we consider the problem of the Persistent Perfect Phylogeny (referred to as P-PP), recently introduced to extend the perfect phylogeny model by allowing persistent characters, that is, characters that can be gained and lost at most once. We define a natural generalization of the P-PP problem obtained…

    Submitted 29 May, 2014; originally announced May 2014.

  16. Covering Pairs in Directed Acyclic Graphs

    Authors: Niko Beerenwinkel, Stefano Beretta, Paola Bonizzoni, Riccardo Dondi, Yuri Pirola

    Abstract: The Minimum Path Cover problem on directed acyclic graphs (DAGs) is a classical problem that provides a clear and simple mathematical formulation for several applications in different areas and that has an efficient algorithmic solution. In this paper, we study the computational complexity of two constrained variants of Minimum Path Cover motivated by the recent introduction of next-generation seq…

    Submitted 18 October, 2013; originally announced October 2013.

    Journal ref: Proc. of Language and Automata Theory and Applications (LATA 2014), LNCS Vol. 8370, 2014, pp 126-137

  17. arXiv:1203.4732  [pdf, other]

    cs.DB

    A Unifying Framework to Characterize the Power of a Language to Express Relations

    Authors: Paola Bonizzoni, Peter J. Cameron, Gianluca Della Vedova, Alberto Leporati, Giancarlo Mauri

    Abstract: In this extended abstract we provide a unifying framework that can be used to characterize and compare the expressive power of query languages for different database models. The framework is based upon the new idea of valid partition, that is, a partition of the elements of a given database, where each class of the partition is composed of elements that cannot be separated (distinguished) accordi…

    Submitted 21 March, 2012; originally announced March 2012.

    Comments: 23 pages

  18. arXiv:1110.6739  [pdf, ps, other]

    cs.DS cs.CE

    The Binary Perfect Phylogeny with Persistent characters

    Authors: Paola Bonizzoni, Chiara Braghin, Riccardo Dondi, Gabriella Trucco

    Abstract: The binary perfect phylogeny model is too restrictive to model biological events such as back mutations. In this paper we consider a natural generalization of the model that allows a special type of back mutation. We investigate the problem of reconstructing a near perfect phylogeny over a binary set of characters where characters are persistent: characters can be gained and lost at most once. Bas…

    Submitted 28 June, 2012; v1 submitted 31 October, 2011; originally announced October 2011.

    Comments: 13 pages, 3 figures

  19. arXiv:1108.0047  [pdf, other]

    q-bio.GN cs.CE cs.DS

    Reconstructing Isoform Graphs from RNA-Seq data

    Authors: Stefano Beretta, Paola Bonizzoni, Gianluca Della Vedova, Raffaella Rizzi

    Abstract: Next-generation sequencing (NGS) technologies allow new methodologies for alternative splicing (AS) analysis. Current computational methods for AS from NGS data are mainly focused on predicting splice site junctions or de novo assembly of full-length transcripts. These methods are computationally expensive and produce a huge number of full-length transcripts or splice junctions, spanning the whole…

    Submitted 14 August, 2012; v1 submitted 30 July, 2011; originally announced August 2011.

  20. arXiv:1107.3724  [pdf, other]

    cs.DS q-bio.PE

    Haplotype Inference on Pedigrees with Recombinations, Errors, and Missing Genotypes via SAT solvers

    Authors: Yuri Pirola, Gianluca Della Vedova, Stefano Biffani, Alessandra Stella, Paola Bonizzoni

    Abstract: The Minimum-Recombinant Haplotype Configuration problem (MRHC) has been highly successful in providing a sound combinatorial formulation for the important problem of genotype phasing on pedigrees. Despite several algorithmic advances and refinements that led to some efficient algorithms, its applicability to real datasets has been limited by the absence of some important characteristics of these d…

    Submitted 19 July, 2011; originally announced July 2011.

    Comments: 14 pages, 1 figure, 4 tables, the associated software reHCstar is available at http://www.algolab.eu/reHCstar

    ACM Class: F.2.2

    Journal ref: IEEE/ACM Trans. on Computational Biology and Bioinformatics 9.6 (2012) 1582-1594

  21. Pure Parsimony Xor Haplotyping

    Authors: Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi, Yuri Pirola, Romeo Rizzi

    Abstract: The haplotype resolution from xor-genotype data has been recently formulated as a new model for genetic studies. The xor-genotype data is a cheaply obtainable type of data distinguishing heterozygous from homozygous sites without identifying the homozygous alleles. In this paper we propose a formulation based on a well-known model used in haplotype inference: pure parsimony. We exhibit exact sol…

    Submitted 8 January, 2010; originally announced January 2010.

    Journal ref: IEEE/ACM Trans. on Computational Biology and Bioinformatics 7.4 (2010) 598-610

  22. Variants of Constrained Longest Common Subsequence

    Authors: Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi, Yuri Pirola

    Abstract: In this work, we consider a variant of the classical Longest Common Subsequence problem called Doubly-Constrained Longest Common Subsequence (DC-LCS). Given two strings s1 and s2 over an alphabet A, a set C_s of strings, and a function Co from A to N, the DC-LCS problem consists in finding the longest subsequence s of s1 and s2 such that s is a supersequence of all the strings in C_s and such tha…

    Submitted 2 December, 2009; originally announced December 2009.

    Journal ref: Information Processing Letters 110.20 (2010) 877-881

  23. Circular Languages Generated by Complete Splicing Systems and Pure Unitary Languages

    Authors: Paola Bonizzoni, Clelia De Felice, Rosalba Zizza

    Abstract: Circular splicing systems are a formal model of a generative mechanism of circular words, inspired by a recombinant behaviour of circular DNA. Some unanswered questions are related to the computational power of such systems, and finding a characterization of the class of circular languages generated by circular splicing systems is still an open problem. In this paper we solve this problem for co…

    Submitted 12 November, 2009; originally announced November 2009.

    Journal ref: EPTCS 9, 2009, pp. 22-31

  24. arXiv:0910.3148  [pdf, other]

    cs.DS cs.DB cs.DM

    Parameterized Complexity of the k-anonymity Problem

    Authors: Stefano Beretta, Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi, Yuri Pirola

    Abstract: The problem of publishing personal data without giving up privacy is becoming increasingly important. An interesting formalization that has been recently proposed is $k$-anonymity. This approach requires that the rows of a table are partitioned in clusters of size at least $k$ and that all the rows in a cluster become the same tuple, after the suppression of some entries. The natural optimiz…

    Submitted 17 May, 2010; v1 submitted 16 October, 2009; originally announced October 2009.

    Comments: 22 pages, 2 figures

    Journal ref: J. of Combinatorial Optimization 26.1 (2013) 19-43

  25. arXiv:0907.1840  [pdf, ps, other]

    cs.DS

    A PTAS for the Minimum Consensus Clustering Problem with a Fixed Number of Clusters

    Authors: Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi

    Abstract: The Consensus Clustering problem has been introduced as an effective way to analyze the results of different microarray experiments. The problem consists of looking for a partition that best summarizes a set of input partitions (each corresponding to a different microarray experiment) under a simple and intuitive cost function. The problem admits polynomial time algorithms on two input partition…

    Submitted 10 July, 2009; originally announced July 2009.

  26. arXiv:0707.0421  [pdf, ps, other]

    cs.DB cs.CC cs.DS

    The $k$-anonymity Problem is Hard

    Authors: Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi

    Abstract: The problem of publishing personal data without giving up privacy is becoming increasingly important. An interesting formalization recently proposed is k-anonymity. This approach requires that the rows in a table are clustered in sets of size at least k and that all the rows in a cluster become the same tuple, after the suppression of some records. The natural optimization problem, where the…

    Submitted 2 June, 2009; v1 submitted 3 July, 2007; originally announced July 2007.

    Comments: 21 pages, A short version of this paper has been accepted in FCT 2009 - 17th International Symposium on Fundamentals of Computation Theory

  27. Approximating Clustering of Fingerprint Vectors with Missing Values

    Authors: Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi

    Abstract: The problem of clustering fingerprint vectors is an interesting problem in Computational Biology that was proposed in (Figueroa et al. 2004). In this paper we show some improvements in closing the gaps between the known lower bounds and upper bounds on the approximability of some variants of the biological problem. Namely, we are able to prove that the problem is APX-hard even when each fin…

    Submitted 23 November, 2005; originally announced November 2005.

    Comments: 13 pages, 4 figures