Showing 1–19 of 19 results for author: Giancarlo, R

Searching in archive cs.

  1. arXiv:2309.00946  [pdf, other]

    cs.DS cs.AI cs.DB cs.IR cs.LG

    From Specific to Generic Learned Sorted Set Dictionaries: A Theoretically Sound Paradigm Yelding Competitive Data Structural Boosters in Practice

    Authors: Domenico Amato, Giosuè Lo Bosco, Raffaele Giancarlo

    Abstract: This research concerns Learned Data Structures, a recent area that has emerged at the crossroad of Machine Learning and Classic Data Structures. It is methodologically important and with a high practical impact. We focus on Learned Indexes, i.e., Learned Sorted Set Dictionaries. The proposals available so far are specific in the sense that they can boost, indeed impressively, the time performance…

    Submitted 2 September, 2023; originally announced September 2023.

    ACM Class: E.1; I.2; H.2

  2. arXiv:2305.05551  [pdf, other]

    cs.SE

    Digital Transformation in the Public Administrations: a Guided Tour For Computer Scientists

    Authors: Paolo Ciancarini, Raffaele Giancarlo, Gennaro Grimaudo

    Abstract: Digital Transformation (DT) is the process of integrating digital technologies and solutions into the activities of an organization, whether public or private. This paper focuses on the DT of public sector organizations, where the targets of innovative digital solutions are either the citizens or the administrative bodies or both. This paper is a guided tour for Computer Scientists, as the digital…

    Submitted 10 May, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: 30 pages, 3 figures

  3. arXiv:2212.03067  [pdf, other]

    cs.DS q-bio.GN

    Pareto Optimal Compression of Genomic Dictionaries, with or without Random Access in Main Memory

    Authors: Raffaele Giancarlo, Gennaro Grimaudo

    Abstract: Motivation: A Genomic Dictionary, i.e., the set of the k-mers appearing in a genome, is a fundamental source of genomic information: its collection is the first step in strategic computational methods ranging from assembly to sequence comparison and phylogeny. Unfortunately, it is costly to store. This motivates some recent studies regarding the compression of those k-mer sets. However, such an ar…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: Main: 13 pages, 3 tables, 3 figures; Supplementary Material: 17 pages, 20 tables, 10 figures

  4. arXiv:2211.15565  [pdf, other]

    cs.LG cs.AI

    A Critical Analysis of Classifier Selection in Learned Bloom Filters

    Authors: Dario Malchiodi, Davide Raimondi, Giacomo Fumagalli, Raffaele Giancarlo, Marco Frasca

    Abstract: Learned Bloom Filters, i.e., models induced from data via machine learning techniques and solving the approximate set membership problem, have recently been introduced with the aim of enhancing the performance of standard Bloom Filters, with special focus on space occupancy. Unlike in the classical case, the "complexity" of the data used to build the filter might heavily impact on its performance.…

    Submitted 28 November, 2022; originally announced November 2022.

  5. arXiv:2205.05643  [pdf, other]

    cs.DS

    A New Class of String Transformations for Compressed Text Indexing

    Authors: Raffaele Giancarlo, Giovanni Manzini, Antonio Restivo, Giovanna Rosone, Marinella Sciortino

    Abstract: Introduced about thirty years ago in the field of Data Compression, the Burrows-Wheeler Transform (BWT) is a string transformation that, besides being a booster of the performance of memoryless compressors, plays a fundamental role in the design of efficient self-indexing compressed data structures. Finding other string transformations with the same remarkable properties of BWT has been a challeng…

    Submitted 8 May, 2023; v1 submitted 11 May, 2022; originally announced May 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:1902.01280

  6. arXiv:2203.14777  [pdf, other]

    cs.DB cs.IR cs.LG cs.NE

    On the Suitability of Neural Networks as Building Blocks for The Design of Efficient Learned Indexes

    Authors: Domenico Amato, Giosuè Lo Bosco, Raffaele Giancarlo

    Abstract: With the aim of obtaining time/space improvements in classic Data Structures, an emerging trend is to combine Machine Learning techniques with the ones proper of Data Structures. This new area goes under the name of Learned Data Structures. The motivation for its study is a perceived change of paradigm in Computer Architectures that would favour the use of Graphics Processing Units and Tensor Proc…

    Submitted 21 February, 2022; originally announced March 2022.

    ACM Class: E.1; I.2; H.2

  7. arXiv:2201.01554  [pdf, other]

    cs.DS cs.DB cs.IR cs.LG

    Standard Vs Uniform Binary Search and Their Variants in Learned Static Indexing: The Case of the Searching on Sorted Data Benchmarking Software Platform

    Authors: Domenico Amato, Giosuè Lo Bosco, Raffaele Giancarlo

    Abstract: Learned Indexes are a novel approach to search in a sorted table. A model is used to predict an interval in which to search into and a Binary Search routine is used to finalize the search. They are quite effective. For the final stage, usually, the lower_bound routine of the Standard C++ library is used, although this is more of a natural choice rather than a requirement. However, recent studies,…

    Submitted 8 July, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2107.09480

    ACM Class: E.1; I.2; H.2

  8. arXiv:2112.06563  [pdf, other]

    cs.LG cs.NE

    On the Choice of General Purpose Classifiers in Learned Bloom Filters: An Initial Analysis Within Basic Filters

    Authors: Giacomo Fumagalli, Davide Raimondi, Raffaele Giancarlo, Dario Malchiodi, Marco Frasca

    Abstract: Bloom Filters are a fundamental and pervasive data structure. Within the growing area of Learned Data Structures, several Learned versions of Bloom Filters have been considered, yielding advantages over classic Filters. Each of them uses a classifier, which is the Learned part of the data structure. Although it has a central role in those new filters, and its space footprint as well as classificat…

    Submitted 13 December, 2021; originally announced December 2021.

    Comments: ICPRAM 2022

    MSC Class: 68T07

    ACM Class: I.2.6

  9. arXiv:2107.09480  [pdf, other]

    cs.IR cs.DB cs.DS cs.LG

    Learned Sorted Table Search and Static Indexes in Small Model Space

    Authors: Domenico Amato, Giosuè Lo Bosco, Raffaele Giancarlo

    Abstract: Machine Learning Techniques, properly combined with Data Structures, have resulted in Learned Static Indexes, innovative and powerful tools that speed-up Binary Search, with the use of additional space with respect to the table being searched into. Such space is devoted to the Machine Learning Model. Although in their infancy, they are methodologically and practically important, due to the pervasi…

    Submitted 17 September, 2022; v1 submitted 19 July, 2021; originally announced July 2021.

    ACM Class: E.1; I.2; H.2

  10. arXiv:2107.03341  [pdf, ps, other]

    cs.DS cs.DC

    Burrows Wheeler Transform on a Large Scale: Algorithms Implemented in Apache Spark

    Authors: Ylenia Galluzzo, Raffaele Giancarlo, Mario Randazzo, Simona E. Rombo

    Abstract: With the rapid growth of Next Generation Sequencing (NGS) technologies, large amounts of "omics" data are daily collected and need to be processed. Indexing and compressing large sequences datasets are some of the most important tasks in this context. Here we propose algorithms for the computation of Burrows Wheeler transform relying on Big Data technologies, i.e., Apache Spark and Hadoop. Our alg…

    Submitted 7 July, 2021; originally announced July 2021.

    Comments: 11 pages, 2 figures, 2 tables. arXiv admin note: substantial text overlap with arXiv:2007.10095

  11. arXiv:2106.15531  [pdf, other]

    q-bio.GN cs.DC

    The Power of Word-Frequency Based Alignment-Free Functions: a Comprehensive Large-scale Experimental Analysis -- Version 3

    Authors: Giuseppe Cattaneo, Umberto Ferraro Petrillo, Raffaele Giancarlo, Francesco Palini, Chiara Romualdi

    Abstract: Motivation: Alignment-free (AF) distance/similarity functions are a key tool for sequence analysis. Experimental studies on real datasets abound and, to some extent, there are also studies regarding their control of false positive rate (Type I error). However, assessment of their power, i.e., their ability to identify true similarity, has been limited to some members of the D2 family by experiment…

    Submitted 19 October, 2021; v1 submitted 27 June, 2021; originally announced June 2021.

  12. arXiv:2007.13673  [pdf, other]

    cs.DC

    FASTA/Q Data Compressors for MapReduce-Hadoop Genomics:Space and Time Savings Made Easy -- Version 1

    Authors: Umberto Ferraro Petrillo, Francesco Palini, Giuseppe Cattaneo, Raffaele Giancarlo

    Abstract: Motivation: Storage of genomic data is a major cost for the Life Sciences, effectively addressed mostly via specialized data compression methods. For the same reasons of abundance in data production, the use of Big Data technologies is seen as the future for genomic data storage and processing, with MapReduce-Hadoop as leaders. Somewhat surprisingly, none of the specialized FASTA/Q compressors is…

    Submitted 27 July, 2020; originally announced July 2020.

  13. arXiv:2007.10237  [pdf, other]

    cs.LG cs.DS stat.ML

    Learning from Data to Speed-up Sorted Table Search Procedures: Methodology and Practical Guidelines

    Authors: Domenico Amato, Giosuè Lo Bosco, Raffaele Giancarlo

    Abstract: Sorted Table Search Procedures are the quintessential query-answering tool, with widespread usage that now includes also Web Applications, e.g., Search Engines (Google Chrome) and ad Bidding Systems (AppNexus). Speeding them up, at very little cost in space, is still a quite significant achievement. Here we study to what extent Machine Learning Techniques can contribute to obtain such a speed-up vi…

    Submitted 30 July, 2020; v1 submitted 20 July, 2020; originally announced July 2020.

    MSC Class: 68T07; 68P05; 62J05; 68P10

    ACM Class: E.1; I.2.0

  14. Alignment-free Genomic Analysis via a Big Data Spark Platform

    Authors: Umberto Ferraro Petrillo, Francesco Palini, Giuseppe Cattaneo, Raffaele Giancarlo

    Abstract: Motivation: Alignment-free distance and similarity functions (AF functions, for short) are a well established alternative to two and multiple sequence alignments for many genomic, metagenomic and epigenomic tasks. Due to data-intensive applications, the computation of AF functions is a Big Data problem, with the recent Literature indicating that the development of fast and scalable algorithms comp…

    Submitted 23 October, 2021; v1 submitted 2 May, 2020; originally announced May 2020.

    Journal ref: Bioinformatics, Volume 37, Issue 12, 15 June 2021, Pages 1658-1665

  15. arXiv:1907.02308  [pdf, ps, other]

    cs.DS

    The Alternating BWT: an algorithmic perspective

    Authors: Raffaele Giancarlo, Giovanni Manzini, Antonio Restivo, Giovanna Rosone, Marinella Sciortino

    Abstract: The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for Data Compression. It has become a fundamental tool for designing self-indexing data structures, with important applications in several areas in science and engineering. The Alternating Burrows-Wheeler Transform (ABWT) is another transformation recently introduced in [Gessel et al. 2012] and studied in the field of C…

    Submitted 4 July, 2019; originally announced July 2019.

  16. arXiv:1902.01280  [pdf, other]

    cs.DS

    A New Class of Searchable and Provably Highly Compressible String Transformations

    Authors: Raffaele Giancarlo, Giovanni Manzini, Giovanna Rosone, Marinella Sciortino

    Abstract: The Burrows-Wheeler Transform is a string transformation that plays a fundamental role for the design of self-indexing compressed data structures. Over the years, researchers have successfully extended this transformation outside the domains of strings. However, efforts to find non-trivial alternatives of the original, now 25 years old, Burrows-Wheeler string transformation have met limited succes…

    Submitted 4 February, 2019; originally announced February 2019.

  17. arXiv:1807.01566  [pdf, other]

    cs.DC

    Analyzing Big Datasets of Genomic Sequences: Fast and Scalable Collection of k-mer Statistics

    Authors: Umberto Ferraro Petrillo, Mara Sorella, Giuseppe Cattaneo, Raffaele Giancarlo, Simona Rombo

    Abstract: Distributed approaches based on the map-reduce programming paradigm have started to be proposed in the bioinformatics domain, due to the large amount of data produced by the next-generation sequencing techniques. However, the use of map-reduce and related Big Data technologies and frameworks (e.g., Apache Hadoop and Spark) does not necessarily produce satisfactory results, in terms of both efficie…

    Submitted 4 July, 2018; originally announced July 2018.

  18. arXiv:1205.6010  [pdf, ps, other]

    q-bio.GN cs.CE

    The Chromatin Organization of an Eukaryotic Genome : Sequence Specific+ Statistical=Combinatorial (Extended Abstract)

    Authors: Davide Corona, Valeria Di Benedetto, Raffaele Giancarlo, Filippo Utro

    Abstract: Nucleosome organization in eukaryotic genomes has a deep impact on gene function. Although progress has been recently made in the identification of various concurring factors influencing nucleosome positioning, it is still unclear whether nucleosome positions are sequence dictated or determined by a random process. It has been postulated for a long time that, in the proximity of TSS, a barrier dete…

    Submitted 27 May, 2012; originally announced May 2012.

    Comments: Work presented at the 8th SIBBM Seminar (Annual Conference Meeting of the Italian Biophysics and Molecular Biology Society), May 24-26, 2012, Palermo, Italy

  19. arXiv:cs/0203018  [pdf, ps, other]

    cs.DS

    Improving Table Compression with Combinatorial Optimization

    Authors: Adam L. Buchsbaum, Glenn S. Fowler, Raffaele Giancarlo

    Abstract: We study the problem of compressing massive tables within the partition-training paradigm introduced by Buchsbaum et al. [SODA'00], in which a table is partitioned by an off-line training procedure into disjoint intervals of columns, each of which is compressed separately by a standard, on-line compressor like gzip. We provide a new theory that unifies previous experimental observations on parti…

    Submitted 13 March, 2002; originally announced March 2002.

    Comments: 22 pages, 2 figures, 5 tables, 23 references. Extended abstract appears in Proc. 13th ACM-SIAM SODA, pp. 213-222, 2002

    ACM Class: E.4; F.1.3; F.2.2; G.2.1; H.1.1; H.1.8; H.2.7

    Journal ref: JACM 50(6):825-851, 2003