-
When Edge Computing Meets Compact Data Structures
Abstract: Edge computing enables data processing and storage closer to where the data are created. Given the largely distributed compute environment and the significantly dispersed data distribution, there are increasing demands of data sharing and collaborative processing on the edge. Since data shuffling can dominate the overall execution time of collaborative processing jobs, considering the limited powe…
Submitted 1 June, 2023; originally announced June 2023.
-
Cell cycle and protein complex dynamics in discovering signaling pathways
Abstract: Signaling pathways are responsible for the regulation of cell processes, such as monitoring the external environment, transmitting information across membranes, and making cell fate decisions. Given the increasing amount of biological data available and the recent discoveries showing that many diseases are related to the disruption of cellular signal transduction cascades, in silico discovery of s…
Submitted 6 April, 2020; v1 submitted 26 February, 2020; originally announced February 2020.
Journal ref: Journal of Bioinformatics and Computational Biology 2019
-
arXiv:1912.11944 [pdf, ps, other]
On the Reproducibility of Experiments of Indexing Repetitive Document Collections
Abstract: This work introduces a companion reproducible paper with the aim of allowing the exact replication of the methods, experiments, and results discussed in a previous work [5]. In that parent paper, we proposed many and varied techniques for compressing indexes which exploit that highly repetitive collections are formed mostly of documents that are near-copies of others. More concretely, we describe…
Submitted 26 December, 2019; originally announced December 2019.
Comments: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941. Replication framework available at: https://github.com/migumar2/uiHRDC/
Journal ref: Information Systems; Volume 83, July 2019; pages 181-194
-
arXiv:1912.02217 [pdf, ps, other]
Assessing the best edit in perturbation-based iterative refinement algorithms to compute the median string
Abstract: Strings are a natural representation of biological data such as DNA, RNA and protein sequences. The problem of finding a string that summarizes a set of sequences has direct application in relative compression algorithms for genome and proteome analysis, where reference sequences need to be chosen. Median strings have been used as representatives of a set of strings in different domains. However,…
Submitted 4 December, 2019; originally announced December 2019.
Comments: 14 pages, 4 figures
Journal ref: Pattern Recognition Letters, Volume 120, 1 April 2019, Pages 104-111
-
arXiv:1911.09498 [pdf, ps, other]
Navigating Planar Topologies in Near-Optimal Space and Time
Abstract: We show that any embedding of a planar graph can be encoded succinctly while efficiently answering a number of topological queries near-optimally. More precisely, we build on a succinct representation that encodes an embedding of $m$ edges within $4m$ bits, which is close to the information-theoretic lower bound of about $3.58m$. With $4m+o(m)$ bits of space, we show how to answer a number of topo…
Submitted 10 December, 2021; v1 submitted 21 November, 2019; originally announced November 2019.
Comments: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941. Conference version presented at SPIRE 2019
-
IDEAIS: Smart Voice Assistants to Improve Interaction with SDIs
Abstract: A critical goal is that organizations and citizens can easily access the geographic information required for good governance. However, despite the costly efforts of governments to create and implement Spatial Data Infrastructures (SDIs), this goal is far from being achieved. This is partly due to the lack of usability of the geoportals through which the geographic information is accessed. In this…
Submitted 1 October, 2019; originally announced October 2019.
Comments: This research has received funding from CYTED, Ibero-American Program of Science and Technology for Development, GA No. 519RT0579
-
A Compact Representation of Raster Time Series
Abstract: The raster model is widely used in Geographic Information Systems to represent data that vary continuously in space, such as temperatures, precipitations, elevation, among other spatial attributes. In applications like weather forecast systems, not just a single raster, but a sequence of rasters covering the same region at different timestamps, known as a raster time series, needs to be stored and…
Submitted 7 January, 2019; originally announced January 2019.
Comments: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941
Journal ref: Proceedings of the Data Compression Conference (DCC 2019)
-
Faster and Smaller Two-Level Index for Network-based Trajectories
Abstract: Two-level indexes have been widely used to handle trajectories of moving objects that are constrained to a network. The top-level of these indexes handles the spatial dimension, whereas the bottom level handles the temporal dimension. The latter turns out to be an instance of the interval-intersection problem, but it has been tackled by non-specialized spatial indexes. In this work, we propose the…
Submitted 4 January, 2019; originally announced January 2019.
Comments: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941
Journal ref: Proceedings of the 25th International Symposium on String Processing and Information Retrieval (SPIRE 2018)
-
arXiv:1803.02576 [pdf, ps, other]
Compact Representations of Event Sequences
Abstract: We introduce a new technique for the efficient management of large sequences of multidimensional data, which takes advantage of regularities that arise in real-world datasets and supports different types of aggregation queries. More importantly, our representation is flexible in the sense that the relevant dimensions and queries may be used to guide the construction process, easily providing a spa…
Submitted 7 March, 2018; originally announced March 2018.
Comments: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941
-
arXiv:1610.05994 [pdf, ps, other]
Parallel Construction of Wavelet Trees on Multicore Architectures
Abstract: The wavelet tree has become a very useful data structure to efficiently represent and query large volumes of data in many different domains, from bioinformatics to geographic information systems. One problem with wavelet trees is their construction time. In this paper, we introduce two algorithms that reduce the time complexity of a wavelet tree's construction by taking advantage of nowadays ubiqu…
Submitted 19 October, 2016; originally announced October 2016.
Comments: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941
ACM Class: D.1.3; E.1
Journal ref: Knowl Inf Syst (2016)
-
arXiv:1603.02063 [pdf, ps, other]
Aggregated 2D Range Queries on Clustered Points
Abstract: Efficient processing of aggregated range queries on two-dimensional grids is a common requirement in information retrieval and data mining systems, for example in Geographic Information Systems and OLAP cubes. We introduce a technique to represent grids supporting aggregated range queries that requires little space when the data points in the grid are clustered, which is common in practice. We sho…
Submitted 30 March, 2016; v1 submitted 7 March, 2016; originally announced March 2016.
Comments: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941
Journal ref: Information Systems, Volume 60, Pages 34-49, 2016
-
Faster Compressed Quadtrees
Abstract: Real-world point sets tend to be clustered, so using a machine word for each point is wasteful. In this paper we first show how a compact representation of quadtrees using $O(1)$ bits per node can break this bound on clustered point sets, while offering efficient range searches. We then describe a new compact quadtree representation based on heavy path decompositions, which supports queries fast…
Submitted 8 December, 2021; v1 submitted 11 November, 2014; originally announced November 2014.
Comments: Journal version of DCC '15 paper