Showing 1–27 of 27 results for author: Baquero, C

  1. arXiv:2306.06742  [pdf, other]

    cs.DS cs.DC

    Time-limited Bloom Filter

    Authors: Ana Rodrigues, Ariel Shtul, Carlos Baquero, Paulo Sérgio Almeida

    Abstract: A Bloom Filter is a probabilistic data structure designed to check, rapidly and memory-efficiently, whether an element is present in a set. It has been vastly used in various computing areas and several variants, allowing deletions, dynamic sets and working with sliding windows, have surfaced over the years. When summarizing data streams, it becomes relevant to identify the more recent elements in…

    Submitted 11 June, 2023; originally announced June 2023.

    Comments: This version extends the 4-page version published in ACM SAC 2023 and adds a section on Experimental Evaluation

  2. arXiv:2108.03284  [pdf, other]

    physics.soc-ph cs.DC stat.CO

    Estimating Active Cases of COVID-19

    Authors: Javier Álvarez, Carlos Baquero, Elisa Cabana, Jaya Prakash Champati, Antonio Fernández Anta, Davide Frey, Augusto García-Agúndez, Chryssis Georgiou, Mathieu Goessens, Harold Hernández, Rosa Lillo, Raquel Menezes, Raúl Moreno, Nicolas Nicolaou, Oluwasegun Ojo, Antonio Ortega, Jesús Rufino, Efstathios Stavrakis, Govind Jeevan, Christin Glorioso

    Abstract: Having accurate and timely data on confirmed active COVID-19 cases is challenging, since it depends on testing capacity and the availability of an appropriate infrastructure to perform tests and aggregate their results. In this paper, we propose methods to estimate the number of active cases of COVID-19 from the official data (of confirmed cases and fatalities) and from survey data. We show that t…

    Submitted 6 August, 2021; originally announced August 2021.

    Comments: Presented at the 2nd KDD Workshop on Data-driven Humanitarian Mapping: Harnessing Human-Machine Intelligence for High-Stake Public Policy and Resiliency Planning, August 15, 2021

  3. arXiv:2104.01142  [pdf, other]

    cs.DC

    Efficient Replication via Timestamp Stability (Extended Version)

    Authors: Vitor Enes, Carlos Baquero, Alexey Gotsman, Pierre Sutra

    Abstract: Modern web applications replicate their data across the globe and require strong consistency guarantees for their most critical data. These guarantees are usually provided via state-machine replication (SMR). Recent advances in SMR have focused on leaderless protocols, which improve the availability and performance of traditional Paxos-based solutions. We propose Tempo - a leaderless SMR protocol…

    Submitted 25 April, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

    Comments: Extended version of a EuroSys'21 paper

  4. arXiv:2012.09086  [pdf, other]

    cs.DC

    Causality is Graphically Simple

    Authors: Carlos Baquero

    Abstract: Events in distributed systems include sending or receiving messages, or changing some state in a node. Not all events are related, but some events can cause and influence how other, later events, occur. For instance, a reply to a received mail message is influenced by that message, and maybe by other prior messages also received. This article brings an introduction to classic causality tracking me…

    Submitted 16 December, 2020; originally announced December 2020.

    Comments: 19 pages

    ACM Class: A.1

  5. arXiv:2005.12783  [pdf, other]

    cs.DC cs.CY stat.AP

    CoronaSurveys: Using Surveys with Indirect Reporting to Estimate the Incidence and Evolution of Epidemics

    Authors: Oluwasegun Ojo, Augusto García-Agundez, Benjamin Girault, Harold Hernández, Elisa Cabana, Amanda García-García, Payman Arabshahi, Carlos Baquero, Paolo Casari, Ednaldo José Ferreira, Davide Frey, Chryssis Georgiou, Mathieu Goessens, Anna Ishchenko, Ernesto Jiménez, Oleksiy Kebkal, Rosa Lillo, Raquel Menezes, Nicolas Nicolaou, Antonio Ortega, Paul Patras, Julian C Roberts, Efstathios Stavrakis, Yuichi Tanaka, Antonio Fernández Anta

    Abstract: The world is suffering from a pandemic called COVID-19, caused by the SARS-CoV-2 virus. National governments have problems evaluating the reach of the epidemic, due to having limited resources and tests at their disposal. This problem is especially acute in low and middle-income countries (LMICs). Hence, any simple, cheap and flexible means of evaluating the incidence and evolution of the epidemic…

    Submitted 26 June, 2020; v1 submitted 24 May, 2020; originally announced May 2020.

    Comments: Presented at The KDD Workshop on Humanitarian Mapping, San Diego, California USA, August 24, 2020

  6. arXiv:2003.11789  [pdf, other]

    cs.DC

    State-Machine Replication for Planet-Scale Systems (Extended Version)

    Authors: Vitor Enes, Carlos Baquero, Tuanir França Rezende, Alexey Gotsman, Matthieu Perrin, Pierre Sutra

    Abstract: Online applications now routinely replicate their data at multiple sites around the world. In this paper we present Atlas, the first state-machine replication protocol tailored for such planet-scale systems. Atlas does not rely on a distinguished leader, so clients enjoy the same quality of service independently of their geographical locations. Furthermore, client-perceived latency improves as we…

    Submitted 18 May, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

    Comments: Extended version of a EuroSys'20 paper

  7. arXiv:2001.03147  [pdf, other]

    cs.DS cs.DB cs.DC

    Age-Partitioned Bloom Filters

    Authors: Ariel Shtul, Carlos Baquero, Paulo Sérgio Almeida

    Abstract: Bloom filters (BF) are widely used for approximate membership queries over a set of elements. BF variants allow removals, sets of unbounded size or querying a sliding window over an unbounded stream. However, for this last case the best current approaches are dictionary based (e.g., based on Cuckoo Filters or TinyTable), and it may seem that BF-based approaches will never be competitive to diction…

    Submitted 9 January, 2020; originally announced January 2020.

    Comments: 25 pages

    ACM Class: E.1; H.3

  8. Conflict-free Replicated Data Types (CRDTs)

    Authors: Nuno Preguiça, Carlos Baquero, Marc Shapiro

    Abstract: A conflict-free replicated data type (CRDT) is an abstract data type, with a well defined interface, designed to be replicated at multiple processes and exhibiting the following properties: (1) any replica can be modified without coordinating with other replicas; (2) when any two replicas have received the same set of updates, they reach the same state, deterministically, by adopting mathematica…

    Submitted 16 May, 2018; originally announced May 2018.

    Journal ref: Sakr, Sherif and Zomaya, Albert. Encyclopedia of Big Data Technologies, Springer International Publishing, 2018, Encyclopedia of Big Data Technologies, 978-3-319-63962-8

  9. arXiv:1803.02750  [pdf, other]

    cs.DC cs.DS

    Efficient Synchronization of State-based CRDTs

    Authors: Vitor Enes, Paulo Sérgio Almeida, Carlos Baquero, João Leitão

    Abstract: To ensure high availability in large scale distributed systems, Conflict-free Replicated Data Types (CRDTs) relax consistency by allowing immediate query and update operations at the local replica, with no need for remote synchronization. State-based CRDTs synchronize replicas by periodically sending their full state to other replicas, which can become extremely costly as the CRDT state grows. Del…

    Submitted 11 March, 2019; v1 submitted 7 March, 2018; originally announced March 2018.

    Comments: To be published at the 35th IEEE International Conference on Data Engineering

  10. arXiv:1710.04469  [pdf, ps, other]

    cs.DC cs.DB cs.DS

    Pure Operation-Based Replicated Data Types

    Authors: Carlos Baquero, Paulo Sergio Almeida, Ali Shoker

    Abstract: Distributed systems designed to serve clients across the world often make use of geo-replication to attain low latency and high availability. Conflict-free Replicated Data Types (CRDTs) allow the design of predictable multi-master replication and support eventual consistency of replicas that are allowed to transiently diverge. CRDTs come in two flavors: state-based, where a state is changed locall…

    Submitted 12 October, 2017; originally announced October 2017.

    Comments: 30 pages

  11. Practical Evaluation of the Lasp Programming Model at Large Scale - An Experience Report

    Authors: Christopher S. Meiklejohn, Vitor Enes, Junghun Yoo, Carlos Baquero, Peter Van Roy, Annette Bieniusa

    Abstract: Programming models for building large-scale distributed applications assist the developer in reasoning about consistency and distribution. However, many of the programming models for weak consistency, which promise the largest scalability gains, have little in the way of evaluation to demonstrate the promised scalability. We present an experience report on the implementation and large-scale evalua…

    Submitted 21 August, 2017; originally announced August 2017.

  12. arXiv:1705.03704  [pdf, other]

    cs.DC

    Global-Local View: Scalable Consistency for Concurrent Data Types

    Authors: Deepthi Devaki Akkoorath, José Brandão, Annette Bieniusa, Carlos Baquero

    Abstract: Concurrent linearizable access to shared objects can be prohibitively expensive in a high contention workload. Many applications apply ad-hoc techniques to eliminate the need of synchronous atomic updates, which may result in non-linearizable implementations. We propose a new programming model which leverages such patterns for concurrent access to objects in a shared memory system. In this model,…

    Submitted 10 May, 2017; originally announced May 2017.

    Comments: 16 pages

  13. Worlds of Events: Deduction with Partial Knowledge about Causality

    Authors: Seyed Hossein Haeri, Peter Van Roy, Carlos Baquero, Christopher Meiklejohn

    Abstract: Interactions between internet users are mediated by their devices and the common support infrastructure in data centres. Keeping track of causality amongst actions that take place in this distributed system is key to provide a seamless interaction where effects follow causes. Tracking causality in large scale interactions is difficult due to the cost of keeping large quantities of metadata; even m…

    Submitted 10 August, 2016; originally announced August 2016.

    Comments: In Proceedings ICE 2016, arXiv:1608.03131

    ACM Class: C.2.4; F.4.1

    Journal ref: EPTCS 223, 2016, pp. 113-127

  14. Delta State Replicated Data Types

    Authors: Paulo Sérgio Almeida, Ali Shoker, Carlos Baquero

    Abstract: CRDTs are distributed data types that make eventual consistency of a distributed object possible and non ad-hoc. Specifically, state-based CRDTs ensure convergence through disseminating the entire state, that may be large, and merging it to other replicas; whereas operation-based CRDTs disseminate operations (i.e., small states) assuming an exactly-once reliable dissemination layer. We introduce D…

    Submitted 4 March, 2016; originally announced March 2016.

    Comments: arXiv admin note: substantial text overlap with arXiv:1410.2803

    Journal ref: Journal of Parallel and Distributed Computing, Volume 111, January 2018, Pages 162-173

  15. arXiv:1511.05010  [pdf, other]

    cs.DC cs.DB cs.DS

    Eventually Consistent Register Revisited

    Authors: Marek Zawirski, Carlos Baquero, Annette Bieniusa, Nuno Preguiça, Marc Shapiro

    Abstract: In order to converge in the presence of concurrent updates, modern eventually consistent replication systems rely on causality information and operation semantics. It is relatively easy to use semantics of high-level operations on replicated data structures, such as sets, lists, etc. However, it is difficult to exploit semantics of operations on registers, which store opaque data. In existing regi…

    Submitted 16 November, 2015; originally announced November 2015.

    Comments: 8 pages

  16. arXiv:1410.2803  [pdf, ps, other]

    cs.DC cs.DB cs.DS cs.PF

    Efficient State-based CRDTs by Delta-Mutation

    Authors: Paulo Sérgio Almeida, Ali Shoker, Carlos Baquero

    Abstract: CRDTs are distributed data types that make eventual consistency of a distributed object possible and non ad-hoc. Specifically, state-based CRDTs ensure convergence through disseminating the entire state, that may be large, and merging it to other replicas; whereas operation-based CRDTs disseminate operations (i.e., small states) assuming an exactly-once reliable dissemination layer. We introduce…

    Submitted 3 March, 2015; v1 submitted 10 October, 2014; originally announced October 2014.

    Comments: 19 pages

  17. arXiv:1310.3107  [pdf, ps, other]

    cs.DC cs.DB

    SwiftCloud: Fault-Tolerant Geo-Replication Integrated all the Way to the Client Machine

    Authors: Marek Zawirski, Annette Bieniusa, Valter Balegas, Sérgio Duarte, Carlos Baquero, Marc Shapiro, Nuno Preguiça

    Abstract: Client-side logic and storage are increasingly used in web and mobile applications to improve response time and availability. Current approaches tend to be ad-hoc and poorly integrated with the server-side logic. We present a principled approach to integrate client- and server-side storage. We support mergeable and strongly consistent transactions that target either client or server replicas and p…

    Submitted 11 October, 2013; originally announced October 2013.

    Report number: RR-8347

    Journal ref: N° RR-8347 (2013)

  18. arXiv:1307.3207  [pdf, other]

    cs.DC

    Scalable Eventually Consistent Counters over Unreliable Networks

    Authors: Paulo Sérgio Almeida, Carlos Baquero

    Abstract: Counters are an important abstraction in distributed computing, and play a central role in large scale geo-replicated systems, counting events such as web page impressions or social network "likes". Classic distributed counters, strongly consistent, cannot be made both available and partition-tolerant, due to the CAP Theorem, being unsuitable to large scale scenarios. This paper defines Eventually…

    Submitted 11 July, 2013; originally announced July 2013.

    ACM Class: C.2.4; E.1; H.2.4

  19. arXiv:1303.5909  [pdf]

    cs.SI physics.soc-ph

    Genetic Algorithm with a Local Search Strategy for Discovering Communities in Complex Networks

    Authors: Dayou Liu, Di Jin, Carlos Baquero, Dongxiao He, Bo Yang, Qiangyuan Yu

    Abstract: In order to further improve the performance of current genetic algorithms aiming at discovering communities, a local search based genetic algorithm GALS is here proposed. The core of GALS is a local search based mutation technique. In order to overcome the drawbacks of traditional mutation methods, the paper develops the concept of marginal gene and then the local monotonicity of modularity functi…

    Submitted 23 March, 2013; originally announced March 2013.

    Comments: 17 pages, 8 figures. arXiv admin note: text overlap with arXiv:1303.4711

    Journal ref: International Journal of Computational Intelligence Systems, Vol. 6, No. 2 (March, 2013), 354-369

  20. arXiv:1303.5675  [pdf]

    cs.SI cond-mat.stat-mech physics.soc-ph

    Markov random walk under constraint for discovering overlapping communities in complex networks

    Authors: Di Jin, Bo Yang, Carlos Baquero, Dayou Liu, Dongxiao He, Jie Liu

    Abstract: Detection of overlapping communities in complex networks has motivated recent research in the relevant fields. Aiming at this problem, we propose a Markov dynamics based algorithm, called UEOC, which means, 'unfold and extract overlapping communities'. In UEOC, when identifying each natural community that overlaps, a Markov random walk method combined with a constraint strategy, which is based on the…

    Submitted 22 March, 2013; originally announced March 2013.

    Comments: 21 pages, 8 pages, 2 tables

    Journal ref: Journal of Statistical Mechanics: Theory and Experiment, P05031, 2011

  21. arXiv:1210.3368  [pdf, other]

    cs.DC cs.DS

    An optimized conflict-free replicated set

    Authors: Annette Bieniusa, Marek Zawirski, Nuno Preguiça, Marc Shapiro, Carlos Baquero, Valter Balegas, Sérgio Duarte

    Abstract: Eventual consistency of replicated data supports concurrent updates, reduces latency and improves fault tolerance, but forgoes strong consistency. Accordingly, several cloud computing platforms implement eventually-consistent data types. The set is a widespread and useful abstraction, and many replicated set designs have been proposed. We present a reasoning abstraction, permutation equivalence, t…

    Submitted 11 October, 2012; originally announced October 2012.

    Comments: No. RR-8083 (2012)

  22. arXiv:1204.1373  [pdf, other]

    cs.DC cs.DS

    Spectra: Robust Estimation of Distribution Functions in Networks

    Authors: Miguel Borges, Paulo Jesus, Carlos Baquero, Paulo Sérgio Almeida

    Abstract: Distributed aggregation allows the derivation of a given global aggregate property from many individual local values in nodes of an interconnected network system. Simple aggregates such as minima/maxima, counts, sums and averages have been thoroughly studied in the past and are important tools for distributed algorithms and network coordination. Nonetheless, this kind of aggregates may not be comp…

    Submitted 5 April, 2012; originally announced April 2012.

    Comments: Full version of the paper published at 12th IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS), Stockholm (Sweden), June 2012

    ACM Class: C.2.4; G.3

  23. arXiv:1111.6087  [pdf, other]

    cs.DC cs.NI cs.SI

    Fast Distributed Computation of Distances in Networks

    Authors: Paulo Sérgio Almeida, Carlos Baquero, Alcino Cunha

    Abstract: This paper presents a distributed algorithm to simultaneously compute the diameter, radius and node eccentricity in all nodes of a synchronous network. Such topological information may be useful as input to configure other algorithms. Previous approaches have been modular, progressing in sequential phases using building blocks such as BFS tree construction, thus incurring longer executions than st…

    Submitted 25 November, 2011; originally announced November 2011.

    Comments: 12 pages

    Journal ref: IEEE 51st Annual Conference on Decision and Control (2012), 5215-5220

  24. arXiv:1110.0725  [pdf, other]

    cs.DC cs.DS cs.IR cs.NI

    A Survey of Distributed Data Aggregation Algorithms

    Authors: Paulo Jesus, Carlos Baquero, Paulo Sérgio Almeida

    Abstract: Distributed data aggregation is an important task, allowing the decentralized determination of meaningful global properties, that can then be used to direct the execution of other applications. The resulting values result from the distributed computation of functions like COUNT, SUM and AVERAGE. Some application examples can be found to determine the network size, total storage capacity, average load…

    Submitted 4 October, 2011; originally announced October 2011.

    Comments: 45 pages, Technical Report

    ACM Class: C.2.4; A.1

  25. Fault-Tolerant Aggregation: Flow-Updating Meets Mass-Distribution

    Authors: Paulo S. Almeida, Carlos Baquero, Martin Farach-Colton, Paulo Jesus, Miguel A. Mosteiro

    Abstract: Flow-Updating (FU) is a fault-tolerant technique that has proved to be efficient in practice for the distributed computation of aggregate functions in communication networks where individual processors do not have access to global information. Previous distributed aggregation protocols, based on repeated sharing of input values (or mass) among processors, sometimes called Mass-Distribution (MD) pr…

    Submitted 20 September, 2011; originally announced September 2011.

    Comments: 18 pages, 5 figures, To appear in OPODIS 2011

  26. arXiv:1011.6596  [pdf, ps, other]

    cs.DC

    Dependability in Aggregation by Averaging

    Authors: Paulo Jesus, Carlos Baquero, Paulo Sérgio Almeida

    Abstract: Aggregation is an important building block of modern distributed applications, allowing the determination of meaningful properties (e.g. network size, total storage capacity, average load, majorities, etc.) that are used to direct the execution of the system. However, the majority of the existing aggregation algorithms exhibit relevant dependability issues, when prospecting their use in real appli…

    Submitted 30 November, 2010; originally announced November 2010.

    Comments: 14 pages. Presented in Inforum 2009

    ACM Class: C.2.4

  27. arXiv:1011.5808  [pdf, other]

    cs.DC

    Dotted Version Vectors: Logical Clocks for Optimistic Replication

    Authors: Nuno Preguiça, Carlos Baquero, Paulo Sérgio Almeida, Victor Fonte, Ricardo Gonçalves

    Abstract: In cloud computing environments, a large number of users access data stored in highly available storage systems. To provide good performance to geographically disperse users and allow operation even in the presence of failures or network partitions, these systems often rely on optimistic replication solutions that guarantee only eventual consistency. In this scenario, it is important to be able to…

    Submitted 26 November, 2010; originally announced November 2010.

    Comments: Preprint, submitted for publication. 12 pages

    ACM Class: C.2.4; E.1