Search | arXiv e-print repository

Time-limited Bloom Filter

Authors: Ana Rodrigues, Ariel Shtul, Carlos Baquero, Paulo Sérgio Almeida

Abstract: A Bloom Filter is a probabilistic data structure designed to check, rapidly and memory-efficiently, whether an element is present in a set. It has been vastly used in various computing areas and several variants, allowing deletions, dynamic sets and working with sliding windows, have surfaced over the years. When summarizing data streams, it becomes relevant to identify the more recent elements in… ▽ More A Bloom Filter is a probabilistic data structure designed to check, rapidly and memory-efficiently, whether an element is present in a set. It has been vastly used in various computing areas and several variants, allowing deletions, dynamic sets and working with sliding windows, have surfaced over the years. When summarizing data streams, it becomes relevant to identify the more recent elements in the stream. However, most of the sliding window schemes consider the most recent items of a data stream without considering time as a factor. While this allows, e.g., storing the most recent 10000 elements, it does not easily translate into storing elements received in the last 60 seconds, unless the insertion rate is stable and known in advance. In this paper, we present the Time-limited Bloom Filter, a new BF-based approach that can save information of a given time period and correctly identify it as present when queried, while also being able to retire data when it becomes stale. The approach supports variable insertion rates while striving to keep a target false positive rate. We also make available a reference implementation of the data structure as a Redis module. △ Less

Submitted 11 June, 2023; originally announced June 2023.

Comments: This version extends the 4-page version published in ACM SAC 2023 and adds a section on Experimental Evaluation

arXiv:2108.03284 [pdf, other]

Estimating Active Cases of COVID-19

Authors: Javier Álvarez, Carlos Baquero, Elisa Cabana, Jaya Prakash Champati, Antonio Fernández Anta, Davide Frey, Augusto García-Agúndez, Chryssis Georgiou, Mathieu Goessens, Harold Hernández, Rosa Lillo, Raquel Menezes, Raúl Moreno, Nicolas Nicolaou, Oluwasegun Ojo, Antonio Ortega, Jesús Rufino, Efstathios Stavrakis, Govind Jeevan, Christin Glorioso

Abstract: Having accurate and timely data on confirmed active COVID-19 cases is challenging, since it depends on testing capacity and the availability of an appropriate infrastructure to perform tests and aggregate their results. In this paper, we propose methods to estimate the number of active cases of COVID-19 from the official data (of confirmed cases and fatalities) and from survey data. We show that t… ▽ More Having accurate and timely data on confirmed active COVID-19 cases is challenging, since it depends on testing capacity and the availability of an appropriate infrastructure to perform tests and aggregate their results. In this paper, we propose methods to estimate the number of active cases of COVID-19 from the official data (of confirmed cases and fatalities) and from survey data. We show that the latter is a viable option in countries with reduced testing capacity or suboptimal infrastructures. △ Less

Submitted 6 August, 2021; originally announced August 2021.

Comments: Presented at the 2nd KDD Workshop on Data-driven Humanitarian Map**: Harnessing Human-Machine Intelligence for High-Stake Public Policy and Resiliency Planning, August 15, 2021

arXiv:2104.01142 [pdf, other]

Efficient Replication via Timestamp Stability (Extended Version)

Authors: Vitor Enes, Carlos Baquero, Alexey Gotsman, Pierre Sutra

Abstract: Modern web applications replicate their data across the globe and require strong consistency guarantees for their most critical data. These guarantees are usually provided via state-machine replication (SMR). Recent advances in SMR have focused on leaderless protocols, which improve the availability and performance of traditional Paxos-based solutions. We propose Tempo - a leaderless SMR protocol… ▽ More Modern web applications replicate their data across the globe and require strong consistency guarantees for their most critical data. These guarantees are usually provided via state-machine replication (SMR). Recent advances in SMR have focused on leaderless protocols, which improve the availability and performance of traditional Paxos-based solutions. We propose Tempo - a leaderless SMR protocol that, in comparison to prior solutions, achieves superior throughput and offers predictable performance even in contended workloads. To achieve these benefits, Tempo timestamps each application command and executes it only after the timestamp becomes stable, i.e., all commands with a lower timestamp are known. Both the timestam** and stability detection mechanisms are fully decentralized, thus obviating the need for a leader replica. Our protocol furthermore generalizes to partial replication settings, enabling scalability in highly parallel workloads. We evaluate the protocol in both real and simulated geo-distributed environments and demonstrate that it outperforms state-of-the-art alternatives. △ Less

Submitted 25 April, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

Comments: Extended version of a EuroSys'21 paper

arXiv:2012.09086 [pdf, other]

Causality is Graphically Simple

Authors: Carlos Baquero

Abstract: Events in distributed systems include sending or receiving messages, or changing some state in a node. Not all events are related, but some events can cause and influence how other, later events, occur. For instance, a reply to a received mail message is influenced by that message, and maybe by other prior messages also received. This article brings an introduction to classic causality tracking me… ▽ More Events in distributed systems include sending or receiving messages, or changing some state in a node. Not all events are related, but some events can cause and influence how other, later events, occur. For instance, a reply to a received mail message is influenced by that message, and maybe by other prior messages also received. This article brings an introduction to classic causality tracking mechanisms and covers some more recent developments. The presentation is supported by a new graphical notation that allows an intuitive interpretation of the causality relations described. △ Less

Submitted 16 December, 2020; originally announced December 2020.

Comments: 19 pages

ACM Class: A.1

arXiv:2005.12783 [pdf, other]

CoronaSurveys: Using Surveys with Indirect Reporting to Estimate the Incidence and Evolution of Epidemics

Authors: Oluwasegun Ojo, Augusto García-Agundez, Benjamin Girault, Harold Hernández, Elisa Cabana, Amanda García-García, Payman Arabshahi, Carlos Baquero, Paolo Casari, Ednaldo José Ferreira, Davide Frey, Chryssis Georgiou, Mathieu Goessens, Anna Ishchenko, Ernesto Jiménez, Oleksiy Kebkal, Rosa Lillo, Raquel Menezes, Nicolas Nicolaou, Antonio Ortega, Paul Patras, Julian C Roberts, Efstathios Stavrakis, Yuichi Tanaka, Antonio Fernández Anta

Abstract: The world is suffering from a pandemic called COVID-19, caused by the SARS-CoV-2 virus. National governments have problems evaluating the reach of the epidemic, due to having limited resources and tests at their disposal. This problem is especially acute in low and middle-income countries (LMICs). Hence, any simple, cheap and flexible means of evaluating the incidence and evolution of the epidemic… ▽ More The world is suffering from a pandemic called COVID-19, caused by the SARS-CoV-2 virus. National governments have problems evaluating the reach of the epidemic, due to having limited resources and tests at their disposal. This problem is especially acute in low and middle-income countries (LMICs). Hence, any simple, cheap and flexible means of evaluating the incidence and evolution of the epidemic in a given country with a reasonable level of accuracy is useful. In this paper, we propose a technique based on (anonymous) surveys in which participants report on the health status of their contacts. This indirect reporting technique, known in the literature as network scale-up method, preserves the privacy of the participants and their contacts, and collects information from a larger fraction of the population (as compared to individual surveys). This technique has been deployed in the CoronaSurveys project, which has been collecting reports for the COVID-19 pandemic for more than two months. Results obtained by CoronaSurveys show the power and flexibility of the approach, suggesting that it could be an inexpensive and powerful tool for LMICs. △ Less

Submitted 26 June, 2020; v1 submitted 24 May, 2020; originally announced May 2020.

Comments: Presented at The KDD Workshop on Humanitarian Map**, San Diego, California USA, August 24, 2020

arXiv:2003.11789 [pdf, other]

State-Machine Replication for Planet-Scale Systems (Extended Version)

Authors: Vitor Enes, Carlos Baquero, Tuanir França Rezende, Alexey Gotsman, Matthieu Perrin, Pierre Sutra

Abstract: Online applications now routinely replicate their data at multiple sites around the world. In this paper we present Atlas, the first state-machine replication protocol tailored for such planet-scale systems. Atlas does not rely on a distinguished leader, so clients enjoy the same quality of service independently of their geographical locations. Furthermore, client-perceived latency improves as we… ▽ More Online applications now routinely replicate their data at multiple sites around the world. In this paper we present Atlas, the first state-machine replication protocol tailored for such planet-scale systems. Atlas does not rely on a distinguished leader, so clients enjoy the same quality of service independently of their geographical locations. Furthermore, client-perceived latency improves as we add sites closer to clients. To achieve this, Atlas minimizes the size of its quorums using an observation that concurrent data center failures are rare. It also processes a high percentage of accesses in a single round trip, even when these conflict. We experimentally demonstrate that Atlas consistently outperforms state-of-the-art protocols in planet-scale scenarios. In particular, Atlas is up to two times faster than Flexible Paxos with identical failure assumptions, and more than doubles the performance of Egalitarian Paxos in the YCSB benchmark. △ Less

Submitted 18 May, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

Comments: Extended version of a EuroSys'20 paper

arXiv:2001.03147 [pdf, other]

Age-Partitioned Bloom Filters

Authors: Ariel Shtul, Carlos Baquero, Paulo Sérgio Almeida

Abstract: Bloom filters (BF) are widely used for approximate membership queries over a set of elements. BF variants allow removals, sets of unbounded size or querying a sliding window over an unbounded stream. However, for this last case the best current approaches are dictionary based (e.g., based on Cuckoo Filters or TinyTable), and it may seem that BF-based approaches will never be competitive to diction… ▽ More Bloom filters (BF) are widely used for approximate membership queries over a set of elements. BF variants allow removals, sets of unbounded size or querying a sliding window over an unbounded stream. However, for this last case the best current approaches are dictionary based (e.g., based on Cuckoo Filters or TinyTable), and it may seem that BF-based approaches will never be competitive to dictionary-based ones. In this paper we present Age-Partitioned Bloom Filters, a BF-based approach for duplicate detection in sliding windows that not only is competitive in time-complexity, but has better space usage than current dictionary-based approaches (e.g., SWAMP), at the cost of some moderate slack. APBFs retain the BF simplicity, unlike dictionary-based approaches, important for hardware-based implementations, and can integrate known improvements such as double hashing or blocking. We present an Age-Partitioned Blocked Bloom Filter variant which can operate with 2-3 cache-line accesses per insertion and around 2-4 per query, even for high accuracy filters. △ Less

Submitted 9 January, 2020; originally announced January 2020.

Comments: 25 pages

ACM Class: E.1; H.3

arXiv:1805.06358 [pdf, other]

doi 10.1007/978-3-319-63962-8\_185-1

Conflict-free Replicated Data Types (CRDTs)

Authors: Nuno Preguiça, Carlos Baquero, Marc Shapiro

Abstract: A conflict-free replicated data type (CRDT) is an abstract data type, with a well defined interface, designed to be replicated at multiple processes and exhibiting the following properties: (1) any replica can be modified without coordinating with another replicas; (2) when any two replicas have received the same set of updates, they reach the same state, deterministically, by adopting mathematica… ▽ More A conflict-free replicated data type (CRDT) is an abstract data type, with a well defined interface, designed to be replicated at multiple processes and exhibiting the following properties: (1) any replica can be modified without coordinating with another replicas; (2) when any two replicas have received the same set of updates, they reach the same state, deterministically, by adopting mathematically sound rules to guarantee state convergence. △ Less

Submitted 16 May, 2018; originally announced May 2018.

Journal ref: Sakr, Sherif and Zomaya, Albert. Encyclopedia of Big Data Technologies, Springer International Publishing, 2018, Encyclopedia of Big Data Technologies, 978-3-319-63962-8

arXiv:1803.02750 [pdf, other]

Efficient Synchronization of State-based CRDTs

Authors: Vitor Enes, Paulo Sérgio Almeida, Carlos Baquero, João Leitão

Abstract: To ensure high availability in large scale distributed systems, Conflict-free Replicated Data Types (CRDTs) relax consistency by allowing immediate query and update operations at the local replica, with no need for remote synchronization. State-based CRDTs synchronize replicas by periodically sending their full state to other replicas, which can become extremely costly as the CRDT state grows. Del… ▽ More To ensure high availability in large scale distributed systems, Conflict-free Replicated Data Types (CRDTs) relax consistency by allowing immediate query and update operations at the local replica, with no need for remote synchronization. State-based CRDTs synchronize replicas by periodically sending their full state to other replicas, which can become extremely costly as the CRDT state grows. Delta-based CRDTs address this problem by producing small incremental states (deltas) to be used in synchronization instead of the full state. However, current synchronisation algorithms for Delta-based CRDTs induce redundant wasteful delta propagation, performing worse than expected, and surprisingly, no better than State-based. In this paper we: 1) identify two sources of inefficiency in current synchronization algorithms for delta-based CRDTs; 2) bring the concept of join decomposition to state-based CRDTs; 3) exploit join decompositions to obtain optimal deltas and 4) improve the efficiency of synchronization algorithms; and finally, 5) evaluate the improved algorithms. △ Less

Submitted 11 March, 2019; v1 submitted 7 March, 2018; originally announced March 2018.

Comments: To be published at the 35th IEEE International Conference on Data Engineering

arXiv:1710.04469 [pdf, ps, other]

Pure Operation-Based Replicated Data Types

Authors: Carlos Baquero, Paulo Sergio Almeida, Ali Shoker

Abstract: Distributed systems designed to serve clients across the world often make use of geo-replication to attain low latency and high availability. Conflict-free Replicated Data Types (CRDTs) allow the design of predictable multi-master replication and support eventual consistency of replicas that are allowed to transiently diverge. CRDTs come in two flavors: state-based, where a state is changed locall… ▽ More Distributed systems designed to serve clients across the world often make use of geo-replication to attain low latency and high availability. Conflict-free Replicated Data Types (CRDTs) allow the design of predictable multi-master replication and support eventual consistency of replicas that are allowed to transiently diverge. CRDTs come in two flavors: state-based, where a state is changed locally and shipped and merged into other replicas; operation-based, where operations are issued locally and reliably causal broadcast to all other replicas. However, the standard definition of op-based CRDTs is very encompassing, allowing even sending the full-state, and thus imposing storage and dissemination overheads as well as blurring the distinction from state-based CRDTs. We introduce pure op-based CRDTs, that can only send operations to other replicas, drawing a clear distinction from state-based ones. Data types with commutative operations can be trivially implemented as pure op-based CRDTs using standard reliable causal delivery; whereas data types having non-commutative operations are implemented using a PO-Log, a partially ordered log of operations, and making use of an extended API, i.e., a Tagged Causal Stable Broadcast (TCSB), that provides extra causality information upon delivery and later informs when delivered messages become causally stable, allowing further PO-Log compaction. The framework is illustrated by a catalog of pure op-based specifications for classic CRDTs, including counters, multi-value registers, add-wins and remove-wins sets. △ Less

Submitted 12 October, 2017; originally announced October 2017.

Comments: 30 pages

arXiv:1708.06423 [pdf, other]

doi 10.1145/3131851.3131862

Practical Evaluation of the Lasp Programming Model at Large Scale - An Experience Report

Authors: Christopher S. Meiklejohn, Vitor Enes, Junghun Yoo, Carlos Baquero, Peter Van Roy, Annette Bieniusa

Abstract: Programming models for building large-scale distributed applications assist the developer in reasoning about consistency and distribution. However, many of the programming models for weak consistency, which promise the largest scalability gains, have little in the way of evaluation to demonstrate the promised scalability. We present an experience report on the implementation and large-scale evalua… ▽ More Programming models for building large-scale distributed applications assist the developer in reasoning about consistency and distribution. However, many of the programming models for weak consistency, which promise the largest scalability gains, have little in the way of evaluation to demonstrate the promised scalability. We present an experience report on the implementation and large-scale evaluation of one of these models, Lasp, originally presented at PPDP `15, which provides a declarative, functional programming style for distributed applications. We demonstrate the scalability of Lasp's prototype runtime implementation up to 1024 nodes in the Amazon cloud computing environment. It achieves high scalability by uniquely combining hybrid gossip with a programming model based on convergent computation. We report on the engineering challenges of this implementation and its evaluation, specifically related to operating research prototypes in a production cloud environment. △ Less

Submitted 21 August, 2017; originally announced August 2017.

arXiv:1705.03704 [pdf, other]

Global-Local View: Scalable Consistency for Concurrent Data Types

Authors: Deepthi Devaki Akkoorath, José Brandão, Annette Bieniusa, Carlos Baquero

Abstract: Concurrent linearizable access to shared objects can be prohibitively expensive in a high contention workload. Many applications apply ad-hoc techniques to eliminate the need of synchronous atomic updates, which may result in non-linearizable implementations. We propose a new programming model which leverages such patterns for concurrent access to objects in a shared memory system. In this model,… ▽ More Concurrent linearizable access to shared objects can be prohibitively expensive in a high contention workload. Many applications apply ad-hoc techniques to eliminate the need of synchronous atomic updates, which may result in non-linearizable implementations. We propose a new programming model which leverages such patterns for concurrent access to objects in a shared memory system. In this model, each thread maintains different views on the shared object - a thread-local view and a global view. As the thread-local view is not shared, it can be updated without incurring synchronization costs. These local updates become visible to other threads only after the thread-local view is merged with the global view. This enables better performance at the expense of linearizability. We show that it is possible to maintain thread-local views and to perform merge efficiently for several data types and evaluate their performance and scalability compared to linearizable implementations. Further, we discuss the consistency semantics of the data types and the associated programming model. △ Less

Submitted 10 May, 2017; originally announced May 2017.

Comments: 16 pages

arXiv:1608.03326 [pdf, other]

doi 10.4204/EPTCS.223.8

Worlds of Events: Deduction with Partial Knowledge about Causality

Authors: Seyed Hossein Haeri, Peter Van Roy, Carlos Baquero, Christopher Meiklejohn

Abstract: Interactions between internet users are mediated by their devices and the common support infrastructure in data centres. Kee** track of causality amongst actions that take place in this distributed system is key to provide a seamless interaction where effects follow causes. Tracking causality in large scale interactions is difficult due to the cost of kee** large quantities of metadata; even m… ▽ More Interactions between internet users are mediated by their devices and the common support infrastructure in data centres. Kee** track of causality amongst actions that take place in this distributed system is key to provide a seamless interaction where effects follow causes. Tracking causality in large scale interactions is difficult due to the cost of kee** large quantities of metadata; even more challenging when dealing with resource-limited devices. In this paper, we focus on kee** partial knowledge on causality and address deduction from that knowledge. We provide the first proof-theoretic causality modelling for distributed partial knowledge. We prove computability and consistency results. We also prove that the partial knowledge gives rise to a weaker model than classical causality. We provide rules for offline deduction about causality and refute some related folklore. We define two notions of forward and backward bisimilarity between devices, using which we prove two important results. Namely, no matter the order of addition/removal, two devices deduce similarly about causality so long as: (1) the same causal information is fed to both. (2) they start bisimilar and erase the same causal information. Thanks to our establishment of forward and backward bisimilarity, respectively, proofs of the latter two results work by simple induction on length. △ Less

Submitted 10 August, 2016; originally announced August 2016.

Comments: In Proceedings ICE 2016, arXiv:1608.03131

ACM Class: C.2.4; F.4.1

Journal ref: EPTCS 223, 2016, pp. 113-127

arXiv:1603.01529 [pdf, ps, other]

doi 10.1016/j.jpdc.2017.08.003

Delta State Replicated Data Types

Authors: Paulo Sérgio Almeida, Ali Shoker, Carlos Baquero

Abstract: CRDTs are distributed data types that make eventual consistency of a distributed object possible and non ad-hoc. Specifically, state-based CRDTs ensure convergence through disseminating the entire state, that may be large, and merging it to other replicas; whereas operation-based CRDTs disseminate operations (i.e., small states) assuming an exactly-once reliable dissemination layer. We introduce D… ▽ More CRDTs are distributed data types that make eventual consistency of a distributed object possible and non ad-hoc. Specifically, state-based CRDTs ensure convergence through disseminating the entire state, that may be large, and merging it to other replicas; whereas operation-based CRDTs disseminate operations (i.e., small states) assuming an exactly-once reliable dissemination layer. We introduce Delta State Conflict-Free Replicated Data Types ($δ$-CRDTs) that can achieve the best of both worlds: small messages with an incremental nature, as in operation-based CRDTs, disseminated over unreliable communication channels, as in traditional state-based CRDTs. This is achieved by defining delta mutators to return a delta-state, typically with a much smaller size than the full state, that to be joined with both local and remote states. We introduce the $δ$-CRDT framework, and we explain it through establishing a correspondence to current state-based CRDTs. In addition, we present an anti-entropy algorithm for eventual convergence, and another one that ensures causal consistency. Finally, we introduce several $δ$-CRDT specifications of both well-known replicated datatypes and novel datatypes, including a generic map composition. △ Less

Submitted 4 March, 2016; originally announced March 2016.

Comments: arXiv admin note: substantial text overlap with arXiv:1410.2803

Journal ref: Journal of Parallel and Distributed Computing, Volume 111, January 2018, Pages 162-173

arXiv:1511.05010 [pdf, other]

Eventually Consistent Register Revisited

Authors: Marek Zawirski, Carlos Baquero, Annette Bieniusa, Nuno Preguiça, Marc Shapiro

Abstract: In order to converge in the presence of concurrent updates, modern eventually consistent replication systems rely on causality information and operation semantics. It is relatively easy to use semantics of high-level operations on replicated data structures, such as sets, lists, etc. However, it is difficult to exploit semantics of operations on registers, which store opaque data. In existing regi… ▽ More In order to converge in the presence of concurrent updates, modern eventually consistent replication systems rely on causality information and operation semantics. It is relatively easy to use semantics of high-level operations on replicated data structures, such as sets, lists, etc. However, it is difficult to exploit semantics of operations on registers, which store opaque data. In existing register designs, concurrent writes are resolved either by the application, or by arbitrating them according to their timestamps. The former is complex and may require user intervention, whereas the latter causes arbitrary updates to be lost. In this work, we identify a register construction that generalizes existing ones by combining runtime causality ordering, to identify concurrent writes, with static data semantics, to resolve them. We propose a simple conflict resolution template based on an application-predefined order on the domain of values. It eliminates or reduces the number of conflicts that need to be resolved by the user or by an explicit application logic. We illustrate some variants of our approach with use cases, and how it generalizes existing designs. △ Less

Submitted 16 November, 2015; originally announced November 2015.

Comments: 8 pages

arXiv:1410.2803 [pdf, ps, other]

Efficient State-based CRDTs by Delta-Mutation

Authors: Paulo Sérgio Almeida, Ali Shoker, Carlos Baquero

Abstract: CRDTs are distributed data types that make eventual consistency of a distributed object possible and non ad-hoc. Specifically, state-based CRDTs ensure convergence through disseminating the en- tire state, that may be large, and merging it to other replicas; whereas operation-based CRDTs disseminate operations (i.e., small states) assuming an exactly-once reliable dissemination layer. We introduce… ▽ More CRDTs are distributed data types that make eventual consistency of a distributed object possible and non ad-hoc. Specifically, state-based CRDTs ensure convergence through disseminating the en- tire state, that may be large, and merging it to other replicas; whereas operation-based CRDTs disseminate operations (i.e., small states) assuming an exactly-once reliable dissemination layer. We introduce Delta State Conflict-Free Replicated Datatypes (δ-CRDT) that can achieve the best of both worlds: small messages with an incremental nature, as in operation-based CRDTs, disseminated over unreliable communication channels, as in traditional state-based CRDTs. This is achieved by defining δ-mutators to return a delta-state, typically with a much smaller size than the full state, that is joined to both: local and remote states. We introduce the δ-CRDT framework, and we explain it through establishing a correspondence to current state-based CRDTs. In addition, we present an anti-entropy algorithm that ensures causal consistency, and we introduce two δ-CRDT specifications of well-known replicated datatypes. △ Less

Submitted 3 March, 2015; v1 submitted 10 October, 2014; originally announced October 2014.

Comments: 19 pages

arXiv:1310.3107 [pdf, ps, other]

SwiftCloud: Fault-Tolerant Geo-Replication Integrated all the Way to the Client Machine

Authors: Marek Zawirski, Annette Bieniusa, Valter Balegas, Sérgio Duarte, Carlos Baquero, Marc Shapiro, Nuno Preguiça

Abstract: Client-side logic and storage are increasingly used in web and mobile applications to improve response time and availability. Current approaches tend to be ad-hoc and poorly integrated with the server-side logic. We present a principled approach to integrate client- and server-side storage. We support mergeable and strongly consistent transactions that target either client or server replicas and p… ▽ More Client-side logic and storage are increasingly used in web and mobile applications to improve response time and availability. Current approaches tend to be ad-hoc and poorly integrated with the server-side logic. We present a principled approach to integrate client- and server-side storage. We support mergeable and strongly consistent transactions that target either client or server replicas and provide access to causally-consistent snapshots efficiently. In the presence of infrastructure faults, a client-assisted failover solution allows client execution to resume immediately and seamlessly access consistent snapshots without waiting. We implement this approach in SwiftCloud, the first transactional system to bring geo-replication all the way to the client machine. Example applications show that our programming model is useful across a range of application areas. Our experimental evaluation shows that SwiftCloud provides better fault tolerance and at the same time can improve both latency and throughput by up to an order of magnitude, compared to classical geo-replication techniques. △ Less

Submitted 11 October, 2013; originally announced October 2013.

Report number: RR-8347

Journal ref: N° RR-8347 (2013)

arXiv:1307.3207 [pdf, other]

Scalable Eventually Consistent Counters over Unreliable Networks

Authors: Paulo Sérgio Almeida, Carlos Baquero

Abstract: Counters are an important abstraction in distributed computing, and play a central role in large scale geo-replicated systems, counting events such as web page impressions or social network "likes". Classic distributed counters, strongly consistent, cannot be made both available and partition-tolerant, due to the CAP Theorem, being unsuitable to large scale scenarios. This paper defines Eventually… ▽ More Counters are an important abstraction in distributed computing, and play a central role in large scale geo-replicated systems, counting events such as web page impressions or social network "likes". Classic distributed counters, strongly consistent, cannot be made both available and partition-tolerant, due to the CAP Theorem, being unsuitable to large scale scenarios. This paper defines Eventually Consistent Distributed Counters (ECDC) and presents an implementation of the concept, Handoff Counters, that is scalable and works over unreliable networks. By giving up the sequencer aspect of classic distributed counters, ECDC implementations can be made AP in the CAP design space, while retaining the essence of counting. Handoff Counters are the first CRDT (Conflict-free Replicated Data Type) based mechanism that overcomes the identity explosion problem in naive CRDTs, such as G-Counters (where state size is linear in the number of independent actors that ever incremented the counter), by managing identities towards avoiding global propagation and garbage collecting temporary entries. The approach used in Handoff Counters is not restricted to counters, being more generally applicable to other data types with associative and commutative operations. △ Less

Submitted 11 July, 2013; originally announced July 2013.

ACM Class: C.2.4; E.1; H.2.4

arXiv:1303.5909 [pdf]

doi 10.1080/18756891.2013.773175

Genetic Algorithm with a Local Search Strategy for Discovering Communities in Complex Networks

Authors: Dayou Liu, Di **, Carlos Baquero, Dongxiao He, Bo Yang, Qiangyuan Yu

Abstract: In order to further improve the performance of current genetic algorithms aiming at discovering communities, a local search based genetic algorithm GALS is here proposed. The core of GALS is a local search based mutation technique. In order to overcome the drawbacks of traditional mutation methods, the paper develops the concept of marginal gene and then the local monotonicity of modularity functi… ▽ More In order to further improve the performance of current genetic algorithms aiming at discovering communities, a local search based genetic algorithm GALS is here proposed. The core of GALS is a local search based mutation technique. In order to overcome the drawbacks of traditional mutation methods, the paper develops the concept of marginal gene and then the local monotonicity of modularity function Q is deduced from each nodes local view. Based on these two elements, a new mutation method combined with a local search strategy is presented. GALS has been evaluated on both synthetic benchmarks and several real networks, and compared with some presently competing algorithms. Experimental results show that GALS is highly effective and efficient for discovering community structure. △ Less

Submitted 23 March, 2013; originally announced March 2013.

Comments: 17 pages, 8 figures. arXiv admin note: text overlap with arXiv:1303.4711

Journal ref: International Journal of Computational Intelligence Systems, Vol. 6, No. 2 (March, 2013), 354-369

arXiv:1303.5675 [pdf]

doi 10.1088/1742-5468/2011/05/P05031

Markov random walk under constraint for discovering overlap** communities in complex networks

Authors: Di **, Bo Yang, Carlos Baquero, Dayou Liu, Dongxiao He, Jie Liu

Abstract: Detection of overlap** communities in complex networks has motivated recent research in the relevant fields. Aiming this problem, we propose a Markov dynamics based algorithm, called UEOC, which means, 'unfold and extract overlap** communities'. In UEOC, when identifying each natural community that overlaps, a Markov random walk method combined with a constraint strategy, which is based on the… ▽ More Detection of overlap** communities in complex networks has motivated recent research in the relevant fields. Aiming this problem, we propose a Markov dynamics based algorithm, called UEOC, which means, 'unfold and extract overlap** communities'. In UEOC, when identifying each natural community that overlaps, a Markov random walk method combined with a constraint strategy, which is based on the corresponding annealed network (degree conserving random network), is performed to unfold the community. Then, a cutoff criterion with the aid of a local community function, called conductance, which can be thought of as the ratio between the number of edges inside the community and those leaving it, is presented to extract this emerged community from the entire network. The UEOC algorithm depends on only one parameter whose value can be easily set, and it requires no prior knowledge on the hidden community structures. The proposed UEOC has been evaluated both on synthetic benchmarks and on some real-world networks, and was compared with a set of competing algorithms. Experimental result has shown that UEOC is highly effective and efficient for discovering overlap** communities. △ Less

Submitted 22 March, 2013; originally announced March 2013.

Comments: 21 pages, 8 pages, 2 tables

Journal ref: Journal of Statistical Mechanics: Theory and Experiment, P05031, 2011

arXiv:1210.3368 [pdf, other]

An optimized conflict-free replicated set

Authors: Annette Bieniusa, Marek Zawirski, Nuno Preguiça, Marc Shapiro, Carlos Baquero, Valter Balegas, Sérgio Duarte

Abstract: Eventual consistency of replicated data supports concurrent updates, reduces latency and improves fault tolerance, but forgoes strong consistency. Accordingly, several cloud computing platforms implement eventually-consistent data types. The set is a widespread and useful abstraction, and many replicated set designs have been proposed. We present a reasoning abstraction, permutation equivalence, t… ▽ More Eventual consistency of replicated data supports concurrent updates, reduces latency and improves fault tolerance, but forgoes strong consistency. Accordingly, several cloud computing platforms implement eventually-consistent data types. The set is a widespread and useful abstraction, and many replicated set designs have been proposed. We present a reasoning abstraction, permutation equivalence, that systematizes the characterization of the expected concurrency semantics of concurrent types. Under this framework we present one of the existing conflict-free replicated data types, Observed-Remove Set. Furthermore, in order to decrease the size of meta-data, we propose a new optimization to avoid tombstones. This approach that can be transposed to other data types, such as maps, graphs or sequences. △ Less

Submitted 11 October, 2012; originally announced October 2012.

Comments: No. RR-8083 (2012)

arXiv:1204.1373 [pdf, other]

Spectra: Robust Estimation of Distribution Functions in Networks

Authors: Miguel Borges, Paulo Jesus, Carlos Baquero, Paulo Sérgio Almeida

Abstract: Distributed aggregation allows the derivation of a given global aggregate property from many individual local values in nodes of an interconnected network system. Simple aggregates such as minima/maxima, counts, sums and averages have been thoroughly studied in the past and are important tools for distributed algorithms and network coordination. Nonetheless, this kind of aggregates may not be comp… ▽ More Distributed aggregation allows the derivation of a given global aggregate property from many individual local values in nodes of an interconnected network system. Simple aggregates such as minima/maxima, counts, sums and averages have been thoroughly studied in the past and are important tools for distributed algorithms and network coordination. Nonetheless, this kind of aggregates may not be comprehensive enough to characterize biased data distributions or when in presence of outliers, making the case for richer estimates of the values on the network. This work presents Spectra, a distributed algorithm for the estimation of distribution functions over large scale networks. The estimate is available at all nodes and the technique depicts important properties, namely: robust when exposed to high levels of message loss, fast convergence speed and fine precision in the estimate. It can also dynamically cope with changes of the sampled local property, not requiring algorithm restarts, and is highly resilient to node churn. The proposed approach is experimentally evaluated and contrasted to a competing state of the art distribution aggregation technique. △ Less

Submitted 5 April, 2012; originally announced April 2012.

Comments: Full version of the paper published at 12th IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS), Stockholm (Sweden), June 2012

ACM Class: C.2.4; G.3

arXiv:1111.6087 [pdf, other]

doi 10.1109/CDC.2012.6426872

Fast Distributed Computation of Distances in Networks

Authors: Paulo Sérgio Almeida, Carlos Baquero, Alcino Cunha

Abstract: This paper presents a distributed algorithm to simultaneously compute the diameter, radius and node eccentricity in all nodes of a synchronous network. Such topological information may be useful as input to configure other algorithms. Previous approaches have been modular, progressing in sequential phases using building blocks such as BFS tree construction, thus incurring longer executions than st… ▽ More This paper presents a distributed algorithm to simultaneously compute the diameter, radius and node eccentricity in all nodes of a synchronous network. Such topological information may be useful as input to configure other algorithms. Previous approaches have been modular, progressing in sequential phases using building blocks such as BFS tree construction, thus incurring longer executions than strictly required. We present an algorithm that, by timely propagation of available estimations, achieves a faster convergence to the correct values. We show local criteria for detecting convergence in each node. The algorithm avoids the creation of BFS trees and simply manipulates sets of node ids and hop counts. For the worst scenario of variable start times, each node i with eccentricity ecc(i) can compute: the node eccentricity in diam(G)+ecc(i)+2 rounds; the diameter in 2*diam(G)+ecc(i)+2 rounds; and the radius in diam(G)+ecc(i)+2*radius(G) rounds. △ Less

Submitted 25 November, 2011; originally announced November 2011.

Comments: 12 pages

Journal ref: IEEE 51st Annual Conference on Decision and Control (2012), 5215-5220

arXiv:1110.0725 [pdf, other]

A Survey of Distributed Data Aggregation Algorithms

Authors: Paulo Jesus, Carlos Baquero, Paulo Sérgio Almeida

Abstract: Distributed data aggregation is an important task, allowing the decentralized determination of meaningful global properties, that can then be used to direct the execution of other applications. The resulting values result from the distributed computation of functions like COUNT, SUM and AVERAGE. Some application examples can found to determine the network size, total storage capacity, average load… ▽ More Distributed data aggregation is an important task, allowing the decentralized determination of meaningful global properties, that can then be used to direct the execution of other applications. The resulting values result from the distributed computation of functions like COUNT, SUM and AVERAGE. Some application examples can found to determine the network size, total storage capacity, average load, majorities and many others. In the last decade, many different approaches have been proposed, with different trade-offs in terms of accuracy, reliability, message and time complexity. Due to the considerable amount and variety of aggregation algorithms, it can be difficult and time consuming to determine which techniques will be more appropriate to use in specific settings, justifying the existence of a survey to aid in this task. This work reviews the state of the art on distributed data aggregation algorithms, providing three main contributions. First, it formally defines the concept of aggregation, characterizing the different types of aggregation functions. Second, it succinctly describes the main aggregation techniques, organizing them in a taxonomy. Finally, it provides some guidelines toward the selection and use of the most relevant techniques, summarizing their principal characteristics. △ Less

Submitted 4 October, 2011; originally announced October 2011.

Comments: 45 pages, Technical Report

ACM Class: C.2.4; A.1

arXiv:1109.4373 [pdf, other]

doi 10.1007/978-3-642-25873-2_35

Fault-Tolerant Aggregation: Flow-Updating Meets Mass-Distribution

Authors: Paulo S. Almeida, Carlos Baquero, Martin Farach-Colton, Paulo Jesus, Miguel A. Mosteiro

Abstract: Flow-Updating (FU) is a fault-tolerant technique that has proved to be efficient in practice for the distributed computation of aggregate functions in communication networks where individual processors do not have access to global information. Previous distributed aggregation protocols, based on repeated sharing of input values (or mass) among processors, sometimes called Mass-Distribution (MD) pr… ▽ More Flow-Updating (FU) is a fault-tolerant technique that has proved to be efficient in practice for the distributed computation of aggregate functions in communication networks where individual processors do not have access to global information. Previous distributed aggregation protocols, based on repeated sharing of input values (or mass) among processors, sometimes called Mass-Distribution (MD) protocols, are not resilient to communication failures (or message loss) because such failures yield a loss of mass. In this paper, we present a protocol which we call Mass-Distribution with Flow-Updating (MDFU). We obtain MDFU by applying FU techniques to classic MD. We analyze the convergence time of MDFU showing that stochastic message loss produces low overhead. This is the first convergence proof of an FU-based algorithm. We evaluate MDFU experimentally, comparing it with previous MD and FU protocols, and verifying the behavior predicted by the analysis. Finally, given that MDFU incurs a fixed deviation proportional to the message-loss rate, we adjust the accuracy of MDFU heuristically in a new protocol called MDFU with Linear Prediction (MDFU-LP). The evaluation shows that both MDFU and MDFU-LP behave very well in practice, even under high rates of message loss and even changing the input values dynamically. △ Less

Submitted 20 September, 2011; originally announced September 2011.

Comments: 18 pages, 5 figures, To appear in OPODIS 2011

arXiv:1011.6596 [pdf, ps, other]

Dependability in Aggregation by Averaging

Authors: Paulo Jesus, Carlos Baquero, Paulo Sérgio Almeida

Abstract: Aggregation is an important building block of modern distributed applications, allowing the determination of meaningful properties (e.g. network size, total storage capacity, average load, majorities, etc.) that are used to direct the execution of the system. However, the majority of the existing aggregation algorithms exhibit relevant dependability issues, when prospecting their use in real appli… ▽ More Aggregation is an important building block of modern distributed applications, allowing the determination of meaningful properties (e.g. network size, total storage capacity, average load, majorities, etc.) that are used to direct the execution of the system. However, the majority of the existing aggregation algorithms exhibit relevant dependability issues, when prospecting their use in real application environments. In this paper, we reveal some dependability issues of aggregation algorithms based on iterative averaging techniques, giving some directions to solve them. This class of algorithms is considered robust (when compared to common tree-based approaches), being independent from the used routing topology and providing an aggregation result at all nodes. However, their robustness is strongly challenged and their correctness often compromised, when changing the assumptions of their working environment to more realistic ones. The correctness of this class of algorithms relies on the maintenance of a fundamental invariant, commonly designated as "mass conservation". We will argue that this main invariant is often broken in practical settings, and that additional mechanisms and modifications are required to maintain it, incurring in some degradation of the algorithms performance. In particular, we discuss the behavior of three representative algorithms Push-Sum Protocol, Push-Pull Gossip protocol and Distributed Random Grou** under asynchronous and faulty (with message loss and node crashes) environments. More specifically, we propose and evaluate two new versions of the Push-Pull Gossip protocol, which solve its message interleaving problem (evidenced even in a synchronous operation mode). △ Less

Submitted 30 November, 2010; originally announced November 2010.

Comments: 14 pages. Presented in Inforum 2009

ACM Class: C.2.4

arXiv:1011.5808 [pdf, other]

Dotted Version Vectors: Logical Clocks for Optimistic Replication

Authors: Nuno Preguiça, Carlos Baquero, Paulo Sérgio Almeida, Victor Fonte, Ricardo Gonçalves

Abstract: In cloud computing environments, a large number of users access data stored in highly available storage systems. To provide good performance to geographically disperse users and allow operation even in the presence of failures or network partitions, these systems often rely on optimistic replication solutions that guarantee only eventual consistency. In this scenario, it is important to be able to… ▽ More In cloud computing environments, a large number of users access data stored in highly available storage systems. To provide good performance to geographically disperse users and allow operation even in the presence of failures or network partitions, these systems often rely on optimistic replication solutions that guarantee only eventual consistency. In this scenario, it is important to be able to accurately and efficiently identify updates executed concurrently. In this paper, first we review, and expose problems with current approaches to causality tracking in optimistic replication: these either lose information about causality or do not scale, as they require replicas to maintain information that grows linearly with the number of clients or updates. Then, we propose a novel solution that fully captures causality while being very concise in that it maintains information that grows linearly only with the number of servers that register updates for a given data element, bounded by the degree of replication. △ Less

Submitted 26 November, 2010; originally announced November 2010.

Comments: Preprint, submitted for publication. 12 pages

ACM Class: C.2.4; E.1

Showing 1–27 of 27 results for author: Baquero, C