-
A1: A Distributed In-Memory Graph Database
Authors:
Chiranjeeb Buragohain,
Knut Magne Risvik,
Paul Brett,
Miguel Castro,
Wonhee Cho,
Joshua Cowhig,
Nikolas Gloy,
Karthik Kalyanaraman,
Richendra Khanna,
John Pao,
Matthew Renzelmann,
Alex Shamis,
Timothy Tan,
Shuheng Zheng
Abstract:
A1 is an in-memory distributed database used by the Bing search engine to support complex queries over structured data. The key enablers for A1 are the availability of cheap DRAM and high-speed RDMA (Remote Direct Memory Access) networking in commodity hardware. A1 uses FaRM as its underlying storage layer and builds the graph abstraction and query engine on top. The combination of in-memory storage and RDMA access requires rethinking how data is allocated, organized and queried in a large distributed system. A single A1 cluster can store tens of billions of vertices and edges and support a throughput of 350+ million vertex reads per second with end-to-end query latency in single-digit milliseconds. In this paper we describe the A1 data model, RDMA-optimized data structures and query execution.
Submitted 12 April, 2020;
originally announced April 2020.
-
Untangling the Braid: Finding Outliers in a Set of Streams
Authors:
Chiranjeeb Buragohain,
Luca Foschini,
Subhash Suri
Abstract:
Monitoring the performance of large shared computing systems such as the cloud computing infrastructure raises many challenging algorithmic problems. One common problem is to track users with the largest deviation from the norm (outliers), for some measure of performance. Taking a stream-computing perspective, we can think of each user's performance profile as a stream of numbers (such as response times), and the aggregate performance profile of the shared infrastructure as a "braid" of these intermixed streams. The monitoring system's goal then is to untangle this braid sufficiently to track the top k outliers. This paper investigates the space complexity of one-pass algorithms for approximating outliers of this kind, proves lower bounds using multi-party communication complexity, and proposes small-memory heuristic algorithms. On one hand, stream outliers are easily tracked for simple measures, such as max or min, but our theoretical results rule out even good approximations for most of the natural measures such as average, median, or the quantiles. On the other hand, we show through simulation that our proposed heuristics perform quite well for a variety of synthetic data.
Submitted 16 July, 2009;
originally announced July 2009.
-
Distributed Navigation Algorithms for Sensor Networks
Authors:
Chiranjeeb Buragohain,
Divyakant Agrawal,
Subhash Suri
Abstract:
We propose efficient distributed algorithms to aid navigation of a user through a geographic area covered by sensors. The sensors sense the level of danger at their locations and we use this information to find a safe path for the user through the sensor field. Traditional distributed navigation algorithms rely upon flooding the whole network with packets to find an optimal safe path. To reduce the communication expense, we introduce the concept of a skeleton graph, which is a sparse subset of the true sensor network communication graph. Using skeleton graphs we show that it is possible to find approximate safe paths with much lower communication cost. We give tight theoretical guarantees on the quality of our approximation and show, by simulation, the effectiveness of our algorithms in realistic sensor network situations.
Submitted 14 December, 2005;
originally announced December 2005.
-
Power Aware Routing for Sensor Databases
Authors:
Chiranjeeb Buragohain,
Divyakant Agrawal,
Subhash Suri
Abstract:
Wireless sensor networks offer the potential to span and monitor large geographical areas inexpensively. Sensor network databases like TinyDB are the dominant architectures to extract and manage data in such networks. Since sensors have significant power constraints (battery life) and high communication costs, the design of energy-efficient communication algorithms is of great importance. The data flow in a sensor database is very different from data flow in an ordinary network and poses novel challenges in designing efficient routing algorithms. In this work we explore the problem of energy-efficient routing for various types of database queries and show that, in general, this problem is NP-complete. We give a constant-factor approximation algorithm for one class of query, and for other queries give heuristic algorithms. We evaluate the efficiency of the proposed algorithms by simulation and demonstrate their near-optimal performance for various network sizes.
Submitted 29 December, 2004;
originally announced December 2004.
-
Medians and Beyond: New Aggregation Techniques for Sensor Networks
Authors:
Nisheeth Shrivastava,
Chiranjeeb Buragohain,
Divyakant Agrawal,
Subhash Suri
Abstract:
Wireless sensor networks offer the potential to span and monitor large geographical areas inexpensively. Sensors, however, have significant power constraints (battery life), making communication very expensive. Another important issue in the context of sensor-based information systems is that individual sensor readings are inherently unreliable. In order to address these two aspects, sensor database systems like TinyDB and Cougar enable in-network data aggregation to reduce the communication cost and improve reliability. The existing data aggregation techniques, however, are limited to relatively simple types of queries such as SUM, COUNT, AVG, and MIN/MAX. In this paper we propose a data aggregation scheme that significantly extends the class of queries that can be answered using sensor networks. These queries include (approximate) quantiles, such as the median; the most frequent data values, such as the consensus value; a histogram of the data distribution; and range queries. In our scheme, each sensor aggregates the data it has received from other sensors into a message of fixed (user-specified) size. We provide strict theoretical guarantees on the approximation quality of the queries in terms of the message size. We evaluate the performance of our aggregation scheme by simulation and demonstrate its accuracy, scalability and low resource utilization for highly variable input data sets.
Submitted 16 August, 2004;
originally announced August 2004.
-
A Game Theoretic Framework for Incentives in P2P Systems
Authors:
Chiranjeeb Buragohain,
Divyakant Agrawal,
Subhash Suri
Abstract:
Peer-To-Peer (P2P) networks are self-organizing, distributed systems, with no centralized authority or infrastructure. Because of the voluntary participation, the availability of resources in a P2P system can be highly variable and unpredictable. In this paper, we use ideas from Game Theory to study the interaction of strategic and rational peers, and propose a differential service-based incentive scheme to improve the system's performance.
Submitted 17 October, 2003;
originally announced October 2003.