Search | arXiv e-print repository

arXiv:2004.05712 [pdf, other]

doi 10.1145/3318464.3386135

A1: A Distributed In-Memory Graph Database

Authors: Chiranjeeb Buragohain, Knut Magne Risvik, Paul Brett, Miguel Castro, Wonhee Cho, Joshua Cowhig, Nikolas Gloy, Karthik Kalyanaraman, Richendra Khanna, John Pao, Matthew Renzelmann, Alex Shamis, Timothy Tan, Shuheng Zheng

Abstract: A1 is an in-memory distributed database used by the Bing search engine to support complex queries over structured data. The key enablers for A1 are availability of cheap DRAM and high speed RDMA (Remote Direct Memory Access) networking in commodity hardware. A1 uses FaRM as its underlying storage layer and builds the graph abstraction and query engine on top. The combination of in-memory storage a… ▽ More A1 is an in-memory distributed database used by the Bing search engine to support complex queries over structured data. The key enablers for A1 are availability of cheap DRAM and high speed RDMA (Remote Direct Memory Access) networking in commodity hardware. A1 uses FaRM as its underlying storage layer and builds the graph abstraction and query engine on top. The combination of in-memory storage and RDMA access requires rethinking how data is allocated, organized and queried in a large distributed system. A single A1 cluster can store tens of billions of vertices and edges and support a throughput of 350+ million of vertex reads per second with end to end query latency in single digit milliseconds. In this paper we describe the A1 data model, RDMA optimized data structures and query execution. △ Less

Submitted 12 April, 2020; originally announced April 2020.

arXiv:0907.2951 [pdf, ps, other]

Untangling the Braid: Finding Outliers in a Set of Streams

Authors: Chiranjeeb Buragohain, Luca Foschini, Subhash Suri

Abstract: Monitoring the performance of large shared computing systems such as the cloud computing infrastructure raises many challenging algorithmic problems. One common problem is to track users with the largest deviation from the norm (outliers), for some measure of performance. Taking a stream-computing perspective, we can think of each user's performance profile as a stream of numbers (such as respon… ▽ More Monitoring the performance of large shared computing systems such as the cloud computing infrastructure raises many challenging algorithmic problems. One common problem is to track users with the largest deviation from the norm (outliers), for some measure of performance. Taking a stream-computing perspective, we can think of each user's performance profile as a stream of numbers (such as response times), and the aggregate performance profile of the shared infrastructure as a "braid" of these intermixed streams. The monitoring system's goal then is to untangle this braid sufficiently to track the top k outliers. This paper investigates the space complexity of one-pass algorithms for approximating outliers of this kind, proves lower bounds using multi-party communication complexity, and proposes small-memory heuristic algorithms. On one hand, stream outliers are easily tracked for simple measures, such as max or min, but our theoretical results rule out even good approximations for most of the natural measures such as average, median, or the quantiles. On the other hand, we show through simulation that our proposed heuristics perform quite well for a variety of synthetic data. △ Less

Submitted 16 July, 2009; originally announced July 2009.

arXiv:cs/0512060 [pdf, ps, other]

Distributed Navigation Algorithms for Sensor Networks

Authors: Chiranjeeb Buragohain, Divyakant Agrawal, Subhash Suri

Abstract: We propose efficient distributed algorithms to aid navigation of a user through a geographic area covered by sensors. The sensors sense the level of danger at their locations and we use this information to find a safe path for the user through the sensor field. Traditional distributed navigation algorithms rely upon flooding the whole network with packets to find an optimal safe path. To reduce… ▽ More We propose efficient distributed algorithms to aid navigation of a user through a geographic area covered by sensors. The sensors sense the level of danger at their locations and we use this information to find a safe path for the user through the sensor field. Traditional distributed navigation algorithms rely upon flooding the whole network with packets to find an optimal safe path. To reduce the communication expense, we introduce the concept of a skeleton graph which is a sparse subset of the true sensor network communication graph. Using skeleton graphs we show that it is possible to find approximate safe paths with much lower communication cost. We give tight theoretical guarantees on the quality of our approximation and by simulation, show the effectiveness of our algorithms in realistic sensor network situations. △ Less

Submitted 14 December, 2005; originally announced December 2005.

Comments: To Appear in INFOCOM 2006

arXiv:cs/0412118 [pdf, ps, other]

doi 10.1109/INFCOM.2005.1498455

Power Aware Routing for Sensor Databases

Authors: Chiranjeeb Buragohain, Divyakant Agrawal, Subhash Suri

Abstract: Wireless sensor networks offer the potential to span and monitor large geographical areas inexpensively. Sensor network databases like TinyDB are the dominant architectures to extract and manage data in such networks. Since sensors have significant power constraints (battery life), and high communication costs, design of energy efficient communication algorithms is of great importance. The data… ▽ More Wireless sensor networks offer the potential to span and monitor large geographical areas inexpensively. Sensor network databases like TinyDB are the dominant architectures to extract and manage data in such networks. Since sensors have significant power constraints (battery life), and high communication costs, design of energy efficient communication algorithms is of great importance. The data flow in a sensor database is very different from data flow in an ordinary network and poses novel challenges in designing efficient routing algorithms. In this work we explore the problem of energy efficient routing for various different types of database queries and show that in general, this problem is NP-complete. We give a constant factor approximation algorithm for one class of query, and for other queries give heuristic algorithms. We evaluate the efficiency of the proposed algorithms by simulation and demonstrate their near optimal performance for various network sizes. △ Less

Submitted 29 December, 2004; originally announced December 2004.

ACM Class: C.2.1, C.2.2, F.1.3

Journal ref: Proceedings of IEEE INFOCOM 2005, March 13-17, 2005 Miami

arXiv:cs/0408039 [pdf, ps, other]

Medians and Beyond: New Aggregation Techniques for Sensor Networks

Authors: Nisheeth Shrivastava, Chiranjeeb Buragohain, Divyakant Agrawal, Subhash Suri

Abstract: Wireless sensor networks offer the potential to span and monitor large geographical areas inexpensively. Sensors, however, have significant power constraint (battery life), making communication very expensive. Another important issue in the context of sensor-based information systems is that individual sensor readings are inherently unreliable. In order to address these two aspects, sensor datab… ▽ More Wireless sensor networks offer the potential to span and monitor large geographical areas inexpensively. Sensors, however, have significant power constraint (battery life), making communication very expensive. Another important issue in the context of sensor-based information systems is that individual sensor readings are inherently unreliable. In order to address these two aspects, sensor database systems like TinyDB and Cougar enable in-network data aggregation to reduce the communication cost and improve reliability. The existing data aggregation techniques, however, are limited to relatively simple types of queries such as SUM, COUNT, AVG, and MIN/MAX. In this paper we propose a data aggregation scheme that significantly extends the class of queries that can be answered using sensor networks. These queries include (approximate) quantiles, such as the median, the most frequent data values, such as the consensus value, a histogram of the data distribution, as well as range queries. In our scheme, each sensor aggregates the data it has received from other sensors into a fixed (user specified) size message. We provide strict theoretical guarantees on the approximation quality of the queries in terms of the message size. We evaluate the performance of our aggregation scheme by simulation and demonstrate its accuracy, scalability and low resource utilization for highly variable input data sets. △ Less

Submitted 16 August, 2004; originally announced August 2004.

Journal ref: Proceedings of the Second ACM Conference on Embedded Networked Sensor Systems (SenSys 2004)

arXiv:cs/0310039 [pdf, ps, other]

A Game Theoretic Framework for Incentives in P2P Systems

Authors: Chiranjeeb Buragohain, Divyakant Agrawal, Subhash Suri

Abstract: Peer-To-Peer (P2P) networks are self-organizing, distributed systems, with no centralized authority or infrastructure. Because of the voluntary participation, the availability of resources in a P2P system can be highly variable and unpredictable. In this paper, we use ideas from Game Theory to study the interaction of strategic and rational peers, and propose a differential service-based incenti… ▽ More Peer-To-Peer (P2P) networks are self-organizing, distributed systems, with no centralized authority or infrastructure. Because of the voluntary participation, the availability of resources in a P2P system can be highly variable and unpredictable. In this paper, we use ideas from Game Theory to study the interaction of strategic and rational peers, and propose a differential service-based incentive scheme to improve the system's performance. △ Less

Submitted 17 October, 2003; originally announced October 2003.

ACM Class: H.4, J.4

Journal ref: Proc. of the Third International Conference on P2P Computing (P2P2003), Linko** Sweden, 2003

Showing 1–6 of 6 results for author: Buragohain, C