-
Secrecy: Secure collaborative analytics on secret-shared data
Authors:
John Liagouris,
Vasiliki Kalavri,
Muhammad Faisal,
Mayank Varia
Abstract:
We present a relational MPC framework for secure collaborative analytics on private data with no information leakage. Our work targets challenging use cases where data owners may not have private resources to participate in the computation, thus, they need to securely outsource the data analysis to untrusted third parties. We define a set of oblivious operators, explain the secure primitives they rely on, and analyze their costs in terms of operations and inter-party communication. We show how these operators can be composed to form end-to-end oblivious queries, and we introduce logical and physical optimizations that dramatically reduce the space and communication requirements during query execution, in some cases from quadratic to linear or from linear to logarithmic with respect to the cardinality of the input.
We implement our framework on top of replicated secret sharing in a system called Secrecy and evaluate it using real queries from several MPC application areas. Our experiments demonstrate that the proposed optimizations can result in over 1000x lower execution times compared to baseline approaches, enabling Secrecy to outperform state-of-the-art frameworks and compute MPC queries on millions of input rows with a single thread per party.
Submitted 3 February, 2022; v1 submitted 1 February, 2021;
originally announced February 2021.
-
The Future is Big Graphs! A Community View on Graph Processing Systems
Authors:
Sherif Sakr,
Angela Bonifati,
Hannes Voigt,
Alexandru Iosup,
Khaled Ammar,
Renzo Angles,
Walid Aref,
Marcelo Arenas,
Maciej Besta,
Peter A. Boncz,
Khuzaima Daudjee,
Emanuele Della Valle,
Stefania Dumbrava,
Olaf Hartig,
Bernhard Haslhofer,
Tim Hegeman,
Jan Hidders,
Katja Hose,
Adriana Iamnitchi,
Vasiliki Kalavri,
Hugo Kapp,
Wim Martens,
M. Tamer Özsu,
Eric Peukert,
Stefan Plantikow, et al. (16 additional authors not shown)
Abstract:
Graphs are by nature unifying abstractions that can leverage interconnectedness to represent, explore, predict, and explain real- and digital-world phenomena. Although real users and consumers of graph instances and graph workloads understand these abstractions, future problems will require new abstractions and systems. What needs to happen in the next decade for big graph processing to continue to succeed?
Submitted 11 December, 2020;
originally announced December 2020.
-
A Survey on the Evolution of Stream Processing Systems
Authors:
Marios Fragkoulis,
Paris Carbone,
Vasiliki Kalavri,
Asterios Katsifodimos
Abstract:
Stream processing has been an active research field for more than 20 years, but it is now witnessing its prime time due to recent successful efforts by the research community and numerous worldwide open-source communities. This survey provides a comprehensive overview of fundamental aspects of stream processing systems and their evolution in the functional areas of out-of-order data management, state management, fault tolerance, high availability, load management, elasticity, and reconfiguration. We review noteworthy past research findings, outline the similarities and differences between early ('00-'10) and modern ('11-'22) streaming systems, and discuss recent trends and open problems.
Submitted 14 January, 2023; v1 submitted 3 August, 2020;
originally announced August 2020.
-
Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems
Authors:
Maciej Besta,
Marc Fischer,
Vasiliki Kalavri,
Michael Kapralov,
Torsten Hoefler
Abstract:
Graph processing has become an important part of various areas of computing, including machine learning, medical applications, social network analysis, computational sciences, and others. A growing amount of the associated graph processing workloads are dynamic, with millions of edges added or removed per second. Graph streaming frameworks are specifically crafted to enable the processing of such highly dynamic workloads. Recent years have seen the development of many such frameworks. However, they differ in their general architectures (with key details such as the support for the concurrent execution of graph updates and queries, or the incorporated graph data organization), the types of updates and workloads allowed, and many others. To facilitate the understanding of this growing field, we provide the first analysis and taxonomy of dynamic and streaming graph processing. We focus on identifying the fundamental system designs and on understanding their support for concurrency, and for different graph updates as well as analytics workloads. We also crystallize the meaning of different concepts associated with streaming graph processing, such as dynamic, temporal, online, and time-evolving graphs, edge-centric processing, models for the maintenance of updates, and graph databases. Moreover, we provide a bridge with the very rich landscape of graph streaming theory by giving a broad overview of recent theoretical related advances, and by discussing which graph streaming models and settings could be helpful in developing more powerful streaming frameworks and designs. We also outline graph streaming workloads and research challenges.
Submitted 27 October, 2021; v1 submitted 29 December, 2019;
originally announced December 2019.
-
Megaphone: Latency-conscious state migration for distributed streaming dataflows
Authors:
Moritz Hoffmann,
Andrea Lattuada,
Frank McSherry,
Vasiliki Kalavri,
John Liagouris,
Timothy Roscoe
Abstract:
We design and implement Megaphone, a data migration mechanism for stateful distributed dataflow engines with latency objectives. When compared to existing migration mechanisms, Megaphone has the following differentiating characteristics: (i) migrations can be subdivided to a configurable granularity to avoid latency spikes, and (ii) migrations can be prepared ahead of time to avoid runtime coordination. Megaphone is implemented as a library on an unmodified timely dataflow implementation, and provides an operator interface compatible with its existing APIs. We evaluate Megaphone on established benchmarks with varying amounts of state and observe that compared to naïve approaches Megaphone reduces service latencies during reconfiguration by orders of magnitude without significantly increasing steady-state overhead.
Submitted 16 April, 2019; v1 submitted 4 December, 2018;
originally announced December 2018.
-
DeltaPath: dataflow-based high-performance incremental routing
Authors:
Desislava Dimitrova,
John Liagouris,
Sebastian Wicki,
Moritz Hoffmann,
Vasiliki Kalavri,
Timothy Roscoe
Abstract:
Routing controllers must react quickly to failures, reconfigurations and workload or policy changes, to ensure service performance and cost-efficient network operation. We propose a general execution model which views routing as an incremental data-parallel computation on a graph-based network model plus a continuous stream of network changes. Our approach supports different routing objectives with only minor re-configuration of its core algorithm, and easily accommodates dynamic user-defined routing policies. Moreover, our prototype demonstrates excellent performance: on Google Jupiter topology it reacts with a median time of 350ms to link failures and serves more than two million path requests per second each with latency under 1ms. This is three orders-of-magnitude faster than the popular ONOS open-source SDN controller.
Submitted 21 August, 2018;
originally announced August 2018.
-
High-Level Programming Abstractions for Distributed Graph Processing
Authors:
Vasiliki Kalavri,
Vladimir Vlassov,
Seif Haridi
Abstract:
Efficient processing of large-scale graphs in distributed environments has been an increasingly popular topic of research in recent years. Inter-connected data that can be modeled as graphs arise in application domains such as machine learning, recommendation, web search, and social network analysis. Writing distributed graph applications is inherently hard and requires programming models that can cover a diverse set of problem domains, including iterative refinement algorithms, graph transformations, graph aggregations, pattern matching, ego-network analysis, and graph traversals. Several high-level programming abstractions have been proposed and adopted by distributed graph processing systems and big data platforms. Even though significant work has been done to experimentally compare distributed graph processing frameworks, no qualitative study and comparison of graph programming abstractions has been conducted yet. In this survey, we review and analyze the most prevalent high-level programming models for distributed graph processing, in terms of their semantics and applicability. We identify the classes of graph applications that can be naturally expressed by each abstraction and we also give examples of applications that are hard or impossible to express. We review 34 distributed graph processing systems with respect to their programming abstractions, execution models, and communication mechanisms. Finally, we discuss trends and open research questions in the area of distributed graph processing.
Submitted 9 July, 2016;
originally announced July 2016.