
Showing 1–22 of 22 results for author: Özsu, M T

  1. arXiv:2406.06754

    cs.DB

    Incremental Sliding Window Connectivity over Streaming Graphs

    Authors: Chao Zhang, Angela Bonifati, M. Tamer Özsu

    Abstract: We study index-based processing for connectivity queries within sliding windows on streaming graphs. These queries, which determine whether two vertices belong to the same connected component, are fundamental operations in real-time graph data processing and demand high throughput and low latency. While indexing methods that leverage data structures for fully dynamic connectivity can facilitate ef…

    Submitted 12 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: To appear in VLDB 2024
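    For readers new to the query type: a connectivity query over a sliding window asks whether two vertices lie in the same connected component of the graph formed by the edges currently inside the window. The Python sketch below is a naive baseline, not the index proposed in the paper. It keeps the window's edges in a deque, expires old ones by timestamp, and rebuilds a union-find structure on every query. The class names, the timestamp-based window of length `window`, and the expiry bound are illustrative assumptions.

```python
from collections import deque


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Lazily create a singleton set the first time a vertex is seen.
        self.parent.setdefault(x, x)
        self.size.setdefault(x, 1)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger root
        self.size[ra] += self.size[rb]


class SlidingWindowConnectivity:
    """Hypothetical naive baseline: store the edges of the current window and
    rebuild a union-find on every query. Union-find cannot delete edges, so
    expired edges are handled by recomputation rather than incrementally."""

    def __init__(self, window):
        self.window = window  # window length, in the same units as timestamps
        self.edges = deque()  # (timestamp, u, v), assumed to arrive in time order

    def _expire(self, now):
        # Keep only edges whose timestamp is greater than now - window.
        while self.edges and self.edges[0][0] <= now - self.window:
            self.edges.popleft()

    def insert(self, t, u, v):
        self.edges.append((t, u, v))
        self._expire(t)

    def connected(self, u, v, now):
        self._expire(now)
        uf = UnionFind()
        for _, a, b in self.edges:
            uf.union(a, b)
        return uf.find(u) == uf.find(v)


if __name__ == "__main__":
    w = SlidingWindowConnectivity(window=10)
    w.insert(1, "a", "b")
    w.insert(5, "b", "c")
    print(w.connected("a", "c", now=5))   # True
    print(w.connected("a", "c", now=11))  # False: edge (a, b) from t=1 expired
```

    Each query here costs time linear in the number of edges in the window. Because plain union-find supports merges but not splits, expiring an edge forces a rebuild; avoiding that recomputation is the point of an incremental index for this problem.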

  2. arXiv:2311.03542

    cs.DB

    Indexing Techniques for Graph Reachability Queries

    Authors: Chao Zhang, Angela Bonifati, M. Tamer Özsu

    Abstract: We survey graph reachability indexing techniques for efficient processing of graph reachability queries in two types of popular graph models: plain graphs and edge-labeled graphs. Reachability queries are fundamental in graph processing, and reachability indexes are specialized data structures tailored for speeding up such queries. Work on this topic goes back four decades -- we include 33 of the…

    Submitted 6 November, 2023; originally announced November 2023.
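    Reachability indexes are easiest to appreciate against the index-free alternative. The Python sketch below (function name and graph encoding are illustrative assumptions, not taken from the survey) answers a query with a breadth-first search at query time, which costs O(|V| + |E|) per query. Passing `allowed_labels` restricts traversal to edges carrying those labels, one simple form of reachability on edge-labeled graphs, often called label-constrained reachability.

```python
from collections import deque


def reachable(adj, source, target, allowed_labels=None):
    """Index-free BFS reachability check.

    adj maps a vertex to a list of (neighbor, label) pairs. If allowed_labels
    is None, labels are ignored (plain-graph reachability); otherwise only
    edges whose label is in allowed_labels may be traversed.
    A vertex is considered reachable from itself via the empty path.
    """
    if source == target:
        return True
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, label in adj.get(u, ()):
            if allowed_labels is not None and label not in allowed_labels:
                continue  # edge label not permitted by the constraint
            if v == target:
                return True
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


if __name__ == "__main__":
    g = {"a": [("b", "knows")], "b": [("c", "worksAt")], "c": []}
    print(reachable(g, "a", "c"))             # True
    print(reachable(g, "a", "c", {"knows"}))  # False: c needs a worksAt edge
```

    At the other extreme, precomputing the full transitive closure answers each query with a lookup but needs quadratic space in the worst case; reachability indexes sit between these two ends.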

  3. arXiv:2303.13844

    cs.DB

    Efficient Execution of SPARQL Queries with OPTIONAL and UNION Expressions

    Authors: Lei Zou, Yue Pang, M. Tamer Özsu, Jiaqi Chen

    Abstract: The proliferation of RDF datasets has resulted in studies focusing on optimizing SPARQL query processing. Most existing work focuses on basic graph patterns (BGPs) and ignores other vital operators in SPARQL, such as UNION and OPTIONAL. SPARQL queries with these operators, which we abbreviate as SPARQL-UO, pose serious query plan generation challenges. In this paper, we propose techniques for exec…

    Submitted 24 March, 2023; originally announced March 2023.
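    To make the two operators concrete: in the W3C SPARQL 1.1 algebra a solution is a mapping from variables to RDF terms, OPTIONAL (without a FILTER) is a left outer join over compatible mappings, and UNION is a multiset union. The Python sketch below implements those definitions with nested loops over lists of dicts. It illustrates the semantics only and is not the plan-generation technique proposed in the paper; all function names are hypothetical.

```python
def compatible(m1, m2):
    """Two solution mappings are compatible if they agree on every shared variable."""
    return all(m2[var] == val for var, val in m1.items() if var in m2)


def join(left, right):
    """Inner join: merge every compatible pair of mappings."""
    return [{**m1, **m2} for m1 in left for m2 in right if compatible(m1, m2)]


def left_join(left, right):
    """OPTIONAL without FILTER: keep each left mapping, extended where possible."""
    out = []
    for m1 in left:
        matches = [{**m1, **m2} for m2 in right if compatible(m1, m2)]
        out.extend(matches if matches else [m1])
    return out


def union(left, right):
    """UNION is a bag (multiset) union: duplicates are kept."""
    return left + right


if __name__ == "__main__":
    people = [{"?p": "alice"}, {"?p": "bob"}]
    emails = [{"?p": "alice", "?e": "a@example.org"}]
    print(join(people, emails))       # only alice survives the inner join
    print(left_join(people, emails))  # bob is kept without a binding for ?e
    print(len(union(people, emails)))  # 3
```

    Left outer joins do not, in general, commute or reassociate freely with inner joins, so an optimizer cannot reorder OPTIONAL as freely as the joins inside a BGP.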

  4. arXiv:2301.13761

    cs.OH

    Foundations and Scoping of Data Science

    Authors: M. Tamer Özsu

    Abstract: There has been an increasing recognition of the value of data and of data-based decision making. As a consequence, the development of data science as a field of study has intensified in recent years. However, there is no systematic and comprehensive treatment and understanding of data science. This article describes a systematic and end-to-end framing of the field based on an inclusive definition.…

    Submitted 4 January, 2024; v1 submitted 31 January, 2023; originally announced January 2023.

    Comments: This is an extended version of the original submission. The original has now been published in Communications of the ACM, Volume 66, Number 7, pages 106-116, 2023. The original version was only 10 pages and the new version is 100 pages; the original had a restricted number of references (42), while the longer one is much more complete, with over 150 references.

  5. Optimizing Differentially-Maintained Recursive Queries on Dynamic Graphs

    Authors: Khaled Ammar, Siddhartha Sahu, Semih Salihoglu, M. Tamer Ozsu

    Abstract: Differential computation (DC) is a highly general incremental computation/view maintenance technique that can maintain the output of an arbitrary and possibly recursive dataflow computation upon changes to its base inputs. As such, it is a promising technique for graph database management systems (GDBMS) that support continuous recursive queries over dynamic graphs. Although differential computati…

    Submitted 30 July, 2022; originally announced August 2022.

    Journal ref: PVLDB, 15(11): 3186 - 3198, 2022

  6. arXiv:2207.03027

    cs.DB

    The Case for Distributed Shared-Memory Databases with RDMA-Enabled Memory Disaggregation

    Authors: Ruihong Wang, Jianguo Wang, Stratos Idreos, M. Tamer Özsu, Walid G. Aref

    Abstract: Memory disaggregation (MD) allows for scalable and elastic data center design by separating compute (CPU) from memory. With MD, compute and memory are no longer coupled into the same server box. Instead, they are connected to each other via ultra-fast networking such as RDMA. MD can bring many advantages, e.g., higher memory utilization, better independent scaling (of compute and memory), and lowe…

    Submitted 6 July, 2022; originally announced July 2022.

  7. arXiv:2111.12217

    cs.DS cs.DB cs.DM cs.SI

    Scale-Invariant Strength Assortativity of Streaming Butterflies

    Authors: Aida Sheshbolouki, M. Tamer Özsu

    Abstract: Bipartite graphs are rich data structures with prevalent applications and identifier structural features. However, less is known about their growth patterns, particularly in streaming settings. Current works study the patterns of static or aggregated temporal graphs optimized for certain down-stream analytics or ignoring multipartite/non-stationary data distributions, emergence patterns of subgrap…

    Submitted 23 November, 2021; originally announced November 2021.

    Comments: Submitted for publication

  8. arXiv:2106.14038

    cs.DB cs.DC

    GSmart: An Efficient SPARQL Query Engine Using Sparse Matrix Algebra -- Full Version

    Authors: Yuedan Chen, M. Tamer Özsu, Guoqing Xiao, Zhuo Tang, Kenli Li

    Abstract: Efficient execution of SPARQL queries over large RDF datasets is a topic of considerable interest due to increased use of RDF to encode data. Most of this work has followed either relational or graph-based approaches. In this paper, we propose an alternative query engine, called gSmart, based on matrix algebra. This approach can potentially better exploit the computing power of high-performance he…

    Submitted 26 June, 2021; originally announced June 2021.

  9. arXiv:2101.12334

    cs.DB cs.DS

    sGrapp: Butterfly Approximation in Streaming Graphs

    Authors: Aida Sheshbolouki, M. Tamer Özsu

    Abstract: We study the fundamental problem of butterfly (i.e. (2,2)-bicliques) counting in bipartite streaming graphs. Similar to triangles in unipartite graphs, enumerating butterflies is crucial in understanding the structure of bipartite graphs. This benefits many applications where studying the cohesion in a graph shaped data is of particular interest. Examples include investigating the structure of com…

    Submitted 3 February, 2021; v1 submitted 28 January, 2021; originally announced January 2021.
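    A butterfly is a (2,2)-biclique: two vertices on one side of a bipartite graph that share two neighbours on the other side. The Python sketch below computes the exact count for a static graph; it is not the streaming approximation that sGrapp proposes. It relies on the identity that two same-side vertices with c common neighbours form C(c, 2) butterflies. The function name and edge encoding are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations
from math import comb


def count_butterflies(edges):
    """Exact butterfly ((2,2)-biclique) count of a static bipartite graph.

    edges is an iterable of (u, v) pairs with u in one partition and v in the
    other. Vertex labels within a partition must be mutually comparable.
    """
    u_neighbours_of_v = defaultdict(set)
    for u, v in set(edges):  # ignore duplicate edges
        u_neighbours_of_v[v].add(u)

    # Each v-side vertex is the centre of a wedge for every pair of its
    # u-side neighbours; count wedges per (u1, u2) pair.
    shared = defaultdict(int)
    for us in u_neighbours_of_v.values():
        for u1, u2 in combinations(sorted(us), 2):  # sorted: stable pair keys
            shared[(u1, u2)] += 1

    # c common neighbours give comb(c, 2) butterflies for that pair.
    return sum(comb(c, 2) for c in shared.values())


if __name__ == "__main__":
    k33 = [(u, v) for u in "abc" for v in "xyz"]
    print(count_butterflies(k33))  # 9, i.e. C(3,2) * C(3,2) for K_{3,3}
```

    Exact counting materialises every wedge, which grows quickly on dense or fast-changing graphs; that cost is one motivation for approximate counting in the streaming setting.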

  10. arXiv:2101.12305

    cs.DB

    Evaluating Complex Queries on Streaming Graphs

    Authors: Anil Pacaci, Angela Bonifati, M. Tamer Özsu

    Abstract: We study the problem of evaluating persistent queries over streaming graphs in a principled fashion. These queries need to be evaluated over unbounded and very high speed graph streams. We define a streaming graph data model and query model incorporating navigational queries, subgraph queries and paths as first-class citizens. To support this full-fledged query model we develop a streaming graph a…

    Submitted 1 August, 2021; v1 submitted 28 January, 2021; originally announced January 2021.

    Comments: 18 pages; typos fixed; examples, experimental setup and analysis updated

  11. arXiv:2012.06171

    cs.DC cs.DB

    The Future is Big Graphs! A Community View on Graph Processing Systems

    Authors: Sherif Sakr, Angela Bonifati, Hannes Voigt, Alexandru Iosup, Khaled Ammar, Renzo Angles, Walid Aref, Marcelo Arenas, Maciej Besta, Peter A. Boncz, Khuzaima Daudjee, Emanuele Della Valle, Stefania Dumbrava, Olaf Hartig, Bernhard Haslhofer, Tim Hegeman, Jan Hidders, Katja Hose, Adriana Iamnitchi, Vasiliki Kalavri, Hugo Kapp, Wim Martens, M. Tamer Özsu, Eric Peukert, Stefan Plantikow, et al. (16 additional authors not shown)

    Abstract: Graphs are by nature unifying abstractions that can leverage interconnectedness to represent, explore, predict, and explain real- and digital-world phenomena. Although real users and consumers of graph instances and graph workloads understand these abstractions, future problems will require new abstractions and systems. What needs to happen in the next decade for big graph processing to continue t…

    Submitted 11 December, 2020; originally announced December 2020.

    Comments: 12 pages, 3 figures, collaboration between the large-scale systems and data management communities, work started at the Dagstuhl Seminar 19491 on Big Graph Processing Systems, to be published in the Communications of the ACM

    ACM Class: C.3; E.0; H.2; J.0

  12. arXiv:2005.00081

    cs.DC cs.SI

    Scalable Mining of Maximal Quasi-Cliques: An Algorithm-System Codesign Approach

    Authors: Guimu Guo, Da Yan, M. Tamer Özsu, Zhe Jiang, Jalal Khalil

    Abstract: Given a user-specified minimum degree threshold $\gamma$, a $\gamma$-quasi-clique is a subgraph $g=(V_g,E_g)$ where each vertex $v\in V_g$ connects to at least $\gamma$ fraction of the other vertices (i.e., $\lceil \gamma\cdot(|V_g|-1)\rceil$ vertices) in $g$. Quasi-clique is one of the most natural definitions for dense structures useful in finding communities in social networks and discovering significant biomolecu…

    Submitted 10 May, 2021; v1 submitted 30 April, 2020; originally announced May 2020.

    Comments: Guimu Guo and Da Yan are parallel first authors; this is the full version of our PVLDB 2021 paper with the same title
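    The γ-quasi-clique definition in this abstract translates directly into a membership test. The Python sketch below (hypothetical function name; an undirected graph given as adjacency sets without self-loops) checks whether a given vertex set induces a γ-quasi-clique. It only verifies a candidate; it does not perform the mining itself, which is the expensive part the paper's algorithm-system codesign addresses.

```python
import math
from fractions import Fraction


def is_quasi_clique(adj, vertices, gamma):
    """Check the gamma-quasi-clique condition on the subgraph induced by `vertices`.

    adj maps each vertex to the set of its neighbours (undirected, no self-loops).
    Every v in the vertex set must be adjacent to at least
    ceil(gamma * (|V_g| - 1)) other vertices of the set.
    """
    vs = set(vertices)
    # Exact rational arithmetic avoids floating-point artefacts in the ceiling.
    g = Fraction(str(gamma))
    threshold = math.ceil(g * (len(vs) - 1))
    return all(len(adj.get(v, set()) & vs) >= threshold for v in vs)


if __name__ == "__main__":
    # A 4-cycle a-b-c-d-a: every vertex has 2 neighbours inside the set.
    cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
    print(is_quasi_clique(cycle, "abcd", 0.6))  # True: ceil(0.6 * 3) = 2
    print(is_quasi_clique(cycle, "abcd", 0.7))  # False: ceil(0.7 * 3) = 3
```

    The threshold is computed with `fractions.Fraction` because a floating-point product can land just above an integer, which would make the ceiling one too large; with γ = 1 the test reduces to checking for a clique.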

  13. arXiv:2004.02012

    cs.DB

    Regular Path Query Evaluation on Streaming Graphs

    Authors: Anil Pacaci, Angela Bonifati, M. Tamer Özsu

    Abstract: We study persistent query evaluation over streaming graphs, which is becoming increasingly important. We focus on navigational queries that determine if there exists a path between two entities that satisfies a user-specified constraint. We adopt the Regular Path Query (RPQ) model that specifies navigational patterns with labeled constraints. We propose deterministic algorithms to efficiently eval…

    Submitted 4 April, 2020; originally announced April 2020.

    Comments: A shorter version of this paper has been accepted for publication in 2020 International Conference on Management of Data (SIGMOD 2020)
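    A common way to evaluate an RPQ on a static graph is to search the product of the graph and an automaton for the query's regular expression. The Python sketch below does this with a breadth-first search over (vertex, state) pairs under arbitrary-path semantics, using a hand-built DFA for the hypothetical expression `a b*`. It is a snapshot baseline for intuition only; the paper's algorithms evaluate persistent RPQs over a stream, which this code does not attempt.

```python
from collections import deque


def rpq_targets(edges, dfa, start_state, accepting, source):
    """Vertices reachable from `source` along a path whose label sequence is
    accepted by the DFA (product-graph BFS over a static snapshot).

    edges: iterable of (u, label, v); dfa: dict mapping (state, label) -> state.
    """
    out = {}
    for u, label, v in edges:
        out.setdefault(u, []).append((label, v))

    start = (source, start_state)
    seen = {start}
    queue = deque([start])
    answers = set()
    while queue:
        vertex, state = queue.popleft()
        if state in accepting:
            answers.add(vertex)
        for label, nxt in out.get(vertex, ()):
            next_state = dfa.get((state, label))
            if next_state is None:
                continue  # no transition: this path prefix cannot match
            pair = (nxt, next_state)
            if pair not in seen:  # visiting each pair once guarantees termination
                seen.add(pair)
                queue.append(pair)
    return answers


if __name__ == "__main__":
    # Hypothetical DFA for `a b*`: state 0 --a--> 1, state 1 --b--> 1; 1 accepts.
    dfa = {(0, "a"): 1, (1, "b"): 1}
    edges = [("x", "a", "y"), ("y", "b", "z"), ("z", "b", "w"),
             ("w", "b", "z"), ("x", "b", "q")]
    print(sorted(rpq_targets(edges, dfa, 0, {1}, "x")))  # ['w', 'y', 'z']
```

    The product graph has at most |V| times the number of DFA states vertices, so the search is bounded even when the data graph has cycles, as the z-w cycle in the example does.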

  14. GSI: GPU-friendly Subgraph Isomorphism

    Authors: Li Zeng, Lei Zou, M. Tamer Özsu, Lin Hu, Fan Zhang

    Abstract: Subgraph isomorphism is a well-known NP-hard problem that is widely used in many applications, such as social network analysis and query over the knowledge graph. Due to the inherent hardness, its performance is often a bottleneck in various real-world applications. Therefore, we address this by designing an efficient subgraph isomorphism algorithm leveraging features of GPU architecture, such as…

    Submitted 20 April, 2021; v1 submitted 8 June, 2019; originally announced June 2019.

    Comments: 15 pages, 17 figures, conference

    Journal ref: IEEE International Conference on Data Engineering 2020

  15. arXiv:1801.09240

    cs.DB

    Time Constrained Continuous Subgraph Search over Streaming Graphs

    Authors: Youhuan Li, Lei Zou, M. Tamer Ozsu, Dongyan Zhao

    Abstract: The growing popularity of dynamic applications such as social networks provides a promising way to detect valuable information in real time. Efficient analysis over high-speed data from dynamic applications is of great significance. Data from these dynamic applications can be easily modeled as streaming graphs. In this paper, we study the subgraph (isomorphism) search over streaming graph data that…

    Submitted 3 September, 2018; v1 submitted 28 January, 2018; originally announced January 2018.

  16. The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing: Extended Survey

    Authors: Siddhartha Sahu, Amine Mhedhbi, Semih Salihoglu, Jimmy Lin, M. Tamer Özsu

    Abstract: Graph processing is becoming increasingly prevalent across many application domains. In spite of this prevalence, there is little research about how graphs are actually used in practice. We performed an extensive study that consisted of an online survey of 89 users, a review of the mailing lists, source repositories, and whitepapers of a large suite of graph software products, and in-person interv…

    Submitted 4 September, 2019; v1 submitted 10 September, 2017; originally announced September 2017.

    Journal ref: The VLDB Journal, 2019

  17. arXiv:1709.03110

    cs.DC

    G-thinker: Big Graph Mining Made Easier and Faster

    Authors: Da Yan, Hongzhi Chen, James Cheng, M. Tamer Özsu, Qizhen Zhang, John C. S. Lui

    Abstract: This paper proposes a general system for compute-intensive graph mining tasks that find from a big graph all subgraphs that satisfy certain requirements (e.g., graph matching and community detection). Due to the broad range of applications of such tasks, many single-threaded algorithms have been proposed. However, graphs such as online social networks and knowledge graphs often have billions of ve…

    Submitted 10 September, 2017; originally announced September 2017.

  18. arXiv:1607.01046

    cs.DB

    Walking without a Map: Optimizing Response Times of Traversal-Based Linked Data Queries (Extended Version)

    Authors: Olaf Hartig, M. Tamer Özsu

    Abstract: The emergence of Linked Data on the WWW has spawned research interest in an online execution of declarative queries over this data. A particularly interesting approach is traversal-based query execution which fetches data by traversing data links and, thus, is able to make use of up-to-date data from initially unknown data sources. The downside of this approach is the delay before the query engine…

    Submitted 4 July, 2016; originally announced July 2016.

    Comments: This document is an extended version of a paper published in ISWC 2016. In addition to a more detailed discussion of the experimental results presented in the conference version, this extended version provides an in-depth description of our approach to implement traversal-based query execution, and we present a number of additional experiments

  19. arXiv:1601.06497

    cs.DC cs.DB

    Quegel: A General-Purpose Query-Centric Framework for Querying Big Graphs

    Authors: Da Yan, James Cheng, M. Tamer Özsu, Fan Yang, Yi Lu, John C. S. Lui, Qizhen Zhang, Wilfred Ng

    Abstract: Pioneered by Google's Pregel, many distributed systems have been developed for large-scale graph analytics. These systems expose the user-friendly "think like a vertex" programming interface to users, and exhibit good horizontal scalability. However, these systems are designed for tasks where the majority of graph vertices participate in computation, but are not suitable for processing light-workl…

    Submitted 25 January, 2016; originally announced January 2016.

    Comments: This is a full version of our VLDB paper

  20. arXiv:1601.00707

    cs.DB

    A Survey of RDF Data Management Systems

    Authors: M. Tamer Özsu

    Abstract: RDF is increasingly being used to encode data for the semantic web and for data exchange. There have been a large number of works that address RDF data management. In this paper we provide an overview of these works.

    Submitted 4 January, 2016; originally announced January 2016.

  21. arXiv:1504.02523

    cs.DB

    Clustering RDF Databases Using Tunable-LSH

    Authors: Güneş Aluç, M. Tamer Özsu, Khuzaima Daudjee

    Abstract: The Resource Description Framework (RDF) is a W3C standard for representing graph-structured data, and SPARQL is the standard query language for RDF. Recent advances in Information Extraction, Linked Data Management and the Semantic Web have led to a rapid increase in both the volume and the variety of RDF data that are publicly available. As businesses start to capitalize on RDF data, RDF data ma…

    Submitted 18 April, 2015; v1 submitted 9 April, 2015; originally announced April 2015.

    Comments: Fixed typos, updated related work section

  22. arXiv:1411.6763

    cs.DB cs.DC

    Processing SPARQL Queries Over Distributed RDF Graphs

    Authors: Peng Peng, Lei Zou, M. Tamer Özsu, Lei Chen, Dongyan Zhao

    Abstract: We propose techniques for processing SPARQL queries over a large RDF graph in a distributed environment. We adopt a "partial evaluation and assembly" framework. Answering a SPARQL query Q is equivalent to finding subgraph matches of the query graph Q over RDF graph G. Based on properties of subgraph matching over a distributed graph, we introduce local partial match as partial answers in each frag…

    Submitted 21 March, 2016; v1 submitted 25 November, 2014; originally announced November 2014.

    Comments: 30 pages