
Showing 1–31 of 31 results for author: Aref, W

  1. arXiv:2406.09372 [pdf, other]

    cs.DB

    Investigation of Adaptive Hotspot-Aware Indexes for Oscillating Write-Heavy and Read-Heavy Workloads -- An Experimental Study

    Authors: Lu Xing, Walid G. Aref

    Abstract: HTAP systems are designed to handle transactional and analytical workloads. Besides a mixed workload at any given time, the workload can also change over time. A popular kind of continuously changing workload is one that oscillates between being write-heavy and being read-heavy. These oscillating workloads can be observed in many applications. Indexes, e.g., the B+-tree and the LSM-Tree cannot per…

    Submitted 13 June, 2024; originally announced June 2024.

  2. arXiv:2406.08746 [pdf, other]

    cs.DB

    The AHA-Tree: An Adaptive Index for HTAP Workloads

    Authors: Lu Xing, Walid G. Aref

    Abstract: In this demo, we realize data indexes that can morph from being write-optimized at times to being read-optimized at other times nonstop with zero-down time during the workload transitioning. These data indexes are useful for HTAP systems (Hybrid Transactional and Analytical Processing Systems), where transactional workloads are write-heavy while analytical workloads are read-heavy. Traditional ind…

    Submitted 12 June, 2024; originally announced June 2024.

  3. arXiv:2406.05327 [pdf, other]

    cs.DB

    Multi-Entry Generalized Search Trees for Indexing Trajectories

    Authors: Maxime Schoemans, Walid G. Aref, Esteban Zimányi, Mahmoud Sakr

    Abstract: The idea of generalized indices is one of the success stories of database systems research. It has found its way to implementation in common database systems. GiST (Generalized Search Tree) and SP-GiST (Space-Partitioned Generalized Search Tree) are two widely-used generalized indices that are typically used for multidimensional data. Currently, the generalized indices GiST and SP-GiST represent o…

    Submitted 7 June, 2024; originally announced June 2024.

  4. arXiv:2405.01448 [pdf, other]

    cs.DB

    GTX: A Transactional Graph Data System For HTAP Workloads

    Authors: Libin Zhou, Walid Aref

    Abstract: Processing, managing, and analyzing dynamic graphs are the cornerstone in multiple application domains including fraud detection, recommendation system, graph neural network training, etc. This demo presents GTX, a latch-free write-optimized transactional graph data system that supports high throughput read-write transactions while maintaining competitive graph analytics. GTX has a unique latch-fr…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 4 pages 2 figures for VLDB 2024 DEMO

    ACM Class: H.2.4

  5. arXiv:2405.01418 [pdf, other]

    cs.DB

    GTX: A Write-Optimized Latch-free Graph Data System with Transactional Support

    Authors: Libin Zhou, Yeasir Rayhan, Lu Xing, Walid G. Aref

    Abstract: This paper introduces GTX a standalone main-memory write-optimized graph system that specializes in structural and graph property updates while maintaining concurrent reads and graph analytics with snapshot isolation-level transactional concurrency. Recent graph libraries target efficient concurrent read and write support while guaranteeing transactional consistency. However, their performance suf…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 12 pages, 13 figures, submitted to VLDB 2025

    ACM Class: H.2.4

  6. arXiv:2403.06456 [pdf, other]

    cs.DB cs.LG

    A Survey of Learned Indexes for the Multi-dimensional Space

    Authors: Abdullah Al-Mamun, Hao Wu, Qiyang He, Jianguo Wang, Walid G. Aref

    Abstract: A recent research trend involves treating database index structures as Machine Learning (ML) models. In this domain, single or multiple ML models are trained to learn the mapping from keys to positions inside a data set. This class of indexes is known as "Learned Indexes." Learned indexes have demonstrated improved search performance and reduced space requirements for one-dimensional data. The con…

    Submitted 11 March, 2024; originally announced March 2024.

  7. arXiv:2403.04582 [pdf, other]

    cs.DB

    The Ubiquitous Skiplist: A Survey of What Cannot be Skipped About the Skiplist and its Applications in Big Data Systems

    Authors: Venkata Sai Pavan Kumar Vadrevu, Lu Xing, Walid G. Aref

    Abstract: Skiplists have become prevalent in systems. The main advantages of skiplists are their simplicity and ease of implementation, and the ability to support operations in the same asymptotic complexities as their tree-based counterparts. In this survey, we explore skiplists and their many variants. We highlight many scenarios of how skiplists are useful and fit well in these usage scenarios. We study…

    Submitted 22 May, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  8. SIMD-ified R-tree Query Processing and Optimization

    Authors: Yeasir Rayhan, Walid G. Aref

    Abstract: The introduction of Single Instruction Multiple Data (SIMD) instructions in mainstream CPUs has enabled modern database engines to leverage data parallelism by performing more computation with a single instruction, resulting in a reduced number of instructions required to execute a query as well as the elimination of conditional branches. Though SIMD in the context of traditional database engines…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: To appear at ACM SIGSPATIAL 2023

  9. arXiv:2307.05717 [pdf, other]

    cs.OH

    Towards Mobility Data Science (Vision Paper)

    Authors: Mohamed Mokbel, Mahmoud Sakr, Li Xiong, Andreas Züfle, Jussara Almeida, Taylor Anderson, Walid Aref, Gennady Andrienko, Natalia Andrienko, Yang Cao, Sanjay Chawla, Reynold Cheng, Panos Chrysanthis, Xiqi Fei, Gabriel Ghinita, Anita Graser, Dimitrios Gunopulos, Christian Jensen, Joon-Seok Kim, Kyoung-Sook Kim, Peer Kröger, John Krumm, Johannes Lauer, Amr Magdy, Mario Nascimento, et al. (23 additional authors not shown)

    Abstract: Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traffic management, urban planning, and health sciences…

    Submitted 7 March, 2024; v1 submitted 21 June, 2023; originally announced July 2023.

    Comments: Updated to reflect the major revision for ACM Transactions on Spatial Algorithms and Systems (TSAS). This version reflects the final version accepted by ACM TSAS

  10. arXiv:2305.01087 [pdf, other]

    cs.DS

    An Update-intensive LSM-based R-tree Index

    Authors: Jaewoo Shin, Jianguo Wang, Walid G. Aref

    Abstract: Many applications require update-intensive workloads on spatial objects, e.g., social-network services and shared-riding services that track moving objects. By buffering insert and delete operations in memory, the Log Structured Merge Tree (LSM) has been used widely in various systems because of its ability to handle write-heavy workloads. While the focus on LSM has been on key-value stores and th…

    Submitted 1 May, 2023; originally announced May 2023.

  11. arXiv:2304.09983 [pdf]

    cs.DB

    Tutorial: The Ubiquitous Skiplist, its Variants, and Applications in Modern Big Data Systems

    Authors: Venkata Sai Pavan Kumar Vadrevu, Lu Xing, Walid G. Aref

    Abstract: The Skiplist, or skip list, originally designed as an in-memory data structure, has attracted a lot of attention in recent years as a main-memory component in many NoSQL, cloud-based, and big data systems. Unlike the B-tree, the skiplist does not need complex rebalancing mechanisms, but it still shows expected logarithmic performance. It supports a variety of operations, including insert, point re…

    Submitted 19 April, 2023; originally announced April 2023.

  12. arXiv:2207.03027 [pdf, other]

    cs.DB

    The Case for Distributed Shared-Memory Databases with RDMA-Enabled Memory Disaggregation

    Authors: Ruihong Wang, Jianguo Wang, Stratos Idreos, M. Tamer Özsu, Walid G. Aref

    Abstract: Memory disaggregation (MD) allows for scalable and elastic data center design by separating compute (CPU) from memory. With MD, compute and memory are no longer coupled into the same server box. Instead, they are connected to each other via ultra-fast networking such as RDMA. MD can bring many advantages, e.g., higher memory utilization, better independent scaling (of compute and memory), and lowe…

    Submitted 6 July, 2022; originally announced July 2022.

  13. arXiv:2207.00550 [pdf, other]

    cs.DB cs.LG

    The "AI+R"-tree: An Instance-optimized R-tree

    Authors: Abdullah-Al-Mamun, Ch. Md. Rakin Haider, Jianguo Wang, Walid G. Aref

    Abstract: The emerging class of instance-optimized systems has shown potential to achieve high performance by specializing to a specific data and query workloads. Particularly, Machine Learning (ML) techniques have been applied successfully to build various instance-optimized components (e.g., learned indexes). This paper investigates to leverage ML techniques to enhance the performance of spatial indexes,…

    Submitted 1 July, 2022; originally announced July 2022.

    Comments: To appear in the proceedings of The 23rd IEEE International Conference on Mobile Data Management (2022)

  14. arXiv:2206.09520 [pdf, other]

    cs.DB

    ILX: Intelligent "Location+X" Data Systems (Vision Paper)

    Authors: Walid G. Aref, Ahmed M. Aly, Anas Daghistani, Yeasir Rayhan, Jianguo Wang, Libin Zhou

    Abstract: Due to the ubiquity of mobile phones and location-detection devices, location data is being generated in very large volumes. Queries and operations that are performed on location data warrant the use of database systems. Despite that, location data is being supported in data systems as an afterthought. Typically, relational or NoSQL data systems that are mostly designed with non-location data in m…

    Submitted 1 August, 2022; v1 submitted 19 June, 2022; originally announced June 2022.

  15. An Experimental Evaluation and Investigation of Waves of Misery in R-trees

    Authors: Lu Xing, Eric Lee, Tong An, Bo-Cheng Chu, Ahmed Mahmood, Ahmed M. Aly, Jianguo Wang, Walid G. Aref

    Abstract: Waves of misery is a phenomenon where spikes of many node splits occur over short periods of time in tree indexes. Waves of misery negatively affect the performance of tree indexes in insertion-heavy workloads. Waves of misery have been first observed in the context of the B-tree, where these waves cause unpredictable index performance. In particular, the performance of search and index-update oper…

    Submitted 24 December, 2021; originally announced December 2021.

    Comments: To appear in VLDB 2022

  16. arXiv:2110.01767 [pdf, ps, other]

    cs.DB

    Scalable Relational Query Processing on Big Matrix Data

    Authors: Yongyang Yu, Mingjie Tang, Walid G. Aref

    Abstract: The use of large-scale machine learning methods is becoming ubiquitous in many applications ranging from business intelligence to self-driving cars. These methods require a complex computation pipeline consisting of various types of operations, e.g., relational operations for pre-processing or post-processing the dataset, and matrix operations for core model computations. Many existing systems foc…

    Submitted 9 November, 2021; v1 submitted 4 October, 2021; originally announced October 2021.

    Comments: 29 pages, 11 figures, 6 tables

  17. arXiv:2012.06171 [pdf, other]

    cs.DC cs.DB

    The Future is Big Graphs! A Community View on Graph Processing Systems

    Authors: Sherif Sakr, Angela Bonifati, Hannes Voigt, Alexandru Iosup, Khaled Ammar, Renzo Angles, Walid Aref, Marcelo Arenas, Maciej Besta, Peter A. Boncz, Khuzaima Daudjee, Emanuele Della Valle, Stefania Dumbrava, Olaf Hartig, Bernhard Haslhofer, Tim Hegeman, Jan Hidders, Katja Hose, Adriana Iamnitchi, Vasiliki Kalavri, Hugo Kapp, Wim Martens, M. Tamer Özsu, Eric Peukert, Stefan Plantikow, et al. (16 additional authors not shown)

    Abstract: Graphs are by nature unifying abstractions that can leverage interconnectedness to represent, explore, predict, and explain real- and digital-world phenomena. Although real users and consumers of graph instances and graph workloads understand these abstractions, future problems will require new abstractions and systems. What needs to happen in the next decade for big graph processing to continue t…

    Submitted 11 December, 2020; originally announced December 2020.

    Comments: 12 pages, 3 figures, collaboration between the large-scale systems and data management communities, work started at the Dagstuhl Seminar 19491 on Big Graph Processing Systems, to be published in the Communications of the ACM

    ACM Class: C.3; E.0; H.2; J.0

  18. arXiv:2008.13028 [pdf, other]

    cs.DB cs.HC

    STULL: Unbiased Online Sampling for Visual Exploration of Large Spatiotemporal Data

    Authors: Guizhen Wang, Jingjing Guo, Mingjie Tang, José Florencio de Queiroz Neto, Calvin Yau, Anas Daghistani, Morteza Karimzadeh, Walid G. Aref, David S. Ebert

    Abstract: Online sampling-supported visual analytics is increasingly important, as it allows users to explore large datasets with acceptable approximate answers at interactive rates. However, existing online spatiotemporal sampling techniques are often biased, as most researchers have primarily focused on reducing computational latency. Biased sampling approaches select data with unequal probabilities and p…

    Submitted 29 August, 2020; originally announced August 2020.

    Comments: IEEE VIS (InfoVis/VAST/SciVis) 2020 ACM 2012 CCS - Human-centered computing, Visualization, Visualization design and evaluation methods

    ACM Class: H.3.3

  19. arXiv:2002.11862 [pdf, other]

    cs.DB

    SWARM: Adaptive Load Balancing in Distributed Streaming Systems for Big Spatial Data

    Authors: Anas Daghistani, Walid G. Aref, Arif Ghafoor, Ahmed R. Mahmood

    Abstract: The proliferation of GPS-enabled devices has led to the development of numerous location-based services. These services need to process massive amounts of spatial data in real-time. The current scale of spatial data cannot be handled using centralized systems. This has led to the development of distributed spatial streaming systems. Existing systems are using static spatial partitioning to distrib…

    Submitted 26 February, 2020; originally announced February 2020.

  20. arXiv:1907.03736 [pdf, other]

    cs.DB

    LocationSpark: In-memory Distributed Spatial Query Processing and Optimization

    Authors: Mingjie Tang, Yongyang Yu, Walid G. Aref, Ahmed R. Mahmood, Qutaibah M. Malluhi, Mourad Ouzzani

    Abstract: Due to the ubiquity of spatial data applications and the large amounts of spatial data that these applications generate and process, there is a pressing need for scalable spatial query processing. In this paper, we present new techniques for spatial query processing and optimization in an in-memory and distributed setup to address scalability. More specifically, we introduce new techniques for han…

    Submitted 16 July, 2019; v1 submitted 8 July, 2019; originally announced July 2019.

  21. arXiv:1810.02061 [pdf, other]

    cs.CR

    Design and Evaluation of A Data Partitioning-Based Intrusion Management Architecture for Database Systems

    Authors: Muhamad Felemban, Yahya Javeed, Jason Kobes, Thamir Qadah, Arif Ghafoor, Walid Aref

    Abstract: Data-intensive applications exhibit increasing reliance on Database Management Systems (DBMSs, for short). With the growing cyber-security threats to government and commercial infrastructures, the need to develop high resilient cyber systems is becoming increasingly important. Cyber-attacks on DBMSs include intrusion attacks that may result in severe degradation in performance. Several efforts hav…

    Submitted 5 October, 2018; v1 submitted 4 October, 2018; originally announced October 2018.

  22. arXiv:1712.09437 [pdf, other]

    cs.DB

    Pattern-Driven Data Cleaning

    Authors: El Kindi Rezig, Mourad Ouzzani, Walid G. Aref, Ahmed K. Elmagarmid, Ahmed R. Mahmood

    Abstract: Data is inherently dirty and there has been a sustained effort to come up with different approaches to clean it. A large class of data repair algorithms rely on data-quality rules and integrity constraints to detect and repair the data. A well-studied class of integrity constraints is Functional Dependencies (FDs, for short) that specify dependencies among attributes in a relation. In this paper,…

    Submitted 26 December, 2017; originally announced December 2017.

  23. arXiv:1712.08971 [pdf, other]

    cs.DB

    Human-Centric Data Cleaning [Vision]

    Authors: El Kindi Rezig, Mourad Ouzzani, Ahmed K. Elmagarmid, Walid G. Aref

    Abstract: Data Cleaning refers to the process of detecting and fixing errors in the data. Human involvement is instrumental at several stages of this process, e.g., to identify and repair errors, to validate computed repairs, etc. There is currently a plethora of data cleaning algorithms addressing a wide range of data errors (e.g., detecting duplicates, violations of integrity constraints, missing values,…

    Submitted 30 December, 2017; v1 submitted 24 December, 2017; originally announced December 2017.

  24. arXiv:1709.06723 [pdf, other]

    cs.DB

    SBG-Sketch: A Self-Balanced Sketch for Labeled-Graph Stream Summarization

    Authors: Mohamed S. Hassan, Bruno Ribeiro, Walid G. Aref

    Abstract: Applications in various domains rely on processing graph streams, e.g., communication logs of a cloud-troubleshooting system, road-network traffic updates, and interactions on a social network. A labeled-graph stream refers to a sequence of streamed edges that form a labeled graph. Label-aware applications need to filter the graph stream before performing a graph operation. Due to the large volume…

    Submitted 20 September, 2017; originally announced September 2017.

  25. arXiv:1709.06715 [pdf, other]

    cs.DB

    Empowering In-Memory Relational Database Engines with Native Graph Processing

    Authors: Mohamed S. Hassan, Tatiana Kuznetsova, Hyun Chai Jeong, Walid G. Aref, Mohammad Sadoghi

    Abstract: The plethora of graphs and relational data give rise to many interesting graph-relational queries in various domains, e.g., finding related proteins satisfying relational predicates in a biological network. The maturity of RDBMSs motivated academia and industry to invest efforts in leveraging RDBMSs for graph processing, where efficiency is proven for vital graph queries. However, none of these ef…

    Submitted 12 October, 2017; v1 submitted 19 September, 2017; originally announced September 2017.

  26. arXiv:1709.02533 [pdf, other]

    cs.DC

    Adaptive Processing of Spatial-Keyword Data Over a Distributed Streaming Cluster

    Authors: Ahmed R. Mahmood, Anas Daghistani, Ahmed M. Aly, Walid G. Aref, Mingjie Tang, Saleh Basalamah, Sunil Prabhakar

    Abstract: The widespread use of GPS-enabled smartphones along with the popularity of micro-blogging and social networking applications, e.g., Twitter and Facebook, has resulted in the generation of huge streams of geo-tagged textual data. Many applications require real-time processing of these streams. For example, location-based e-coupon and ad-targeting systems enable advertisers to register millions of a…

    Submitted 8 September, 2017; originally announced September 2017.

  27. arXiv:1709.02529 [pdf, other]

    cs.DB

    FAST: Frequency-Aware Spatio-Textual Indexing for In-Memory Continuous Filter Query Processing

    Authors: Ahmed R. Mahmood, Ahmed M. Aly, Walid G. Aref

    Abstract: Many applications need to process massive streams of spatio-textual data in real-time against continuous spatio-textual queries. For example, in location-aware ad targeting publish/subscribe systems, it is required to disseminate millions of ads and promotions to millions of users based on the locations and textual profiles of users. In this paper, we study indexing of continuous spatio-textual qu…

    Submitted 4 October, 2017; v1 submitted 8 September, 2017; originally announced September 2017.

  28. arXiv:1705.02044 [pdf, ps, other]

    cs.DS

    A Survey of Shortest-Path Algorithms

    Authors: Amgad Madkour, Walid G. Aref, Faizan Ur Rehman, Mohamed Abdur Rahman, Saleh Basalamah

    Abstract: A shortest-path algorithm finds a path containing the minimal cost between two vertices in a graph. A plethora of shortest-path algorithms is studied in the literature that span across multiple disciplines. This paper presents a survey of shortest-path algorithms based on a taxonomy that is introduced in the paper. One dimension of this taxonomy is the various flavors of the shortest-path problem.…

    Submitted 4 May, 2017; originally announced May 2017.

  29. arXiv:1412.4303 [pdf, other]

    cs.DB

    On Order-independent Semantics of the Similarity Group-By Relational Database Operator

    Authors: Mingjie Tang, Ruby Y. Tahboub, Walid G. Aref, Qutaibah M. Malluhi, Mourad Ouzzani

    Abstract: Similarity group-by (SGB, for short) has been proposed as a relational database operator to match the needs of emerging database applications. Many SGB operators that extend SQL have been proposed in the literature, e.g., similarity operators in the one-dimensional space. These operators have various semantics. Depending on how these operators are implemented, some of the implementations may lead…

    Submitted 13 December, 2014; originally announced December 2014.

    Comments: 13 pages

  30. arXiv:1208.0074 [pdf, other]

    cs.DB

    Spatial Queries with Two kNN Predicates

    Authors: Ahmed M. Aly, Walid G. Aref, Mourad Ouzzani

    Abstract: The widespread use of location-aware devices has led to countless location-based services in which a user query can be arbitrarily complex, i.e., one that embeds multiple spatial selection and join predicates. Amongst these predicates, the k-Nearest-Neighbor (kNN) predicate stands as one of the most important and widely used predicates. Unlike related research, this paper goes beyond the optimizat…

    Submitted 31 July, 2012; originally announced August 2012.

    Comments: VLDB2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 11, pp. 1100-1111 (2012)

  31. arXiv:cs/0612127 [pdf, ps, other]

    cs.DB

    bdbms -- A Database Management System for Biological Data

    Authors: Mohamed Y. Eltabakh, Mourad Ouzzani, Walid G. Aref

    Abstract: Biologists are increasingly using databases for storing and managing their data. Biological databases typically consist of a mixture of raw data, metadata, sequences, annotations, and related data obtained from various sources. Current database technology lacks several functionalities that are needed by biological databases. In this paper, we introduce bdbms, an extensible prototype database man…

    Submitted 22 December, 2006; originally announced December 2006.

    Comments: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/). You may copy, distribute, display, and perform the work, make derivative works and make commercial use of the work, but you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 7-10, 2007, Asilomar, California, USA