
Showing 1–5 of 5 results for author: Eltabakh, M

Searching in archive cs.
  1. arXiv:2404.09637

    cs.DB

    climber++: Pivot-Based Approximate Similarity Search over Big Data Series

    Authors: Liang Zhang, Mohamed Y. Eltabakh, Elke A. Rundensteiner, Khalid Alnuaim

    Abstract: The generation and collection of big data series are becoming an integral part of many emerging applications in sciences, IoT, finance, and web applications among several others. The terabyte-scale of data series has motivated recent efforts to design fully distributed techniques for supporting operations such as approximate kNN similarity search, which is a building block operation in most analyt…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 16 pages, 14 figures, 1 table

    Journal ref: ICDE 2024

  2. arXiv:2306.00932

    cs.AI cs.DB

    Cross Modal Data Discovery over Structured and Unstructured Data Lakes

    Authors: Mohamed Y. Eltabakh, Mayuresh Kunjir, Ahmed Elmagarmid, Mohammad Shahmeer Ahmad

    Abstract: Organizations are collecting increasingly large amounts of data for data-driven decision making. These data are often dumped into a centralized repository, e.g., a data lake, consisting of thousands of structured and unstructured datasets. Perversely, such a mixture of datasets makes the problem of discovering elements (e.g., tables or documents) that are relevant to a user's query or an analytical…

    Submitted 16 July, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Report number: 17

  3. arXiv:2303.16909

    cs.DB cs.AI

    RetClean: Retrieval-Based Data Cleaning Using Foundation Models and Data Lakes

    Authors: Mohammad Shahmeer Ahmad, Zan Ahmad Naeem, Mohamed Eltabakh, Mourad Ouzzani, Nan Tang

    Abstract: Can foundation models (such as ChatGPT) clean your data? In this proposal, we demonstrate that indeed ChatGPT can assist in data cleaning by suggesting corrections for specific cells in a data table (scenario 1). However, ChatGPT may struggle with datasets it has never encountered before (e.g., local enterprise data) or when the user requires an explanation of the source of the suggested clean val…

    Submitted 29 March, 2023; originally announced March 2023.

  4. arXiv:2303.06720

    cs.DB

    QTrail-DB: A Query Processing Engine for Imperfect Databases with Evolving Qualities

    Authors: Maha Asiri, Mohamed Y. Eltabakh

    Abstract: Imperfect databases are very common in many applications due to various reasons ranging from data-entry errors, transmission or integration errors, and wrong instrument readings, to faulty experimental setups leading to incorrect results. The management and query processing of imperfect databases is a very challenging problem as it requires incorporating the data's qualities within the database…

    Submitted 12 March, 2023; originally announced March 2023.

    Comments: 17 pages, 13 figures

  5. arXiv:cs/0612127

    cs.DB

    bdbms -- A Database Management System for Biological Data

    Authors: Mohamed Y. Eltabakh, Mourad Ouzzani, Walid G. Aref

    Abstract: Biologists are increasingly using databases for storing and managing their data. Biological databases typically consist of a mixture of raw data, metadata, sequences, annotations, and related data obtained from various sources. Current database technology lacks several functionalities that are needed by biological databases. In this paper, we introduce bdbms, an extensible prototype database man…

    Submitted 22 December, 2006; originally announced December 2006.

    Comments: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/.) You may copy, distribute, display, and perform the work, make derivative works and make commercial use of the work, but you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 7-10, 2007, Asilomar, California, USA