Showing 1–14 of 14 results for author: Hassanzadeh, O

  1. arXiv:2407.01619  [pdf, other]

    cs.LG cs.AI cs.DB

    TabSketchFM: Sketch-based Tabular Representation Learning for Data Discovery over Data Lakes

    Authors: Aamod Khatiwada, Harsha Kokel, Ibrahim Abdelaziz, Subhajit Chaudhury, Julian Dolby, Oktie Hassanzadeh, Zhenhan Huang, Tejaswini Pedapati, Horst Samulowitz, Kavitha Srinivas

    Abstract: Enterprises have a growing need to identify relevant tables in data lakes; e.g. tables that are unionable, joinable, or subsets of each other. Tabular neural models can be helpful for such data discovery tasks. In this paper, we present TabSketchFM, a neural tabular model for data discovery over data lakes. First, we propose a novel pre-training sketch-based approach to enhance the effectiveness o…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2307.04217

  2. arXiv:2401.07237  [pdf, other]

    cs.CL cs.AI

    Distilling Event Sequence Knowledge From Large Language Models

    Authors: Somin Wadhwa, Oktie Hassanzadeh, Debarun Bhattacharjya, Ken Barker, Jian Ni

    Abstract: Event sequence models have been found to be highly effective in the analysis and prediction of events. Building such models requires availability of abundant high-quality event sequence data. In certain applications, however, clean structured event sequences are not available, and automated sequence extraction results in data that is too noisy and incomplete. In this work, we explore the use of La…

    Submitted 1 July, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

    Comments: In Proceedings of 23rd International Semantic Web Conference (ISWC), 2024

  3. arXiv:2312.02334  [pdf, other]

    cs.CL cs.AI

    An Evaluation Framework for Mapping News Headlines to Event Classes in a Knowledge Graph

    Authors: Steve Fonin Mbouadeu, Martin Lorenzo, Ken Barker, Oktie Hassanzadeh

    Abstract: Mapping ongoing news headlines to event-related classes in a rich knowledge base can be an important component in a knowledge-based event analysis and forecasting solution. In this paper, we present a methodology for creating a benchmark dataset of news headlines mapped to event classes in Wikidata, and resources for the evaluation of methods that perform the mapping. We use the dataset to study t…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Presented at CASE 2023 @ RANLP https://aclanthology.org/2023.case-1.6/

    Journal ref: Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2023)

  4. Event Prediction using Case-Based Reasoning over Knowledge Graphs

    Authors: Sola Shirai, Debarun Bhattacharjya, Oktie Hassanzadeh

    Abstract: Applying link prediction (LP) methods over knowledge graphs (KG) for tasks such as causal event prediction presents an exciting opportunity. However, typical LP models are ill-suited for this task as they are incapable of performing inductive link prediction for new, unseen event entities and they require retraining as knowledge is added or changed in the underlying KG. We introduce a case-based r…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: published at WWW '23: Proceedings of the ACM Web Conference 2023. Code base: https://github.com/solashirai/WWW-EvCBR

  5. arXiv:2309.11506  [pdf, other]

    cs.IR cs.AI cs.CL

    Matching Table Metadata with Business Glossaries Using Large Language Models

    Authors: Elita Lobo, Oktie Hassanzadeh, Nhan Pham, Nandana Mihindukulasooriya, Dharmashankar Subramanian, Horst Samulowitz

    Abstract: Enterprises often own large collections of structured data in the form of large databases or an enterprise data lake. Such data collections come with limited metadata and strict access policies that could limit access to the data contents and, therefore, limit the application of classic retrieval and analysis solutions. As a result, there is a need for solutions that can effectively utilize the av…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: This paper is a work in progress with findings based on limited evidence. Please exercise discretion when interpreting the findings

  6. arXiv:2308.15027  [pdf, ps, other]

    cs.IR cs.CL

    Improving Neural Ranking Models with Traditional IR Methods

    Authors: Anik Saha, Oktie Hassanzadeh, Alex Gittens, Jian Ni, Kavitha Srinivas, Bulent Yener

    Abstract: Neural ranking methods based on large transformer models have recently gained significant attention in the information retrieval community, and have been adopted by major commercial solutions. Nevertheless, they are computationally expensive to create, and require a great deal of labeled data for specialized corpora. In this paper, we explore a low resource alternative which is a bag-of-embedding…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: Short paper, 4 pages

  7. arXiv:2308.13560  [pdf, ps, other]

    cs.DB

    Open Government Data Corpus for Table Search

    Authors: Michael Glass, Sugato Bagchi, Oktie Hassanzadeh, Gaetano Rossiello, Alfio Gliozzo

    Abstract: Increasing amounts of structured data can provide value for research and business if the relevant data can be located. Often the data is in a data lake without a consistent schema, making locating useful data challenging. Table search is a growing research area, but existing benchmarks have been limited to displayed tables. Tables sized and formatted for display in a Wikipedia page or ArXiv paper…

    Submitted 24 August, 2023; originally announced August 2023.

  8. arXiv:2308.03891  [pdf, other]

    cs.CL

    A Cross-Domain Evaluation of Approaches for Causal Knowledge Extraction

    Authors: Anik Saha, Oktie Hassanzadeh, Alex Gittens, Jian Ni, Kavitha Srinivas, Bulent Yener

    Abstract: Causal knowledge extraction is the task of extracting relevant causes and effects from text by detecting the causal relation. Although this task is important for language understanding and knowledge discovery, recent works in this domain have largely focused on binary classification of a text segment as causal or non-causal. In this regard, we perform a thorough analysis of three sequence tagging…

    Submitted 7 August, 2023; originally announced August 2023.

  9. arXiv:2307.04217  [pdf, other]

    cs.DB cs.AI

    LakeBench: Benchmarks for Data Discovery over Data Lakes

    Authors: Kavitha Srinivas, Julian Dolby, Ibrahim Abdelaziz, Oktie Hassanzadeh, Harsha Kokel, Aamod Khatiwada, Tejaswini Pedapati, Subhajit Chaudhury, Horst Samulowitz

    Abstract: Within enterprises, there is a growing need to intelligently navigate data lakes, specifically focusing on data discovery. Of particular importance to enterprises is the ability to find related tables in data repositories. These tables can be unionable, joinable, or subsets of each other. There is a dearth of benchmarks for these tasks in the public domain, with related work targeting private data…

    Submitted 9 July, 2023; originally announced July 2023.

  10. arXiv:2205.03375  [pdf, other]

    cs.AI

    Summary Markov Models for Event Sequences

    Authors: Debarun Bhattacharjya, Saurabh Sihag, Oktie Hassanzadeh, Liza Bialik

    Abstract: Datasets involving sequences of different types of events without meaningful time stamps are prevalent in many applications, for instance when extracted from textual corpora. We propose a family of models for such event sequences -- summary Markov models -- where the probability of observing an event type depends only on a summary of historical occurrences of its influencing set of event types. Th…

    Submitted 6 May, 2022; originally announced May 2022.

    Comments: In Proceedings of International Joint Conference on Artificial Intelligence (IJCAI) 2022

  11. arXiv:2112.14030  [pdf, other]

    cs.DB

    Bipartite Graph Matching Algorithms for Clean-Clean Entity Resolution: An Empirical Evaluation

    Authors: George Papadakis, Vasilis Efthymiou, Emanouil Thanos, Oktie Hassanzadeh

    Abstract: Entity Resolution (ER) is the task of finding records that refer to the same real-world entities. A common scenario is when entities across two clean sources need to be resolved, which we refer to as Clean-Clean ER. In this paper, we perform an extensive empirical evaluation of 8 bipartite graph matching algorithms that take in as input a bipartite similarity graph and provide as output a set of m…

    Submitted 25 February, 2022; v1 submitted 28 December, 2021; originally announced December 2021.

    Comments: extended version of paper accepted at EDBT 2022

  12. arXiv:0908.0567  [pdf]

    cs.DB cs.CE cs.IR

    LinkedCT: A Linked Data Space for Clinical Trials

    Authors: Oktie Hassanzadeh, Anastasios Kementsietsidis, Lipyeow Lim, Renee J. Miller, Min Wang

    Abstract: The Linked Clinical Trials (LinkedCT) project aims at publishing the first open semantic web data source for clinical trials data. The database exposed by LinkedCT is generated by (1) transforming existing data sources of clinical trials into RDF, and (2) discovering semantic links between the records in the trials data and several other data sources. In this paper, we discuss several challenges…

    Submitted 4 August, 2009; originally announced August 2009.

    Comments: 5 pages, 1 figure, 4 tables

    ACM Class: H.2.8; H.3.5; J.3

  13. arXiv:0907.2471  [pdf, ps, other]

    cs.DB cs.IR

    Benchmarking Declarative Approximate Selection Predicates

    Authors: Oktie Hassanzadeh

    Abstract: Declarative data quality has been an active research topic. The fundamental principle behind a declarative approach to data quality is the use of declarative statements to realize data quality primitives on top of any relational data source. A primary advantage of such an approach is the ease of use and integration with existing applications. Several similarity predicates have been proposed in t…

    Submitted 14 July, 2009; originally announced July 2009.

    Comments: 75 pages, 7 figures, February 2007, Masters Thesis at University of Toronto

    ACM Class: H.3.3

  14. arXiv:0907.1990  [pdf, ps, other]

    cs.CE q-bio.BM

    Automated Protein Structure Classification: A Survey

    Authors: Oktie Hassanzadeh

    Abstract: Classification of proteins based on their structure provides a valuable resource for studying protein structure, function and evolutionary relationships. With the rapidly increasing number of known protein structures, manual and semi-automatic classification is becoming ever more difficult and prohibitively slow. Therefore, there is a growing need for automated, accurate and efficient classifica…

    Submitted 13 July, 2009; originally announced July 2009.

    Comments: 14 pages, Technical Report CSRG-589, University of Toronto

    Report number: CSRG-589

    ACM Class: J.3