Showing 1–12 of 12 results for author: Bizer, C

  1. arXiv:2405.09444

    cs.CY cs.AI

    Desk-AId: Humanitarian Aid Desk Assessment with Geospatial AI for Predicting Landmine Areas

    Authors: Flavio Cirillo, Gürkan Solmaz, Yi-Hsuan Peng, Christian Bizer, Martin Jebens

    Abstract: The process of clearing areas, namely demining, starts by assessing and prioritizing potential hazardous areas (i.e., desk assessment) to go under thorough investigation of experts, who confirm the risk and proceed with the mines clearance operations. This paper presents Desk-AId that supports the desk assessment phase by estimating landmine risks using geospatial data and socioeconomic informatio…

    Submitted 15 May, 2024; originally announced May 2024.

  2. arXiv:2403.02130

    cs.CL

    Using LLMs for the Extraction and Normalization of Product Attribute Values

    Authors: Alexander Brinkmann, Nick Baumann, Christian Bizer

    Abstract: Product offers on e-commerce websites often consist of a product title and a textual product description. In order to enable features such as faceted product search or to generate product comparison tables, it is necessary to extract structured attribute-value pairs from the unstructured product titles and descriptions and to normalize the extracted values to a single, unified scale for each attri…

    Submitted 4 July, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  3. arXiv:2310.12537

    cs.CL

    Product Attribute Value Extraction using Large Language Models

    Authors: Alexander Brinkmann, Roee Shraga, Christian Bizer

    Abstract: E-commerce platforms rely on structured product descriptions, in the form of attribute/value pairs to enable features such as faceted product search and product comparison. However, vendors on these platforms often provide unstructured product descriptions consisting of a title and a textual description. To process such offers, e-commerce platforms must extract attribute/value pairs from the unstr…

    Submitted 26 January, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

  4. arXiv:2310.11244

    cs.CL cs.LG

    Entity Matching using Large Language Models

    Authors: Ralph Peeters, Christian Bizer

    Abstract: Entity Matching is the task of deciding whether two entity descriptions refer to the same real-world entity and is a central step in most data integration pipelines. Many state-of-the-art entity matching methods rely on pre-trained language models (PLMs) such as BERT or RoBERTa. Two major drawbacks of these models for entity matching are that (i) the models require significant amounts of task-spec…

    Submitted 5 June, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

  5. arXiv:2306.14921

    cs.CL cs.IR

    Product Information Extraction using ChatGPT

    Authors: Alexander Brinkmann, Roee Shraga, Reng Chiz Der, Christian Bizer

    Abstract: Structured product data in the form of attribute/value pairs is the foundation of many e-commerce applications such as faceted product search, product comparison, and product recommendation. Product offers often only contain textual descriptions of the product attributes in the form of titles or free text. Hence, extracting attribute/value pairs from textual product descriptions is an essential en…

    Submitted 23 June, 2023; originally announced June 2023.

  6. arXiv:2306.00745

    cs.CL

    Column Type Annotation using ChatGPT

    Authors: Keti Korini, Christian Bizer

    Abstract: Column type annotation is the task of annotating the columns of a relational table with the semantic type of the values contained in each column. Column type annotation is an important pre-processing step for data search and data integration in the context of data lakes. State-of-the-art column type annotation methods either rely on matching table columns to properties of a knowledge graph or fine…

    Submitted 30 July, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

  7. arXiv:2305.03423

    cs.CL

    Using ChatGPT for Entity Matching

    Authors: Ralph Peeters, Christian Bizer

    Abstract: Entity Matching is the task of deciding if two entity descriptions refer to the same real-world entity. State-of-the-art entity matching methods often rely on fine-tuning Transformer models such as BERT or RoBERTa. Two major drawbacks of using these models for entity matching are that (i) the models require significant amounts of fine-tuning data for reaching a good performance and (ii) the fine-t…

    Submitted 22 June, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

    Comments: Accepted and to be published in Proceedings of ADBIS 2023 as short paper (https://www.essi.upc.edu/dtim/ADBIS2023/index.html)

  8. arXiv:2303.03132

    cs.DB cs.LG

    SC-Block: Supervised Contrastive Blocking within Entity Resolution Pipelines

    Authors: Alexander Brinkmann, Roee Shraga, Christian Bizer

    Abstract: The goal of entity resolution is to identify records in multiple datasets that represent the same real-world entity. However, comparing all records across datasets can be computationally intensive, leading to long runtimes. To reduce these runtimes, entity resolution pipelines are constructed of two parts: a blocker that applies a computationally cheap method to select candidate record pairs, and…

    Submitted 23 June, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

  9. arXiv:2301.09521

    cs.LG

    WDC Products: A Multi-Dimensional Entity Matching Benchmark

    Authors: Ralph Peeters, Reng Chiz Der, Christian Bizer

    Abstract: The difficulty of an entity matching task depends on a combination of multiple factors such as the amount of corner-case pairs, the fraction of entities in the test set that have not been seen during training, and the size of the development set. Current entity matching benchmarks usually represent single points in the space along such dimensions or they provide for the evaluation of matching meth…

    Submitted 30 June, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

    Comments: Accepted and to be published in Proceedings of EDBT 2024 (https://dastlab.github.io/edbticdt2024/)

  10. Supervised Contrastive Learning for Product Matching

    Authors: Ralph Peeters, Christian Bizer

    Abstract: Contrastive learning has moved the state of the art for many tasks in computer vision and information retrieval in recent years. This poster is the first work that applies supervised contrastive learning to the task of product matching in e-commerce using product offers from different e-shops. More specifically, we employ a supervised contrastive learning technique to pre-train a Transformer encod…

    Submitted 2 May, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

  11. Cross-Language Learning for Entity Matching

    Authors: Ralph Peeters, Christian Bizer

    Abstract: Transformer-based entity matching methods have significantly moved the state of the art for less-structured matching tasks such as matching product offers in e-commerce. In order to excel at these tasks, Transformer-based matching methods require a decent amount of training pairs. Providing enough training data can be challenging, especially if a matcher for non-English product descriptions should…

    Submitted 2 May, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

  12. arXiv:1208.0291

    cs.DB

    Learning Expressive Linkage Rules using Genetic Programming

    Authors: Robert Isele, Christian Bizer

    Abstract: A central problem in data integration and data cleansing is to find entities in different data sources that describe the same real-world object. Many existing methods for identifying such entities rely on explicit linkage rules which specify the conditions that entities must fulfill in order to be considered to describe the same real-world object. In this paper, we present the GenLink algorithm fo…

    Submitted 1 August, 2012; originally announced August 2012.

    Comments: VLDB2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 11, pp. 1638-1649 (2012)