
Showing 1–17 of 17 results for author: Stockinger, K

Searching in archive cs.
  1. arXiv:2406.15015

    cs.DB cs.AI cs.CL

    GraLMatch: Matching Groups of Entities with Graphs and Language Models

    Authors: Fernando De Meer Pardo, Claude Lehmann, Dennis Gehrig, Andrea Nagy, Stefano Nicoli, Branka Hadji Misheva, Martin Braschler, Kurt Stockinger

    Abstract: In this paper, we present an end-to-end multi-source Entity Matching problem, which we call entity group matching, where the goal is to assign records that originate from multiple data sources but represent the same real-world entity to the same group. We focus on the effects of transitively matched records, i.e., the records connected by paths in the graph G = (V, E) whose nodes and edges represen…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 12 pages, 4 figures, accepted as research paper at EDBT 2025

  2. arXiv:2406.03170

    cs.CL

    StatBot.Swiss: Bilingual Open Data Exploration in Natural Language

    Authors: Farhad Nooralahzadeh, Yi Zhang, Ellery Smith, Sabine Maennel, Cyril Matthey-Doret, Raphaël de Fondville, Kurt Stockinger

    Abstract: The potential for improvements brought by Large Language Models (LLMs) in Text-to-SQL systems is mostly assessed on monolingual English datasets. However, LLMs' performance for other languages remains largely unexplored. In this work, we release the StatBot.Swiss dataset, the first bilingual benchmark for evaluating Text-to-SQL systems based on real-world applications. The StatBot.Swiss dataset con…

    Submitted 6 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: This work has been accepted at ACL Findings 2024

  3. arXiv:2402.08349

    cs.DB cs.AI cs.CL

    Evaluating the Data Model Robustness of Text-to-SQL Systems Based on Real User Queries

    Authors: Jonathan Fürst, Catherine Kosten, Farhad Nooralahzadeh, Yi Zhang, Kurt Stockinger

    Abstract: Text-to-SQL systems (also known as NL-to-SQL systems) have become an increasingly popular solution for bridging the gap between user capabilities and SQL-based data access. These systems translate user requests in natural language to valid SQL statements for a specific database. Recent Text-to-SQL systems have benefited from the rapid improvement of transformer-based language models. However, whil…

    Submitted 18 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  4. Spider4SPARQL: A Complex Benchmark for Evaluating Knowledge Graph Question Answering Systems

    Authors: Catherine Kosten, Philippe Cudré-Mauroux, Kurt Stockinger

    Abstract: With the recent spike in the number and availability of Large Language Models (LLMs), it has become increasingly important to provide large and realistic benchmarks for evaluating Knowledge Graph Question Answering (KGQA) systems. So far, the majority of benchmarks rely on pattern-based SPARQL query generation approaches. The subsequent natural language (NL) question generation is conducted through…

    Submitted 8 December, 2023; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: 10 pages, 5 figures, accepted at IEEE BigData Conference 2023, 8th IEEE Special Session on Machine Learning on Big Data (MLBD 2023)

    Journal ref: IEEE International Conference on Big Data 2023

  5. arXiv:2309.01551

    cs.DB

    Is Your Learned Query Optimizer Behaving As You Expect? A Machine Learning Perspective

    Authors: Claude Lehmann, Pavel Sulimov, Kurt Stockinger

    Abstract: The current boom of learned query optimizers (LQO) can be explained not only by the general continuous improvement of deep learning (DL) methods but also by the straightforward formulation of a query optimization problem (QOP) as a machine learning (ML) one. The idea is often to replace dynamic programming approaches, widespread for solving QOP, with more powerful methods such as reinforcement lea…

    Submitted 26 February, 2024; v1 submitted 4 September, 2023; originally announced September 2023.

    Journal ref: PVLDB Volume 17, 2023-2024

  6. arXiv:2307.00933

    cs.CL cs.CE cs.DB

    Data-Driven Information Extraction and Enrichment of Molecular Profiling Data for Cancer Cell Lines

    Authors: Ellery Smith, Rahel Paloots, Dimitris Giagkos, Michael Baudis, Kurt Stockinger

    Abstract: With the proliferation of research means and computational methodologies, published biomedical literature is growing exponentially in numbers and volume. Cancer cell lines are frequently used models in biological and medical research that are currently applied for a wide range of purposes, from studies of cellular mechanisms to drug development, which has led to a wealth of related data and public…

    Submitted 12 February, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

  7. arXiv:2306.04743

    cs.DB cs.AI cs.CL

    ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems

    Authors: Yi Zhang, Jan Deriu, George Katsogiannis-Meimarakis, Catherine Kosten, Georgia Koutrika, Kurt Stockinger

    Abstract: Natural Language to SQL systems (NL-to-SQL) have recently shown a significant increase in accuracy for natural language to SQL query translation. This improvement is due to the emergence of transformer-based language models, and the popularity of the Spider benchmark, the de facto standard for evaluating NL-to-SQL systems. The top NL-to-SQL systems reach accuracies of up to 85%. However, Spider…

    Submitted 5 December, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: 12 pages, 2 figures, 5 tables

    ACM Class: H.2.4; I.2.7

    Journal ref: PVLDB Volume 17, 2023-2024

  8. arXiv:2107.10508

    cs.DB cs.AI quant-ph

    Multiple Query Optimization using a Hybrid Approach of Classical and Quantum Computing

    Authors: Tobias Fankhauser, Marc E. Solèr, Rudolf M. Füchslin, Kurt Stockinger

    Abstract: Quantum computing promises to solve difficult optimization problems in chemistry, physics and mathematics more efficiently than classical computers, but requires fault-tolerant quantum computers with millions of qubits. To overcome errors introduced by today's quantum computers, hybrid algorithms combining classical and quantum computers are used. In this paper we tackle the multiple query optimiz…

    Submitted 22 July, 2021; originally announced July 2021.

    Comments: 18 pages, 16 figures

  9. arXiv:2104.13744

    cs.DB

    Bio-SODA: Enabling Natural Language Question Answering over Knowledge Graphs without Training Data

    Authors: Ana Claudia Sima, Tarcisio Mendes de Farias, Maria Anisimova, Christophe Dessimoz, Marc Robinson-Rechavi, Erich Zbinden, Kurt Stockinger

    Abstract: The problem of natural language processing over structured data has become a growing research field, both within the relational database and the Semantic Web communities, with significant efforts involved in question answering over knowledge graphs (KGQA). However, many of these approaches are either specifically targeted at open-domain question answering using DBpedia, or require large training dat…

    Submitted 14 June, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

    Journal ref: 33rd International Conference on Scientific and Statistical Database Management (SSDBM 2021)

  10. arXiv:2104.04194

    cs.LG cs.AI cs.DB

    INODE: Building an End-to-End Data Exploration System in Practice [Extended Vision]

    Authors: Sihem Amer-Yahia, Georgia Koutrika, Frederic Bastian, Theofilos Belmpas, Martin Braschler, Ursin Brunner, Diego Calvanese, Maximilian Fabricius, Orest Gkini, Catherine Kosten, Davide Lanti, Antonis Litke, Hendrik Lücke-Tieke, Francesco Alessandro Massucci, Tarcisio Mendes de Farias, Alessandro Mosca, Francesco Multari, Nikolaos Papadakis, Dimitris Papadopoulos, Yogendra Patil, Aurélien Personnaz, Guillem Rull, Ana Sima, Ellery Smith, Dimitrios Skoutas, et al. (3 additional authors not shown)

    Abstract: A full-fledged data exploration system must combine different access modalities with a powerful concept of guiding the user in the exploration process, by being reactive and anticipative both for data discovery and for data linking. Such systems are a real opportunity for our community to cater to users with different domain and data science expertise. We introduce INODE -- an end-to-end data expl…

    Submitted 9 April, 2021; originally announced April 2021.

    Comments: 8 pages, 5 figures

    ACM Class: I.2; H.2

  11. arXiv:2006.00888

    cs.DB cs.AI

    ValueNet: A Natural Language-to-SQL System that Learns from Database Information

    Authors: Ursin Brunner, Kurt Stockinger

    Abstract: Building natural language (NL) interfaces for databases has been a long-standing challenge for several decades. The major advantage of these so-called NL-to-SQL systems is that end-users can query complex databases without the need to know SQL or the underlying database schema. Due to significant advancements in machine learning, the recent focus of research has been on neural networks to tackle t…

    Submitted 22 February, 2021; v1 submitted 29 May, 2020; originally announced June 2020.

    Journal ref: 37th IEEE International Conference on Data Engineering (ICDE 2021)

  12. arXiv:2004.07633

    cs.AI cs.CL cs.LG

    A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation

    Authors: Jan Deriu, Katsiaryna Mlynchyk, Philippe Schläpfer, Alvaro Rodrigo, Dirk von Grünigen, Nicolas Kaiser, Kurt Stockinger, Eneko Agirre, Mark Cieliebak

    Abstract: In this paper, we introduce a novel methodology to efficiently construct a corpus for question answering over structured data. For this, we introduce an intermediate representation that is based on the logical query plan in a database called Operation Trees (OT). This representation allows us to invert the annotation process without losing flexibility in the types of queries that we generate. Furt…

    Submitted 25 June, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

    Journal ref: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020

  13. arXiv:1911.11689

    cs.DB cs.AI cs.LG

    Join Query Optimization with Deep Reinforcement Learning Algorithms

    Authors: Jonas Heitz, Kurt Stockinger

    Abstract: Join query optimization is a complex task and is central to the performance of query processing. In fact, it belongs to the class of NP-hard problems. Traditional query optimizers use dynamic programming (DP) methods combined with a set of rules and restrictions to avoid exhaustive enumeration of all possible join orders. However, DP methods are very resource-intensive. Moreover, given simplifying…

    Submitted 26 November, 2019; originally announced November 2019.

  14. A Comparative Survey of Recent Natural Language Interfaces for Databases

    Authors: Katrin Affolter, Kurt Stockinger, Abraham Bernstein

    Abstract: Over the last few years, natural language interfaces (NLI) for databases have gained significant traction both in academia and industry. These systems use very different approaches as described in recent survey papers. However, these systems have not been systematically compared against a set of benchmark questions in order to rigorously evaluate their functionalities and expressive power. In thi…

    Submitted 21 June, 2019; originally announced June 2019.

    Journal ref: VLDB Journal 2019

  15. arXiv:1906.01950

    cs.DB cs.IR

    VoIDext: Vocabulary and Patterns for Enhancing Interoperable Datasets with Virtual Links

    Authors: Tarcisio Mendes de Farias, Kurt Stockinger, Christophe Dessimoz

    Abstract: Semantic heterogeneity remains a problem when interoperating with data from sources of different scopes and knowledge domains. Causes for this challenge are context-specific requirements (i.e. no "one model fits all"), different data modelling decisions, domain-specific purposes, and technical constraints. Moreover, even if the problem of semantic heterogeneity among different RDF publishers and k…

    Submitted 8 September, 2019; v1 submitted 5 June, 2019; originally announced June 2019.

  16. arXiv:1808.01166

    cs.DC

    ViPIOS - VIenna Parallel Input Output System: Language, Compiler and Advanced Data Structure Support for Parallel I/O Operations

    Authors: Erich Schikuta, Helmut Wanek, Heinz Stockinger, Kurt Stockinger, Thomas Fürle, Oliver Jorns, Christoph Löffelhardt, Peter Brezany, Minh Dang, Thomas Mück

    Abstract: For an increasing number of data intensive scientific applications, parallel I/O concepts are a major performance issue. Tackling this issue, we develop an input/output system designed for highly efficient, scalable and conveniently usable parallel I/O on distributed memory systems. The main focus of this research is the parallel I/O runtime system support provided for software-generated programs…

    Submitted 3 August, 2018; originally announced August 2018.

    Comments: 210 pages

  17. arXiv:1207.0134

    cs.DB

    SODA: Generating SQL for Business Users

    Authors: Lukas Blunschi, Claudio Jossen, Donald Kossmann, Magdalini Mori, Kurt Stockinger

    Abstract: The purpose of data warehouses is to enable business analysts to make better decisions. Over the years the technology has matured and data warehouses have become extremely successful. As a consequence, more and more data has been added to the data warehouses and their schemas have become increasingly complex. These systems still work great in order to generate pre-canned reports. However, with the…

    Submitted 30 June, 2012; originally announced July 2012.

    Comments: VLDB 2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 10, pp. 932-943 (2012)