
Showing 1–2 of 2 results for author: Schoening, H

  1. AutoCure: Automated Tabular Data Curation Technique for ML Pipelines

    Authors: Mohamed Abdelaal, Rashmi Koparde, Harald Schoening

    Abstract: Machine learning algorithms have become increasingly prevalent in multiple domains, such as autonomous driving, healthcare, and finance. In such domains, data preparation remains a significant challenge in developing accurate models, requiring significant expertise and time investment to search the huge search space of well-suited data curation and transformation tools. To address this challenge,…

    Submitted 26 April, 2023; originally announced April 2023.

  2. arXiv:2302.04702 [pdf, other]

    cs.DB cs.LG

    REIN: A Comprehensive Benchmark Framework for Data Cleaning Methods in ML Pipelines

    Authors: Mohamed Abdelaal, Christian Hammacher, Harald Schoening

    Abstract: Nowadays, machine learning (ML) plays a vital role in many aspects of our daily life. In essence, building well-performing ML applications requires the provision of high-quality data throughout the entire life-cycle of such applications. Nevertheless, most of the real-world tabular data suffer from different types of discrepancies, such as missing values, outliers, duplicates, pattern violation, a…

    Submitted 9 February, 2023; originally announced February 2023.