Skip to main content

Showing 1–8 of 8 results for author: Gijsbers, P

.
  1. arXiv:2403.19546  [pdf, other

    cs.LG cs.AI cs.DB cs.IR

    Croissant: A Metadata Format for ML-Ready Datasets

    Authors: Mubashara Akhtar, Omar Benjelloun, Costanza Conforti, Pieter Gijsbers, Joan Giner-Miguelez, Nitisha Jain, Michael Kuchnik, Quentin Lhoest, Pierre Marcenac, Manil Maskey, Peter Mattson, Luis Oala, Pierre Ruyssen, Rajat Shinde, Elena Simperl, Goeffry Thomas, Slava Tykhonov, Joaquin Vanschoren, Jos van der Velde, Steffen Vogler, Carole-Jean Wu

    Abstract: Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is… ▽ More

    Submitted 30 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: Published in Proceedings of ACM SIGMOD/PODS'24 Data Management for End-to-End Machine Learning (DEEM) Workshop https://dl.acm.org/doi/10.1145/3650203.3663326

  2. arXiv:2207.12560  [pdf, other

    cs.LG stat.ML

    AMLB: an AutoML Benchmark

    Authors: Pieter Gijsbers, Marcos L. P. Bueno, Stefan Coors, Erin LeDell, Sébastien Poirier, Janek Thomas, Bernd Bischl, Joaquin Vanschoren

    Abstract: Comparing different AutoML frameworks is notoriously challenging and often done incorrectly. We introduce an open and extensible benchmark that follows best practices and avoids common mistakes when comparing AutoML frameworks. We conduct a thorough comparison of 9 well-known AutoML frameworks across 71 classification and 33 regression tasks. The differences between the AutoML frameworks are explo… ▽ More

    Submitted 16 November, 2023; v1 submitted 25 July, 2022; originally announced July 2022.

    Comments: UNDER REVIEW: Revised submission to JMLR, with updated results from June 2023

  3. Meta-Learning for Symbolic Hyperparameter Defaults

    Authors: Pieter Gijsbers, Florian Pfisterer, Jan N. van Rijn, Bernd Bischl, Joaquin Vanschoren

    Abstract: Hyperparameter optimization in machine learning (ML) deals with the problem of empirically learning an optimal algorithm configuration from data, usually formulated as a black-box optimization problem. In this work, we propose a zero-shot method to meta-learn symbolic default hyperparameter configurations that are expressed in terms of the properties of the dataset. This enables a much faster, but… ▽ More

    Submitted 11 June, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Pieter Gijsbers and Florian Pfisterer contributed equally to the paper. V1: Two page GECCO poster paper accepted at GECCO 2021. V2: The original full length paper (8 pages) with appendix

  4. GAMA: a General Automated Machine learning Assistant

    Authors: Pieter Gijsbers, Joaquin Vanschoren

    Abstract: The General Automated Machine learning Assistant (GAMA) is a modular AutoML system developed to empower users to track and control how AutoML algorithms search for optimal machine learning pipelines, and facilitate AutoML research itself. In contrast to current, often black-box systems, GAMA allows users to plug in different AutoML and post-processing techniques, logs and visualizes the search pro… ▽ More

    Submitted 7 October, 2021; v1 submitted 9 July, 2020; originally announced July 2020.

    Comments: Accepted at ECML-PKDD 2020 Demo Track

    Journal ref: Lecture Notes in Computer Science, vol 12461 (2021). p560-564

  5. arXiv:1911.02490  [pdf, other

    cs.LG stat.ML

    OpenML-Python: an extensible Python API for OpenML

    Authors: Matthias Feurer, Jan N. van Rijn, Arlind Kadra, Pieter Gijsbers, Neeratyoy Mallik, Sahithya Ravi, Andreas Müller, Joaquin Vanschoren, Frank Hutter

    Abstract: OpenML is an online platform for open science collaboration in machine learning, used to share datasets and results of machine learning experiments. In this paper we introduce OpenML-Python, a client API for Python, opening up the OpenML platform for a wide range of Python-based tools. It provides easy access to all datasets, tasks and experiments on OpenML from within Python. It also provides fun… ▽ More

    Submitted 23 June, 2021; v1 submitted 6 November, 2019; originally announced November 2019.

    Journal ref: Journal of Machine Learning Research 22(100), 2021

  6. arXiv:1907.00909  [pdf, other

    cs.LG stat.ML

    An Open Source AutoML Benchmark

    Authors: Pieter Gijsbers, Erin LeDell, Janek Thomas, Sébastien Poirier, Bernd Bischl, Joaquin Vanschoren

    Abstract: In recent years, an active field of research has developed around automated machine learning (AutoML). Unfortunately, comparing different AutoML systems is hard and often done incorrectly. We introduce an open, ongoing, and extensible benchmark framework which follows best practices and avoids common mistakes. The framework is open-source, uses public datasets and has a website with up-to-date res… ▽ More

    Submitted 1 July, 2019; originally announced July 2019.

    Comments: Accepted paper at the AutoML Workshop at ICML 2019. Code: https://github.com/openml/automlbenchmark/ Accompanying website: https://openml.github.io/automlbenchmark/

  7. arXiv:1801.06007  [pdf, ps, other

    cs.NE cs.AI

    Layered TPOT: Speeding up Tree-based Pipeline Optimization

    Authors: Pieter Gijsbers, Joaquin Vanschoren, Randal S. Olson

    Abstract: With the demand for machine learning increasing, so does the demand for tools which make it easier to use. Automated machine learning (AutoML) tools have been developed to address this need, such as the Tree-Based Pipeline Optimization Tool (TPOT) which uses genetic programming to build optimal pipelines. We introduce Layered TPOT, a modification to TPOT which aims to create pipelines equally good… ▽ More

    Submitted 12 March, 2018; v1 submitted 18 January, 2018; originally announced January 2018.

    Comments: Update to include a reference to Zutty et al. after it was brought to our attention

  8. arXiv:1708.03731  [pdf, other

    stat.ML cs.LG

    OpenML Benchmarking Suites

    Authors: Bernd Bischl, Giuseppe Casalicchio, Matthias Feurer, Pieter Gijsbers, Frank Hutter, Michel Lang, Rafael G. Mantovani, Jan N. van Rijn, Joaquin Vanschoren

    Abstract: Machine learning research depends on objectively interpretable, comparable, and reproducible algorithm benchmarks. We advocate the use of curated, comprehensive suites of machine learning tasks to standardize the setup, execution, and reporting of benchmarks. We enable this through software tools that help to create and leverage these benchmarking suites. These are seamlessly integrated into the O… ▽ More

    Submitted 22 November, 2021; v1 submitted 11 August, 2017; originally announced August 2017.

    Comments: Accepted for publication in the Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS 2021)

    Journal ref: Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (2021)