Showing 1–5 of 5 results for author: Gorishniy, Y

  1. arXiv:2406.19380  [pdf, other]

    cs.LG

    TabReD: A Benchmark of Tabular Machine Learning in-the-Wild

    Authors: Ivan Rubachev, Nikolay Kartashev, Yury Gorishniy, Artem Babenko

    Abstract: Benchmarks that closely reflect downstream application scenarios are essential for the streamlined adoption of new research in tabular machine learning (ML). In this work, we examine existing tabular benchmarks and find two common characteristics of industry-grade tabular data that are underrepresented in the datasets available to the academic community. First, tabular data often changes over time…

    Submitted 1 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/yandex-research/tabred (V2: fix the link to the code in this comment; no changes to the PDF)

  2. arXiv:2307.14338  [pdf, other]

    cs.LG

    TabR: Tabular Deep Learning Meets Nearest Neighbors in 2023

    Authors: Yury Gorishniy, Ivan Rubachev, Nikolay Kartashev, Daniil Shlenskii, Akim Kotelnikov, Artem Babenko

    Abstract: Deep learning (DL) models for tabular data problems (e.g. classification, regression) are currently receiving increasingly more attention from researchers. However, despite the recent efforts, the non-DL algorithms based on gradient-boosted decision trees (GBDT) remain a strong go-to solution for these problems. One of the research directions aimed at improving the position of tabular DL involves…

    Submitted 26 October, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: Code: https://github.com/yandex-research/tabular-dl-tabr

  3. arXiv:2207.03208  [pdf, other]

    cs.LG

    Revisiting Pretraining Objectives for Tabular Deep Learning

    Authors: Ivan Rubachev, Artem Alekberov, Yury Gorishniy, Artem Babenko

    Abstract: Recent deep learning models for tabular data currently compete with the traditional ML models based on decision trees (GBDT). Unlike GBDT, deep models can additionally benefit from pretraining, which is a workhorse of DL for vision and NLP. For tabular problems, several pretraining methods were proposed, but it is not entirely clear if pretraining provides consistent noticeable improvements and wh…

    Submitted 12 July, 2022; v1 submitted 7 July, 2022; originally announced July 2022.

    Comments: Code: https://github.com/puhsu/tabular-dl-pretrain-objectives

  4. arXiv:2203.05556  [pdf, other]

    cs.LG

    On Embeddings for Numerical Features in Tabular Deep Learning

    Authors: Yury Gorishniy, Ivan Rubachev, Artem Babenko

    Abstract: Recently, Transformer-like deep architectures have shown strong performance on tabular data problems. Unlike traditional models, e.g., MLP, these architectures map scalar values of numerical features to high-dimensional embeddings before mixing them in the main backbone. In this work, we argue that embeddings for numerical features are an underexplored degree of freedom in tabular DL, which allows…

    Submitted 26 October, 2023; v1 submitted 10 March, 2022; originally announced March 2022.

    Comments: NeurIPS 2022 camera-ready. Code: https://github.com/yandex-research/tabular-dl-num-embeddings (v3-v4: minor changes)

  5. arXiv:2106.11959  [pdf, other]

    cs.LG

    Revisiting Deep Learning Models for Tabular Data

    Authors: Yury Gorishniy, Ivan Rubachev, Valentin Khrulkov, Artem Babenko

    Abstract: The existing literature on deep learning for tabular data proposes a wide range of novel architectures and reports competitive results on various datasets. However, the proposed models are usually not properly compared to each other and existing works often use different benchmarks and experiment protocols. As a result, it is unclear for both researchers and practitioners what models perform best.…

    Submitted 26 October, 2023; v1 submitted 22 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 camera-ready. Code: https://github.com/yandex-research/tabular-dl-revisiting-models (v3-v5: minor changes)