Showing 1–7 of 7 results for author: van Leeuwen, M

Searching in archive stat.
  1. arXiv:2103.13686

    cs.LG cs.AI stat.ML

    Robust subgroup discovery

    Authors: Hugo Manuel Proença, Peter Grünwald, Thomas Bäck, Matthijs van Leeuwen

    Abstract: We introduce the problem of robust subgroup discovery, i.e., finding a set of interpretable descriptions of subsets that 1) stand out with respect to one or more target attributes, 2) are statistically robust, and 3) are non-redundant. Many attempts have been made to mine either locally robust subgroups or to tackle the pattern explosion, but we are the first to address both challenges at the same tim…

    Submitted 30 June, 2022; v1 submitted 25 March, 2021; originally announced March 2021.

    Comments: For associated code, see https://github.com/HMProenca/RuleList ; submitted to Data Mining and Knowledge Discovery Journal

    Journal ref: Data Mining and Knowledge Discovery 36 (2022) 1885-1970

  2. arXiv:2101.05009

    cs.IT stat.AP

    Estimating Conditional Mutual Information for Discrete-Continuous Mixtures using Multi-Dimensional Adaptive Histograms

    Authors: Alexander Marx, Lincen Yang, Matthijs van Leeuwen

    Abstract: Estimating conditional mutual information (CMI) is an essential yet challenging step in many machine learning and data mining tasks. Estimating CMI from data that contains both discrete and continuous variables, or even discrete-continuous mixture variables, is a particularly hard problem. In this paper, we show that CMI for such mixture variables, defined based on the Radon-Nikodym derivative, can…

    Submitted 13 January, 2021; originally announced January 2021.

    Comments: Extended version including supplementary material for main paper which is (will be) published in: Proceedings of the SIAM International Conference on Data Mining (SDM'21)

  3. Discovering outstanding subgroup lists for numeric targets using MDL

    Authors: Hugo M. Proença, Peter Grünwald, Thomas Bäck, Matthijs van Leeuwen

    Abstract: The task of subgroup discovery (SD) is to find interpretable descriptions of subsets of a dataset that stand out with respect to a target attribute. To address the problem of mining large numbers of redundant subgroups, subgroup set discovery (SSD) has been proposed. State-of-the-art SSD methods have their limitations though, as they typically heavily rely on heuristics and/or user-chosen hyperpar…

    Submitted 16 June, 2020; originally announced June 2020.

    Comments: Extended version of conference paper at ECML-PKDD

    Journal ref: ECML PKDD 2020, LNAI 12457, pp. 19-35, 2021

  4. arXiv:2006.01893

    cs.LG stat.ML

    Unsupervised Discretization by Two-dimensional MDL-based Histogram

    Authors: Lincen Yang, Mitra Baratchi, Matthijs van Leeuwen

    Abstract: Unsupervised discretization is a crucial step in many knowledge discovery tasks. The state-of-the-art method for one-dimensional data infers locally adaptive histograms using the minimum description length (MDL) principle, but the multi-dimensional case is far less studied: current methods consider the dimensions one at a time (if not independently), which results in discretizations based on rectan…

    Submitted 9 December, 2022; v1 submitted 2 June, 2020; originally announced June 2020.

    Comments: Accepted version at Machine Learning

  5. arXiv:1905.00328

    cs.LG cs.AI stat.ML

    Interpretable multiclass classification by MDL-based rule lists

    Authors: Hugo M. Proença, Matthijs van Leeuwen

    Abstract: Interpretable classifiers have recently witnessed an increase in attention from the data mining community because they are inherently easier to understand and explain than their more complex counterparts. Examples of interpretable classification models include decision trees, rule sets, and rule lists. Learning such models often involves optimizing hyperparameters, which typically requires substan…

    Submitted 31 October, 2019; v1 submitted 1 May, 2019; originally announced May 2019.

    Journal ref: Information Sciences 2019

  6. Learning what matters - Sampling interesting patterns

    Authors: Vladimir Dzyuba, Matthijs van Leeuwen

    Abstract: In the field of exploratory data mining, local structure in data can be described by patterns and discovered by mining algorithms. Although many solutions have been proposed to address the redundancy problems in pattern mining, most of them either provide succinct pattern sets or take the interests of the user into account, but not both. Consequently, the analyst has to invest substantial effort in…

    Submitted 10 February, 2017; v1 submitted 7 February, 2017; originally announced February 2017.

    Comments: PAKDD 2017, extended version

    Journal ref: Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol.10234, 2017, pp.534-546

  7. arXiv:1610.09263

    cs.AI cs.DB stat.ML

    Flexible constrained sampling with guarantees for pattern mining

    Authors: Vladimir Dzyuba, Matthijs van Leeuwen, Luc De Raedt

    Abstract: Pattern sampling has been proposed as a potential solution to the infamous pattern explosion. Instead of enumerating all patterns that satisfy the constraints, individual patterns are sampled proportional to a given quality measure. Several sampling algorithms have been proposed, but each of them has its limitations when it comes to 1) flexibility in terms of quality measures and constraints that…

    Submitted 1 March, 2017; v1 submitted 28 October, 2016; originally announced October 2016.

    Comments: Accepted for publication in Data Mining & Knowledge Discovery journal (ECML/PKDD 2017 journal track)