Showing 1–17 of 17 results for author: Krijthe, J H

  1. arXiv:2312.01210  [pdf, other]

    stat.ME cs.LG stat.ML

    When accurate prediction models yield harmful self-fulfilling prophecies

    Authors: Wouter A. C. van Amsterdam, Nan van Geloven, Jesse H. Krijthe, Rajesh Ranganath, Giovanni Ciná

    Abstract: Objective: Prediction models are popular in medical research and practice. By predicting an outcome of interest for specific patients, these models may help inform difficult treatment decisions, and are often hailed as the poster children for personalized, data-driven healthcare. Many prediction models are deployed for decision support based on their prediction accuracy in validation studies. We i…

    Submitted 8 February, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

  2. arXiv:2205.13935  [pdf, other]

    stat.ME cs.LG stat.ML

    Detecting hidden confounding in observational data using multiple environments

    Authors: Rickard K. A. Karlsson, Jesse H. Krijthe

    Abstract: A common assumption in causal inference from observational data is that there is no hidden confounding. Yet it is, in general, impossible to verify this assumption from a single dataset. Under the assumption of independent causal mechanisms underlying the data-generating process, we demonstrate a way to detect unobserved confounders when having multiple observational datasets coming from different…

    Submitted 3 November, 2023; v1 submitted 27 May, 2022; originally announced May 2022.

    Comments: NeurIPS 2023 camera-ready version; 30 pages including references and appendix

  3. ReproducedPapers.org: Openly teaching and structuring machine learning reproducibility

    Authors: Burak Yildiz, Hayley Hung, Jesse H. Krijthe, Cynthia C. S. Liem, Marco Loog, Gosia Migut, Frans Oliehoek, Annibale Panichella, Przemyslaw Pawelczak, Stjepan Picek, Mathijs de Weerdt, Jan van Gemert

    Abstract: We present ReproducedPapers.org: an open online repository for teaching and structuring machine learning reproducibility. We evaluate doing a reproduction project among students and the added value of an online reproduction repository among AI researchers. We use anonymous self-assessment surveys and obtained 144 responses. Results suggest that students who do a reproduction project place more val…

    Submitted 1 December, 2020; originally announced December 2020.

    Comments: Accepted to RRPR 2020: Third Workshop on Reproducible Research in Pattern Recognition

  4. A Brief Prehistory of Double Descent

    Authors: Marco Loog, Tom Viering, Alexander Mey, Jesse H. Krijthe, David M. J. Tax

    Abstract: In their thought-provoking paper [1], Belkin et al. illustrate and discuss the shape of risk curves in the context of modern high-complexity learners. Given a fixed training sample size $n$, such curves show the risk of a learner as a function of some (approximate) measure of its complexity $N$. With $N$ the number of features, these curves are also referred to as feature curves. A salient observa…

    Submitted 7 April, 2020; originally announced April 2020.

  5. arXiv:1710.06514  [pdf, ps, other]

    cs.LG stat.ML

    Robust importance-weighted cross-validation under sample selection bias

    Authors: Wouter M. Kouw, Jesse H. Krijthe, Marco Loog

    Abstract: Cross-validation under sample selection bias can, in principle, be done by importance-weighting the empirical risk. However, the importance-weighted risk estimator produces sub-optimal hyperparameter estimates in problem settings where large weights arise with high probability. We study its sampling variance as a function of the training data distribution and introduce a control variate to increas…

    Submitted 27 August, 2019; v1 submitted 17 October, 2017; originally announced October 2017.

    Comments: 6 pages, 8 figures, Accepted to the IEEE International Workshop on Machine Learning for Signal Processing 2019

  6. arXiv:1707.04025  [pdf, other]

    cs.LG cs.CV stat.ML

    On Measuring and Quantifying Performance: Error Rates, Surrogate Loss, and an Example in SSL

    Authors: Marco Loog, Jesse H. Krijthe, Are C. Jensen

    Abstract: In various approaches to learning, notably in domain adaptation, active learning, learning under covariate shift, semi-supervised learning, learning with concept drift, and the like, one often wants to compare a baseline classifier to one or more advanced (or at least different) strategies. In this chapter, we basically argue that if such classifiers, in their respective training phases, optimize…

    Submitted 13 July, 2017; originally announced July 2017.

    Journal ref: In Handbook of Pattern Recognition and Computer Vision (pp. 53-68) (2016)

  7. arXiv:1706.02645  [pdf, other]

    cs.LG stat.ML

    Nuclear Discrepancy for Active Learning

    Authors: Tom J. Viering, Jesse H. Krijthe, Marco Loog

    Abstract: Active learning algorithms propose which unlabeled objects should be queried for their labels to improve a predictive model the most. We study active learners that minimize generalization bounds and uncover relationships between these bounds that lead to an improved approach to active learning. In particular we show the relation between the bound of the state-of-the-art Maximum Mean Discrepancy (M…

    Submitted 8 June, 2017; originally announced June 2017.

    Comments: 32 pages, 5 figures, 4 tables

  8. arXiv:1612.08875  [pdf, other]

    stat.ML cs.LG

    The Pessimistic Limits and Possibilities of Margin-based Losses in Semi-supervised Learning

    Authors: Jesse H. Krijthe, Marco Loog

    Abstract: Consider a classification problem where we have both labeled and unlabeled data available. We show that for linear classifiers defined by convex margin-based surrogate losses that are decreasing, it is impossible to construct any semi-supervised approach that is able to guarantee an improvement over the supervised classifier measured by this surrogate loss on the labeled and unlabeled data. For co…

    Submitted 8 January, 2019; v1 submitted 28 December, 2016; originally announced December 2016.

    Comments: 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montreal, Canada

  9. arXiv:1612.08650  [pdf, other]

    stat.ML cs.LG

    Reproducible Pattern Recognition Research: The Case of Optimistic SSL

    Authors: Jesse H. Krijthe, Marco Loog

    Abstract: In this paper, we discuss the approaches we took and trade-offs involved in making a paper on a conceptual topic in pattern recognition research fully reproducible. We discuss our definition of reproducibility, the tools used, how the analysis was set up, show some examples of alternative analyses the code enables and discuss our views on reproducibility.

    Submitted 27 December, 2016; originally announced December 2016.

    Comments: Presented at RRPR 2016: 1st Workshop on Reproducible Research in Pattern Recognition

  10. arXiv:1612.07993  [pdf, other]

    stat.ML cs.LG

    RSSL: Semi-supervised Learning in R

    Authors: Jesse H. Krijthe

    Abstract: In this paper, we introduce a package for semi-supervised learning research in the R programming language called RSSL. We cover the purpose of the package, the methods it includes and comment on their use and implementation. We then show, using several code examples, how the package can be used to replicate well-known results from the semi-supervised learning literature.

    Submitted 23 December, 2016; originally announced December 2016.

    Comments: Presented at RRPR 2016: 1st Workshop on Reproducible Research in Pattern Recognition

  11. arXiv:1610.05160  [pdf, other]

    stat.ML cs.LG

    The Peaking Phenomenon in Semi-supervised Learning

    Authors: Jesse H. Krijthe, Marco Loog

    Abstract: For the supervised least squares classifier, when the number of training objects is smaller than the dimensionality of the data, adding more data to the training set may first increase the error rate before decreasing it. This, possibly counterintuitive, phenomenon is known as peaking. In this work, we observe that a similar but more pronounced version of this phenomenon also occurs in the semi-su…

    Submitted 17 October, 2016; originally announced October 2016.

    Comments: 11 pages, 5 figures. S+SSPR 2016, Mérida, Mexico

  12. arXiv:1610.03713  [pdf, other]

    stat.ML cs.LG

    Optimistic Semi-supervised Least Squares Classification

    Authors: Jesse H. Krijthe, Marco Loog

    Abstract: The goal of semi-supervised learning is to improve supervised classifiers by using additional unlabeled training examples. In this work we study a simple self-learning approach to semi-supervised learning applied to the least squares classifier. We show that a soft-label and a hard-label variant of self-learning can be derived by applying block coordinate descent to two related but slightly differ…

    Submitted 12 October, 2016; originally announced October 2016.

    Comments: 6 pages, 6 figures. International Conference on Pattern Recognition (ICPR) 2016, Cancun, Mexico

  13. arXiv:1602.07865  [pdf, other]

    stat.ML cs.LG

    Projected Estimators for Robust Semi-supervised Classification

    Authors: Jesse H. Krijthe, Marco Loog

    Abstract: For semi-supervised techniques to be applied safely in practice we at least want methods to outperform their supervised counterparts. We study this question for classification using the well-known quadratic surrogate loss function. Using a projection of the supervised estimate onto a set of constraints imposed by the unlabeled data, we find we can safely improve over the supervised solution in ter…

    Submitted 25 February, 2016; originally announced February 2016.

    Comments: 13 pages, 2 figures, 1 table

  14. Robust Semi-supervised Least Squares Classification by Implicit Constraints

    Authors: Jesse H. Krijthe, Marco Loog

    Abstract: We introduce the implicitly constrained least squares (ICLS) classifier, a novel semi-supervised version of the least squares classifier. This classifier minimizes the squared loss on the labeled data among the set of parameters implied by all possible labelings of the unlabeled data. Unlike other discriminative semi-supervised methods, this approach does not introduce explicit additional assumpti…

    Submitted 27 January, 2017; v1 submitted 27 December, 2015; originally announced December 2015.

    Comments: Appeared as Pattern Recognition Volume 63, March 2017, Pages 115-126. This version of the manuscript fixes some typos in the equations on page 9 that are incorrect in the published version

    Journal ref: Pattern Recognition Volume 63, March 2017, Pages 115-126

  15. arXiv:1512.04829  [pdf, other]

    stat.ML cs.LG

    Feature-Level Domain Adaptation

    Authors: Wouter M. Kouw, Jesse H. Krijthe, Marco Loog, Laurens J. P. van der Maaten

    Abstract: Domain adaptation is the supervised learning setting in which the training and test data are sampled from different distributions: training data is sampled from a source domain, whilst test data is sampled from a target domain. This paper proposes and studies an approach, called feature-level domain adaptation (FLDA), that models the dependence between the two domains by means of a feature-level t…

    Submitted 7 June, 2016; v1 submitted 15 December, 2015; originally announced December 2015.

    Comments: 32 pages, 13 figures, 9 tables

    Journal ref: JMLR 17:171 (2016) 1-32

  16. arXiv:1507.06802  [pdf, other]

    stat.ML cs.LG

    Implicitly Constrained Semi-Supervised Least Squares Classification

    Authors: Jesse H. Krijthe, Marco Loog

    Abstract: We introduce a novel semi-supervised version of the least squares classifier. This implicitly constrained least squares (ICLS) classifier minimizes the squared loss on the labeled data among the set of parameters implied by all possible labelings of the unlabeled data. Unlike other discriminative semi-supervised methods, our approach does not introduce explicit additional assumptions into the obje…

    Submitted 24 July, 2015; originally announced July 2015.

    Comments: 12 pages, 2 figures, 1 table. The Fourteenth International Symposium on Intelligent Data Analysis (2015), Saint-Etienne, France

  17. arXiv:1411.4521  [pdf, other]

    stat.ML cs.LG

    Implicitly Constrained Semi-Supervised Linear Discriminant Analysis

    Authors: Jesse H. Krijthe, Marco Loog

    Abstract: Semi-supervised learning is an important and active topic of research in pattern recognition. For classification using linear discriminant analysis specifically, several semi-supervised variants have been proposed. Using any one of these methods is not guaranteed to outperform the supervised classifier which does not take the additional unlabeled data into account. In this work we compare traditio…

    Submitted 17 November, 2014; originally announced November 2014.

    Comments: 6 pages, 3 figures and 3 tables. International Conference on Pattern Recognition (ICPR) 2014, Stockholm, Sweden