
Showing 1–7 of 7 results for author: Wainer, J

Searching in archive cs.
  1. arXiv:2208.04935  [pdf, other]

    cs.LG stat.ME stat.ML

    A Bayesian Bradley-Terry model to compare multiple ML algorithms on multiple data sets

    Authors: Jacques Wainer

    Abstract: This paper proposes a Bayesian model to compare multiple algorithms on multiple data sets, on any metric. The model is based on the Bradley-Terry model, which counts the number of times one algorithm performs better than another on different data sets. Because of its Bayesian foundations, the Bayesian Bradley-Terry model (BBT) has different characteristics than frequentist approaches to comparing m…

    Submitted 15 July, 2023; v1 submitted 9 August, 2022; originally announced August 2022.

    Comments: Version 2 - corrected spelling and grammar errors; included use of BBT for comparing many algorithms

    MSC Class: 68U01 ACM Class: I.5.2; I.2.6

  2. arXiv:2008.11655  [pdf, other]

    cs.LG stat.ML

    How to tune the RBF SVM hyperparameters?: An empirical evaluation of 18 search algorithms

    Authors: Jacques Wainer, Pablo Fonseca

    Abstract: SVM with an RBF kernel is usually one of the best classification algorithms for most data sets, but it is important to tune the two hyperparameters $C$ and $γ$ to the data itself. In general, the selection of the hyperparameters is a non-convex optimization problem and thus many algorithms have been proposed to solve it, among them: grid search, random search, Bayesian optimization, simulated anne…

    Submitted 26 August, 2020; originally announced August 2020.

  3. arXiv:1810.07168  [pdf, other]

    cs.LG stat.ML

    An empirical evaluation of imbalanced data strategies from a practitioner's point of view

    Authors: Jacques Wainer

    Abstract: This paper evaluates six strategies for mitigating imbalanced data: oversampling, undersampling, ensemble methods, specialized algorithms, class weight adjustments, and a no-mitigation approach referred to as the baseline. These strategies were tested on 58 real-life binary imbalanced datasets with imbalance rates ranging from 3 to 120. We conducted a comparative analysis of 10 under-sampling algo…

    Submitted 10 November, 2023; v1 submitted 16 October, 2018; originally announced October 2018.

  4. arXiv:1809.09446  [pdf, other]

    cs.LG stat.ML

    Nested cross-validation when selecting classifiers is overzealous for most practical applications

    Authors: Jacques Wainer, Gavin Cawley

    Abstract: When selecting a classification algorithm to be applied to a particular problem, one has to simultaneously select the best algorithm for that dataset and the best set of hyperparameters for the chosen model. The usual approach is to apply a nested cross-validation procedure; hyperparameter selection is performed in the inner cross-validation, while the outer cross-validation computes an unb…

    Submitted 25 September, 2018; originally announced September 2018.

  5. Open-Set Support Vector Machines

    Authors: Pedro Ribeiro Mendes Júnior, Terrance E. Boult, Jacques Wainer, Anderson Rocha

    Abstract: Often, when dealing with real-world recognition problems, we do not need, and often cannot have, knowledge of the entire set of possible classes that might appear during operational testing. In such cases, we need to think of robust classification methods able to deal with the "unknown" and properly reject samples belonging to classes never seen during training. Notwithstanding, existing classifie…

    Submitted 21 February, 2022; v1 submitted 12 June, 2016; originally announced June 2016.

    Comments: Version accepted for publication in IEEE Transactions on Systems, Man, and Cybernetics: Systems

  6. arXiv:1606.00930  [pdf, other]

    cs.LG cs.CV

    Comparison of 14 different families of classification algorithms on 115 binary datasets

    Authors: Jacques Wainer

    Abstract: We tested 14 very different classification algorithms (random forest, gradient boosting machines, SVM - linear, polynomial, and RBF - 1-hidden-layer neural nets, extreme learning machines, k-nearest neighbors and a bagging of knn, naive Bayes, learning vector quantization, elastic net logistic regression, sparse linear discriminant analysis, and a boosting of linear classifiers) on 115 real life b…

    Submitted 2 June, 2016; originally announced June 2016.

  7. arXiv:1206.6486  [pdf]

    cs.LG stat.ML

    Flexible Modeling of Latent Task Structures in Multitask Learning

    Authors: Alexandre Passos, Piyush Rai, Jacques Wainer, Hal Daume III

    Abstract: Multitask learning algorithms are typically designed assuming some fixed, a priori known latent structure shared by all the tasks. However, it is usually unclear what type of latent task structure is the most appropriate for a given multitask learning problem. Ideally, the "right" latent task structure should be learned in a data-driven manner. We present a flexible, nonparametric Bayesian model t…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)