Showing 1–9 of 9 results for author: Zamar, R H

  1. arXiv:2102.06851  [pdf, other]

    stat.ME math.ST stat.CO

    Robust Model-Based Clustering

    Authors: Juan D. Gonzalez, Ricardo Maronna, Victor J. Yohai, Ruben H. Zamar

    Abstract: We propose a new class of robust and Fisher-consistent estimators for mixture models. These estimators can be used to construct robust model-based clustering procedures. We study in detail the case of multivariate normal mixtures and propose a procedure that uses S estimators of multivariate location and scatter. We develop an algorithm to compute the estimators and to build the clusters which is…

    Submitted 8 June, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

  2. arXiv:2010.00950  [pdf, other]

    stat.ML cs.LG stat.ME

    Regularized K-means through hard-thresholding

    Authors: Jakob Raymaekers, Ruben H. Zamar

    Abstract: We study a framework of regularized $K$-means methods based on direct penalization of the size of the cluster centers. Different penalization strategies are considered and compared through simulation and theoretical analysis. Based on the results, we propose HT $K$-means, which uses an $\ell_0$ penalty to induce sparsity in the variables. Different techniques for selecting the tuning parameter are…

    Submitted 2 October, 2020; originally announced October 2020.

  3. Pooled variable scaling for cluster analysis

    Authors: Jakob Raymaekers, Ruben H. Zamar

    Abstract: We propose a new approach for scaling prior to cluster analysis based on the concept of pooled variance. Unlike available scaling procedures such as the standard deviation and the range, our proposed scale avoids dampening the beneficial effect of informative clustering variables. We confirm through an extensive simulation study and applications to well known real data examples that the proposed s…

    Submitted 25 July, 2020; v1 submitted 22 December, 2019; originally announced December 2019.

    Comments: 29 pages, 32 figures

  4. arXiv:1906.08198  [pdf, other]

    stat.CO

    Robust Clustering Using Tau-Scales

    Authors: Juan D. Gonzalez, Victor J. Yohai, Ruben H. Zamar

    Abstract: K means is a popular non-parametric clustering procedure introduced by Steinhaus (1956) and further developed by MacQueen (1967). It is known, however, that K means does not perform well in the presence of outliers. Cuesta-Albertos et al (1997) introduced a robust alternative, trimmed K means, which can be tuned to be robust or efficient, but cannot achieve these two properties simultaneously in a…

    Submitted 19 June, 2019; originally announced June 2019.

    Comments: 40 pages, 9 figures

    MSC Class: 62G35; 62H30; 62H35

  5. arXiv:1808.06016  [pdf, other]

    stat.ME

    A Stepwise Approach for High-Dimensional Gaussian Graphical Models

    Authors: Ginette Lafit, Francisco J. Nogales, Marcelo Ruiz, Ruben H. Zamar

    Abstract: We present a stepwise approach to estimate high dimensional Gaussian graphical models. We exploit the relation between the partial correlation coefficients and the distribution of the prediction errors, and parametrize the model in terms of the Pearson correlation coefficients between the prediction errors of the nodes' best linear predictors. We propose a novel stepwise algorithm for detecting pa…

    Submitted 17 August, 2018; originally announced August 2018.

    Comments: 26 pages, 5 figures, 4 tables

  6. arXiv:1707.00727  [pdf, other]

    stat.ML

    Regression Phalanxes

    Authors: Hongyang Zhang, William J. Welch, Ruben H. Zamar

    Abstract: Tomal et al. (2015) introduced the notion of "phalanxes" in the context of rare-class detection in two-class classification problems. A phalanx is a subset of features that work well for classification tasks. In this paper, we propose a different class of phalanxes for application in regression settings. We define a "Regression Phalanx" - a subset of features that work well together for prediction…

    Submitted 3 July, 2017; originally announced July 2017.

  7. arXiv:1706.06971  [pdf, ps, other]

    stat.ML

    Ensembles of phalanxes across assessment metrics for robust ranking of homologous proteins

    Authors: Jabed H Tomal, William J Welch, Ruben H Zamar

    Abstract: Two proteins are homologous if they have a common evolutionary origin, and the binary classification problem is to identify proteins in a candidate set that are homologous to a particular native protein. The feature (explanatory) variables available for classification are various measures of similarity of proteins. There are multiple classification problems of this type for different native protei…

    Submitted 9 September, 2019; v1 submitted 21 June, 2017; originally announced June 2017.

    Comments: 29 pages, 4 figures, 8 tables and 2 algorithms

  8. arXiv:1409.0745  [pdf, other]

    stat.ML cs.LG

    Multi-rank Sparse Hierarchical Clustering

    Authors: Hongyang Zhang, Ruben H. Zamar

    Abstract: There has been a surge in the number of large and flat data sets - data sets containing a large number of features and a relatively small number of observations - due to the growing ability to collect and store information in medical research and other fields. Hierarchical clustering is a widely used clustering tool. In hierarchical clustering, large and flat data sets may allow for a better cover…

    Submitted 3 July, 2017; v1 submitted 2 September, 2014; originally announced September 2014.

  9. arXiv:1303.4805  [pdf, ps, other]

    stat.ML stat.CO

    Ensembling classification models based on phalanxes of variables with applications in drug discovery

    Authors: Jabed H. Tomal, William J. Welch, Ruben H. Zamar

    Abstract: Statistical detection of a rare class of objects in a two-class classification problem can pose several challenges. Because the class of interest is rare in the training data, there is relatively little information in the known class response labels for model building. At the same time the available explanatory variables are often moderately high dimensional. In the four assays of our drug-discove…

    Submitted 15 May, 2015; v1 submitted 19 March, 2013; originally announced March 2013.

    Comments: Published at http://dx.doi.org/10.1214/14-AOAS778 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS778

    Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 1, 69-93