Showing 1–13 of 13 results for author: Kaban, A

  1. arXiv:2309.05751  [pdf, other]

    cs.LG stat.ML

    Compressive Mahalanobis Metric Learning Adapts to Intrinsic Dimension

    Authors: Efstratios Palias, Ata Kabán

    Abstract: Metric learning aims at finding a suitable distance metric over the input space, to improve the performance of distance-based learning algorithms. In high-dimensional settings, it can also serve as dimensionality reduction by imposing a low-rank restriction to the learnt metric. In this paper, we consider the problem of learning a Mahalanobis metric, and instead of training a low-rank metric on hi…

    Submitted 13 April, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: 8 pages, 2 figures

  2. arXiv:2203.07989  [pdf, ps, other]

    cs.LG stat.ML

    Approximability and Generalisation

    Authors: Andrew J. Turner, Ata Kabán

    Abstract: Approximate learning machines have become popular in the era of small devices, including quantised, factorised, hashed, or otherwise compressed predictors, and the quest to explain and guarantee good generalisation abilities for such methods has just begun. In this paper we study the role of approximability in learning, both in the full precision and the approximated settings of the predictor that…

    Submitted 15 March, 2022; originally announced March 2022.

    Comments: 25 pages

  3. arXiv:2106.01092  [pdf, ps, other]

    cs.LG math.ST

    Statistical optimality conditions for compressive ensembles

    Authors: Henry W. J. Reeve, Ata Kaban

    Abstract: We present a framework for the theoretical analysis of ensembles of low-complexity empirical risk minimisers trained on independent random compressions of high-dimensional data. First we introduce a general distribution-dependent upper-bound on the excess risk, framed in terms of a natural notion of compressibility. This bound is independent of the dimension of the original data representation, an…

    Submitted 2 June, 2021; originally announced June 2021.

    MSC Class: 62-08

  4. arXiv:2002.09769  [pdf, ps, other]

    stat.ML cs.LG

    Optimistic bounds for multi-output prediction

    Authors: Henry WJ Reeve, Ata Kaban

    Abstract: We investigate the challenge of multi-output learning, where the goal is to learn a vector-valued function based on a supervised data set. This includes a range of important problems in Machine Learning including multi-target regression, multi-class classification and multi-label classification. We begin our analysis by introducing the self-bounding Lipschitz condition for multi-output loss functi…

    Submitted 22 February, 2020; originally announced February 2020.

  5. arXiv:1906.04542  [pdf, other]

    cs.LG stat.ML

    Fast Rates for a kNN Classifier Robust to Unknown Asymmetric Label Noise

    Authors: Henry W. J. Reeve, Ata Kaban

    Abstract: We consider classification in the presence of class-dependent asymmetric label noise with unknown noise probabilities. In this setting, identifiability conditions are known, but additional assumptions were shown to be required for finite sample rates, and so far only the parametric rate has been obtained. Assuming these identifiability conditions, together with a measure-smoothness condition on th…

    Submitted 11 June, 2019; originally announced June 2019.

    Comments: ICML 2019

  6. arXiv:1902.05627  [pdf, other]

    stat.ML cs.LG

    Classification with unknown class-conditional label noise on non-compact feature spaces

    Authors: Henry W J Reeve, Ata Kaban

    Abstract: We investigate the problem of classification in the presence of unknown class-conditional label noise in which the labels observed by the learner have been corrupted with some unknown class dependent probability. In order to obtain finite sample rates, previous approaches to classification with unknown class-conditional label noise have required that the regression function is close to its extrema…

    Submitted 9 June, 2019; v1 submitted 14 February, 2019; originally announced February 2019.

  7. arXiv:1709.09782  [pdf, ps, other]

    math.ST

    Structure-aware error bounds for linear classification with the zero-one loss

    Authors: Ata Kaban, Robert J. Durrant

    Abstract: We prove risk bounds for binary classification in high-dimensional settings when the sample size is allowed to be smaller than the dimensionality of the training set observations. In particular, we prove upper bounds for both 'compressive learning' by empirical risk minimization (ERM) (that is when the ERM classifier is learned from data that have been projected from high-dimensions onto a randoml…

    Submitted 27 September, 2017; originally announced September 2017.

    MSC Class: 62G05; 68Q32; 62H05; 68W25

    ACM Class: I.2.6

  8. arXiv:1309.6818  [pdf]

    cs.LG stat.ML

    Boosting in the presence of label noise

    Authors: Jakramate Bootkrajang, Ata Kaban

    Abstract: Boosting is known to be sensitive to label noise. We studied two approaches to improve AdaBoost's robustness against labelling errors. One is to employ a label-noise robust classifier as a base learner, while the other is to modify the AdaBoost algorithm to be more robust. Empirical evaluation shows that a committee of robust classifiers, although converges faster than non label-noise aware AdaBoo…

    Submitted 26 September, 2013; originally announced September 2013.

    Comments: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

    Report number: UAI-P-2013-PG-82-91

  9. arXiv:0709.0928  [pdf, ps, other]

    astro-ph

    Robust mixtures in the presence of measurement errors

    Authors: Jianyong Sun, Ata Kaban, Somak Raychaudhury

    Abstract: We develop a mixture-based approach to robust density modeling and outlier detection for experimental multivariate data that includes measurement error information. Our model is designed to infer atypical measurements that are not due to errors, aiming to retrieve potentially interesting peculiar objects. Since exact inference is not possible in this model, we develop a tree-structured variation…

    Submitted 6 September, 2007; originally announced September 2007.

    Comments: (Refereed) Proceedings of the 24-th Annual International Conference on Machine Learning 2007 (ICML07), (Ed.) Z. Ghahramani. June 20-24, 2007, Oregon State University, Corvallis, OR, USA, pp. 847-854; Omnipress. ISBN 978-1-59593-793-3; 8 pages, 6 figures

  10. On class visualisation for high dimensional data: Exploring scientific datasets

    Authors: Ata Kaban, Jianyong Sun, Somak Raychaudhury, Louisa Nolan

    Abstract: Parametric Embedding (PE) has recently been proposed as a general-purpose algorithm for class visualisation. It takes class posteriors produced by a mixture-based clustering algorithm and projects them in 2D for visualisation. However, although this fully modularised combination of objectives (clustering and projection) is attractive for its conceptual simplicity, in the case of high dimensional…

    Submitted 4 September, 2006; originally announced September 2006.

    Comments: to appear in Lecture notes in Artificial Intelligence vol. 4265, the (refereed) proceedings of the Ninth International conference on Discovery Science (DS-2006), October 2006, Barcelona, Spain. 12 pages, 8 figures

  11. Young stellar populations in early-type galaxies in the Sloan Digital Sky Survey

    Authors: Louisa A. Nolan, Somak Raychaudhury, Ata Kaban

    Abstract: We use a purely data-driven rectified factor analysis to identify early-type galaxies with recent star formation in DR4 of the SDSS Spectroscopic Catalogue. We compare the spectra and environment of these galaxies with `normal' early-types, and a sample of independently selected E+A galaxies. We calculate the projected local galaxy surface density (Sigma_5 and Sigma_10) for each galaxy in our sa…

    Submitted 13 November, 2006; v1 submitted 29 August, 2006; originally announced August 2006.

    Comments: 7 pages, 5 figures, submitted to MNRAS, minor revision

  12. A data-driven Bayesian approach for finding young stellar populations in early-type galaxies from their UV-optical spectra

    Authors: L. A. Nolan, M. O. Harva, A Kaban, S. Raychaudhury

    Abstract: We present the results of a novel application of Bayesian modelling techniques, which, although purely data driven, have a physically interpretable result, and will be useful as an efficient data mining tool. We base our studies on the UV-to-optical spectra (observed and synthetic) of early-type galaxies. A probabilistic latent variable architecture is formulated, and a rigorous Bayesian methodo…

    Submitted 16 November, 2005; originally announced November 2005.

    Comments: 19 pages, 15 figures, accepted for publication MNRAS

    Journal ref: Mon.Not.Roy.Astron.Soc.366:321-338,2006

  13. arXiv:astro-ph/0505059  [pdf, ps, other]

    astro-ph

    Finding Young Stellar Populations in Elliptical Galaxies from Independent Components of Optical Spectra

    Authors: Ata Kaban, Louisa A. Nolan, Somak Raychaudhury

    Abstract: Elliptical galaxies are believed to consist of a single population of old stars formed together at an early epoch in the Universe, yet recent analyses of galaxy spectra seem to indicate the presence of significant younger populations of stars in them. The detailed physical modelling of such populations is computationally expensive, inhibiting the detailed analysis of the several million galaxy s…

    Submitted 3 May, 2005; originally announced May 2005.

    Comments: 12 Pages, 7 figures; accepted in SIAM 2005 International Conference on Data Mining, Newport Beach, CA, April 2005