
Showing 1–6 of 6 results for author: Bac, J

Searching in archive cs.
  1. Domain Adaptation Principal Component Analysis: base linear method for learning with out-of-distribution data

    Authors: Evgeny M Mirkes, Jonathan Bac, Aziz Fouché, Sergey V. Stasenko, Andrei Zinovyev, Alexander N. Gorban

    Abstract: Domain adaptation is a popular paradigm in modern machine learning which aims at tackling the problem of divergence (or shift) between the labeled training and validation datasets (source domain) and a potentially large unlabeled dataset (target domain). The task is to embed both datasets into a common space in which the source dataset is informative for training while the divergence between s…

    Submitted 15 December, 2022; v1 submitted 28 August, 2022; originally announced August 2022.

    Journal ref: Entropy, 25(1), 33, 2023

  2. arXiv:2203.16687

    cs.LG

    Quasi-orthogonality and intrinsic dimensions as measures of learning and generalisation

    Authors: Qinghua Zhou, Alexander N. Gorban, Evgeny M. Mirkes, Jonathan Bac, Andrei Zinovyev, Ivan Y. Tyukin

    Abstract: Finding best architectures of learning machines, such as deep neural networks, is a well-known technical and theoretical challenge. Recent work by Mellor et al. (2021) showed that there may exist correlations between the accuracies of trained networks and the values of some easily computable measures defined on randomly initialised networks which may enable searching tens of thousands of neural arc…

    Submitted 30 March, 2022; originally announced March 2022.

    MSC Class: 68T05; 68Q32

  3. arXiv:2109.02596

    cs.LG stat.ML

    Scikit-dimension: a Python package for intrinsic dimension estimation

    Authors: Jonathan Bac, Evgeny M. Mirkes, Alexander N. Gorban, Ivan Tyukin, Andrei Zinovyev

    Abstract: Dealing with uncertainty in applications of machine learning to real-life data critically depends on the knowledge of intrinsic dimensionality (ID). A number of methods have been suggested for the purpose of estimating ID, but no standard package to easily apply them one by one or all at once has been implemented in Python. This technical note introduces scikit-dimension, an open-source P…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: 12 pages, 4 figures, 1 table

    Journal ref: Entropy, 2021, 23(10), 1368

  4. Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data

    Authors: Sergey E. Golovenkin, Jonathan Bac, Alexander Chervov, Evgeny M. Mirkes, Yuliya V. Orlova, Emmanuel Barillot, Alexander N. Gorban, Andrei Zinovyev

    Abstract: Large observational clinical datasets are becoming increasingly available for mining associations between various disease traits and administered therapy. These datasets can be considered as representations of the landscape of all possible disease conditions, in which a concrete pathology develops through a number of stereotypical routes, characterized by 'points of no return' and 'final states' (such a…

    Submitted 5 October, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

    ACM Class: I.2.6; J.3; J.2

    Journal ref: GigaScience, Volume 9, Issue 11, 2020, giaa128

  5. arXiv:2001.11739

    cs.LG stat.ML

    Local intrinsic dimensionality estimators based on concentration of measure

    Authors: Jonathan Bac, Andrei Zinovyev

    Abstract: Intrinsic dimensionality (ID) is one of the most fundamental characteristics of multi-dimensional data point clouds. Knowing ID is crucial to choose the appropriate machine learning approach as well as to understand its behavior and validate it. ID can be computed globally for the whole data point distribution, or computed locally in different regions of the data space. In this paper, we introduce…

    Submitted 19 April, 2020; v1 submitted 31 January, 2020; originally announced January 2020.

    Comments: to be published in the International Joint Conference On Neural Networks (IJCNN) held as part of the IEEE World Congress On Computational Intelligence (WCCI), July 2020

  6. arXiv:1901.06328

    cs.LG q-bio.QM stat.ML

    Estimating the effective dimension of large biological datasets using Fisher separability analysis

    Authors: Luca Albergante, Jonathan Bac, Andrei Zinovyev

    Abstract: Modern large-scale datasets are frequently said to be high-dimensional. However, their data point clouds frequently possess structures, significantly decreasing their intrinsic dimensionality (ID) due to the presence of clusters, points being located close to low-dimensional varieties or fine-grained lumping. We test a recently introduced dimensionality estimator, based on analysing the separabili…

    Submitted 18 January, 2019; originally announced January 2019.

    Comments: 8 pages, submitted to IJCNN-2019
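Entry 6 (and, more loosely, entries 3 and 5) builds on separability of data points in high dimension. The following is a minimal illustrative sketch, not the authors' estimator and not the scikit-dimension implementation. It assumes the commonly used definition that a point x is Fisher-separable from a point y with threshold alpha when (x, y) <= alpha * (x, x), evaluated after centering, PCA whitening and projection onto the unit sphere. The function names and the default alpha = 0.8 are hypothetical choices made for illustration, and the step that turns separability statistics into a dimension estimate is deliberately not reproduced.

```python
import numpy as np


def preprocess(X: np.ndarray) -> np.ndarray:
    """Center, PCA-whiten, and project each point onto the unit sphere."""
    Xc = X - X.mean(axis=0)
    # Eigendecomposition of the sample covariance (ascending eigenvalues).
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    keep = eigvals > 1e-12 * eigvals.max()  # drop numerically null directions
    Xw = (Xc @ eigvecs[:, keep]) / np.sqrt(eigvals[keep])
    norms = np.linalg.norm(Xw, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against a point sitting exactly at the mean
    return Xw / norms


def fisher_separable_fraction(X: np.ndarray, alpha: float = 0.8) -> float:
    """Fraction of points x with (x, y) <= alpha * (x, x) for every other y.

    alpha = 0.8 is an illustrative (hypothetical) default, not a value
    taken from the paper.
    """
    Z = preprocess(X)
    G = Z @ Z.T                    # Gram matrix: G[i, j] = (z_i, z_j)
    diag = np.diag(G).copy()       # (z_i, z_i), equal to 1 after normalization
    np.fill_diagonal(G, -np.inf)   # exclude self-comparisons
    separable = (G <= alpha * diag[:, None]).all(axis=1)
    return float(separable.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for d in (2, 10, 50):
        frac = fisher_separable_fraction(rng.standard_normal((500, d)))
        print(f"d={d}: separable fraction = {frac:.3f}")
```

On Gaussian samples the separable fraction should be near 0 in two dimensions and near 1 in fifty, since random directions in high dimension become nearly orthogonal; this concentration-of-measure effect is what separability-based ID estimators exploit.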