Search | arXiv e-print repository

Stealth edits for provably fixing or attacking large language models

Authors: Oliver J. Sutton, Qinghua Zhou, Wei Wang, Desmond J. Higham, Alexander N. Gorban, Alexander Bastounis, Ivan Y. Tyukin

Abstract: We reveal new methods and the theoretical foundations of techniques for editing large language models. We also show how the new theory can be used to assess the editability of models and to expose their susceptibility to previously unknown malicious attacks. Our theoretical approach shows that a single metric (a specific measure of the intrinsic dimensionality of the model's features) is fundament… ▽ More We reveal new methods and the theoretical foundations of techniques for editing large language models. We also show how the new theory can be used to assess the editability of models and to expose their susceptibility to previously unknown malicious attacks. Our theoretical approach shows that a single metric (a specific measure of the intrinsic dimensionality of the model's features) is fundamental to predicting the success of popular editing approaches, and reveals new bridges between disparate families of editing methods. We collectively refer to these approaches as stealth editing methods, because they aim to directly and inexpensively update a model's weights to correct the model's responses to known hallucinating prompts without otherwise affecting the model's behaviour, without requiring retraining. By carefully applying the insight gleaned from our theoretical investigation, we are able to introduce a new network block -- named a jet-pack block -- which is optimised for highly selective model editing, uses only standard network operations, and can be inserted into existing networks. The intrinsic dimensionality metric also determines the vulnerability of a language model to a stealth attack: a small change to a model's weights which changes its response to a single attacker-chosen prompt. Stealth attacks do not require access to or knowledge of the model's training data, therefore representing a potent yet previously unrecognised threat to redistributed foundation models. They are computationally simple enough to be implemented in malware in many cases. Extensive experimental results illustrate and support the method and its theoretical underpinnings. Demos and source code for editing language models are available at https://github.com/qinghua-zhou/stealth-edits. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: 24 pages, 9 figures. Open source implementation: https://github.com/qinghua-zhou/stealth-edits

MSC Class: 68T07; 68T50; 68W40 ACM Class: I.2.7; F.2.0

arXiv:2402.06563 [pdf]

doi 10.1109/BigData59044.2023.10386194

What is Hiding in Medicine's Dark Matter? Learning with Missing Data in Medical Practices

Authors: Neslihan Suzen, Evgeny M. Mirkes, Damian Roland, Jeremy Levesley, Alexander N. Gorban, Tim J. Coats

Abstract: Electronic patient records (EPRs) produce a wealth of data but contain significant missing information. Understanding and handling this missing data is an important part of clinical data analysis and if left unaddressed could result in bias in analysis and distortion in critical conclusions. Missing data may be linked to health care professional practice patterns and imputation of missing data can… ▽ More Electronic patient records (EPRs) produce a wealth of data but contain significant missing information. Understanding and handling this missing data is an important part of clinical data analysis and if left unaddressed could result in bias in analysis and distortion in critical conclusions. Missing data may be linked to health care professional practice patterns and imputation of missing data can increase the validity of clinical decisions. This study focuses on statistical approaches for understanding and interpreting the missing data and machine learning based clinical data imputation using a single centre's paediatric emergency data and the data from UK's largest clinical audit for traumatic injury database (TARN). In the study of 56,961 data points related to initial vital signs and observations taken on children presenting to an Emergency Department, we have shown that missing data are likely to be non-random and how these are linked to health care professional practice patterns. We have then examined 79 TARN fields with missing values for 5,791 trauma cases. Singular Value Decomposition (SVD) and k-Nearest Neighbour (kNN) based missing data imputation methods are used and imputation results against the original dataset are compared and statistically tested. We have concluded that the 1NN imputer is the best imputation which indicates a usual pattern of clinical decision making: find the most similar patients and take their attributes as imputation. △ Less

Submitted 9 February, 2024; originally announced February 2024.

Comments: 8 pages

Journal ref: 2023 IEEE International Conference on Big Data (BigData), 4979-4986

arXiv:2402.00899 [pdf, other]

Weakly Supervised Learners for Correction of AI Errors with Provable Performance Guarantees

Authors: Ivan Y. Tyukin, Tatiana Tyukina, Daniel van Helden, Zedong Zheng, Evgeny M. Mirkes, Oliver J. Sutton, Qinghua Zhou, Alexander N. Gorban, Penelope Allison

Abstract: We present a new methodology for handling AI errors by introducing weakly supervised AI error correctors with a priori performance guarantees. These AI correctors are auxiliary maps whose role is to moderate the decisions of some previously constructed underlying classifier by either approving or rejecting its decisions. The rejection of a decision can be used as a signal to suggest abstaining fro… ▽ More We present a new methodology for handling AI errors by introducing weakly supervised AI error correctors with a priori performance guarantees. These AI correctors are auxiliary maps whose role is to moderate the decisions of some previously constructed underlying classifier by either approving or rejecting its decisions. The rejection of a decision can be used as a signal to suggest abstaining from making a decision. A key technical focus of the work is in providing performance guarantees for these new AI correctors through bounds on the probabilities of incorrect decisions. These bounds are distribution agnostic and do not rely on assumptions on the data dimension. Our empirical example illustrates how the framework can be applied to improve the performance of an image classifier in a challenging real-world task where training data are scarce. △ Less

Submitted 13 February, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

MSC Class: 68T05; 68T37

arXiv:2311.13917 [pdf]

doi 10.1016/j.cnsns.2024.107906

Exploring the impact of social stress on the adaptive dynamics of COVID-19: Ty** the behavior of naïve populations faced with epidemics

Authors: Innokentiy Kastalskiy, Andrei Zinovyev, Evgeny Mirkes, Victor Kazantsev, Alexander N. Gorban

Abstract: In the context of natural disasters, human responses inevitably intertwine with natural factors. The COVID-19 pandemic, as a significant stress factor, has brought to light profound variations among different countries in terms of their adaptive dynamics in addressing the spread of infection outbreaks across different regions. This emphasizes the crucial role of cultural characteristics in natural… ▽ More In the context of natural disasters, human responses inevitably intertwine with natural factors. The COVID-19 pandemic, as a significant stress factor, has brought to light profound variations among different countries in terms of their adaptive dynamics in addressing the spread of infection outbreaks across different regions. This emphasizes the crucial role of cultural characteristics in natural disaster analysis. The theoretical understanding of large-scale epidemics primarily relies on mean-field kinetic models. However, conventional SIR-like models failed to fully explain the observed phenomena at the onset of the COVID-19 outbreak. These phenomena included the unexpected cessation of exponential growth, the reaching of plateaus, and the occurrence of multi-wave dynamics. In situations where an outbreak of a highly virulent and unfamiliar infection arises, it becomes crucial to respond swiftly at a non-medical level to mitigate the negative socio-economic impact. Here we present a theoretical examination of the first wave of the epidemic based on a simple SIRSS model (SIR with Social Stress). We conduct an analysis of the socio-cultural features of naïve population behaviors across various countries worldwide. The unique characteristics of each country/territory are encapsulated in only a few constants within our model, derived from the fitted COVID-19 statistics. These constants also reflect the societal response dynamics to the external stress factor, underscoring the importance of studying the mutual behavior of humanity and natural factors during global social disasters. Based on these distinctive characteristics of specific regions, local authorities can optimize their strategies to effectively combat epidemics until vaccines are developed. △ Less

Submitted 12 February, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

Comments: 29 pages, 16 figures, 1 table, 2 appendices

Journal ref: Communications in Nonlinear Science and Numerical Simulation, Volume 132, May 2024, 107906

arXiv:2311.07579 [pdf, other]

doi 10.1007/978-3-031-44207-0_43

Relative intrinsic dimensionality is intrinsic to learning

Authors: Oliver J. Sutton, Qinghua Zhou, Alexander N. Gorban, Ivan Y. Tyukin

Abstract: High dimensional data can have a surprising property: pairs of data points may be easily separated from each other, or even from arbitrary subsets, with high probability using just simple linear classifiers. However, this is more of a rule of thumb than a reliable property as high dimensionality alone is neither necessary nor sufficient for successful learning. Here, we introduce a new notion of t… ▽ More High dimensional data can have a surprising property: pairs of data points may be easily separated from each other, or even from arbitrary subsets, with high probability using just simple linear classifiers. However, this is more of a rule of thumb than a reliable property as high dimensionality alone is neither necessary nor sufficient for successful learning. Here, we introduce a new notion of the intrinsic dimension of a data distribution, which precisely captures the separability properties of the data. For this intrinsic dimension, the rule of thumb above becomes a law: high intrinsic dimension guarantees highly separable data. We extend this notion to that of the relative intrinsic dimension of two data distributions, which we show provides both upper and lower bounds on the probability of successfully learning and generalising in a binary classification problem △ Less

Submitted 10 October, 2023; originally announced November 2023.

Comments: 12 pages, 5 figures

MSC Class: 68T09; 68T10

Journal ref: Artificial Neural Networks and Machine Learning ICANN 2023. Lecture Notes in Computer Science, vol 14254, pp 516-529. Springer, Cham

arXiv:2309.07072 [pdf, ps, other]

The Boundaries of Verifiable Accuracy, Robustness, and Generalisation in Deep Learning

Authors: Alexander Bastounis, Alexander N. Gorban, Anders C. Hansen, Desmond J. Higham, Danil Prokhorov, Oliver Sutton, Ivan Y. Tyukin, Qinghua Zhou

Abstract: In this work, we assess the theoretical limitations of determining guaranteed stability and accuracy of neural networks in classification tasks. We consider classical distribution-agnostic framework and algorithms minimising empirical risks and potentially subjected to some weights regularisation. We show that there is a large family of tasks for which computing and verifying ideal stable and accu… ▽ More In this work, we assess the theoretical limitations of determining guaranteed stability and accuracy of neural networks in classification tasks. We consider classical distribution-agnostic framework and algorithms minimising empirical risks and potentially subjected to some weights regularisation. We show that there is a large family of tasks for which computing and verifying ideal stable and accurate neural networks in the above settings is extremely challenging, if at all possible, even when such ideal solutions exist within the given class of neural architectures. △ Less

Submitted 13 September, 2023; originally announced September 2023.

MSC Class: 68T07; 68T05

arXiv:2309.03665 [pdf, other]

How adversarial attacks can disrupt seemingly stable accurate classifiers

Authors: Oliver J. Sutton, Qinghua Zhou, Ivan Y. Tyukin, Alexander N. Gorban, Alexander Bastounis, Desmond J. Higham

Abstract: Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data. Paradoxically, empirical evidence indicates that even systems which are robust to large random perturbations of the input data remain susceptible to small, easily constructed, adversarial perturbations of their inputs. Here, we show th… ▽ More Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data. Paradoxically, empirical evidence indicates that even systems which are robust to large random perturbations of the input data remain susceptible to small, easily constructed, adversarial perturbations of their inputs. Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data. We introduce a simple generic and generalisable framework for which key behaviours observed in practical systems arise with high probability -- notably the simultaneous susceptibility of the (otherwise accurate) model to easily constructed adversarial attacks, and robustness to random perturbations of the input data. We confirm that the same phenomena are directly observed in practical neural networks trained on standard image classification problems, where even large additive random noise fails to trigger the adversarial instability of the network. A surprising takeaway is that even small margins separating a classifier's decision surface from training and testing data can hide adversarial susceptibility from being detected using randomly sampled perturbations. Counterintuitively, using additive noise during training or testing is therefore inefficient for eradicating or detecting adversarial examples, and more demanding adversarial training is required. △ Less

Submitted 7 September, 2023; originally announced September 2023.

Comments: 11 pages, 8 figures, additional supplementary materials

arXiv:2211.03607 [pdf, other]

Towards a mathematical understanding of learning from few examples with nonlinear feature maps

Authors: Oliver J. Sutton, Alexander N. Gorban, Ivan Y. Tyukin

Abstract: We consider the problem of data classification where the training set consists of just a few data points. We explore this phenomenon mathematically and reveal key relationships between the geometry of an AI model's feature space, the structure of the underlying data distributions, and the model's generalisation capabilities. The main thrust of our analysis is to reveal the influence on the model's… ▽ More We consider the problem of data classification where the training set consists of just a few data points. We explore this phenomenon mathematically and reveal key relationships between the geometry of an AI model's feature space, the structure of the underlying data distributions, and the model's generalisation capabilities. The main thrust of our analysis is to reveal the influence on the model's generalisation capabilities of nonlinear feature transformations map** the original data into high, and possibly infinite, dimensional spaces. △ Less

Submitted 7 November, 2022; originally announced November 2022.

Comments: 18 pages, 8 figures

MSC Class: 68Q32; 68T05

arXiv:2208.13290 [pdf, other]

doi 10.3390/e25010033

Domain Adaptation Principal Component Analysis: base linear method for learning with out-of-distribution data

Authors: Evgeny M Mirkes, Jonathan Bac, Aziz Fouché, Sergey V. Stasenko, Andrei Zinovyev, Alexander N. Gorban

Abstract: Domain adaptation is a popular paradigm in modern machine learning which aims at tackling the problem of divergence (or shift) between the labeled training and validation datasets (source domain) and a potentially large unlabeled dataset (target domain). The task is to embed both datasets red into a common space in which the source dataset is informative for training while the divergence between s… ▽ More Domain adaptation is a popular paradigm in modern machine learning which aims at tackling the problem of divergence (or shift) between the labeled training and validation datasets (source domain) and a potentially large unlabeled dataset (target domain). The task is to embed both datasets red into a common space in which the source dataset is informative for training while the divergence between source and target is minimized. The most popular domain adaptation solutions are based on training neural networks that combine classification and adversarial learning modules, frequently making them both data-hungry and difficult to train. We present a method called Domain Adaptation Principal Component Analysis (DAPCA) that identifies a linear reduced data representation useful for solving the domain adaptation task. DAPCA algorithm introduces positive and negative weights between pairs of data points, and generalizes the supervised extension of principal component analysis. DAPCA is an iterative algorithm that solves a simple quadratic optimization problem at each iteration. The convergence of the algorithm is guaranteed, and the number of iterations is small in practice. We validate the suggested algorithm on previously proposed benchmarks for solving the domain adaptation task. We also show the benefit of using DAPCA in analyzing the single-cell omics datasets in biomedical applications. Overall, DAPCA can serve as a practical preprocessing step in many machine learning applications leading to reduced dataset representations, taking into account possible divergence between source and target domains. △ Less

Submitted 15 December, 2022; v1 submitted 28 August, 2022; originally announced August 2022.

Journal ref: Entropy, 25(1), 33, 2023

arXiv:2205.15696 [pdf]

An Informational Space Based Semantic Analysis for Scientific Texts

Authors: Neslihan Suzen, Alexander N. Gorban, Jeremy Levesley, Evgeny M. Mirkes

Abstract: One major problem in Natural Language Processing is the automatic analysis and representation of human language. Human language is ambiguous and deeper understanding of semantics and creating human-to-machine interaction have required an effort in creating the schemes for act of communication and building common-sense knowledge bases for the 'meaning' in texts. This paper introduces computational… ▽ More One major problem in Natural Language Processing is the automatic analysis and representation of human language. Human language is ambiguous and deeper understanding of semantics and creating human-to-machine interaction have required an effort in creating the schemes for act of communication and building common-sense knowledge bases for the 'meaning' in texts. This paper introduces computational methods for semantic analysis and the quantifying the meaning of short scientific texts. Computational methods extracting semantic feature are used to analyse the relations between texts of messages and 'representations of situations' for a newly created large collection of scientific texts, Leicester Scientific Corpus. The representation of scientific-specific meaning is standardised by replacing the situation representations, rather than psychological properties, with the vectors of some attributes: a list of scientific subject categories that the text belongs to. First, this paper introduces 'Meaning Space' in which the informational representation of the meaning is extracted from the occurrence of the word in texts across the scientific categories, i.e., the meaning of a word is represented by a vector of Relative Information Gain about the subject categories. Then, the meaning space is statistically analysed for Leicester Scientific Dictionary-Core and we investigate 'Principal Components of the Meaning' to describe the adequate dimensions of the meaning. The research in this paper conducts the base for the geometric representation of the meaning of texts. △ Less

Submitted 31 May, 2022; originally announced May 2022.

Comments: 19 pages. arXiv admin note: substantial text overlap with arXiv:2009.08859, arXiv:2004.13717

Journal ref: Computer Science & Information Technology, volume 12, number 08, pp. 81-99, 2022. CS & IT - CSCP 2022

arXiv:2203.16935 [pdf, other]

Learning from few examples with nonlinear feature maps

Authors: Ivan Y. Tyukin, Oliver Sutton, Alexander N. Gorban

Abstract: In this work we consider the problem of data classification in post-classical settings were the number of training examples consists of mere few data points. We explore the phenomenon and reveal key relationships between dimensionality of AI model's feature space, non-degeneracy of data distributions, and the model's generalisation capabilities. The main thrust of our present analysis is on the in… ▽ More In this work we consider the problem of data classification in post-classical settings were the number of training examples consists of mere few data points. We explore the phenomenon and reveal key relationships between dimensionality of AI model's feature space, non-degeneracy of data distributions, and the model's generalisation capabilities. The main thrust of our present analysis is on the influence of nonlinear feature transformations map** original data into higher- and possibly infinite-dimensional spaces on the resulting model's generalisation capabilities. Subject to appropriate assumptions, we establish new relationships between intrinsic dimensions of the transformed data and the probabilities to learn successfully from few presentations. △ Less

Submitted 31 March, 2022; originally announced March 2022.

MSC Class: 68T05; 68Q32

arXiv:2203.16687 [pdf, other]

Quasi-orthogonality and intrinsic dimensions as measures of learning and generalisation

Authors: Qinghua Zhou, Alexander N. Gorban, Evgeny M. Mirkes, Jonathan Bac, Andrei Zinovyev, Ivan Y. Tyukin

Abstract: Finding best architectures of learning machines, such as deep neural networks, is a well-known technical and theoretical challenge. Recent work by Mellor et al (2021) showed that there may exist correlations between the accuracies of trained networks and the values of some easily computable measures defined on randomly initialised networks which may enable to search tens of thousands of neural arc… ▽ More Finding best architectures of learning machines, such as deep neural networks, is a well-known technical and theoretical challenge. Recent work by Mellor et al (2021) showed that there may exist correlations between the accuracies of trained networks and the values of some easily computable measures defined on randomly initialised networks which may enable to search tens of thousands of neural architectures without training. Mellor et al used the Hamming distance evaluated over all ReLU neurons as such a measure. Motivated by these findings, in our work, we ask the question of the existence of other and perhaps more principled measures which could be used as determinants of success of a given neural architecture. In particular, we examine, if the dimensionality and quasi-orthogonality of neural networks' feature space could be correlated with the network's performance after training. We showed, using the setup as in Mellor et al, that dimensionality and quasi-orthogonality may jointly serve as network's performance discriminants. In addition to offering new opportunities to accelerate neural architecture search, our findings suggest important relationships between the networks' final performance and properties of their randomly initialised feature spaces: data dimension and quasi-orthogonality. △ Less

Submitted 30 March, 2022; originally announced March 2022.

MSC Class: 68T05; 68Q32

arXiv:2109.02596 [pdf, other]

doi 10.3390/e23101368

Scikit-dimension: a Python package for intrinsic dimension estimation

Authors: Jonathan Bac, Evgeny M. Mirkes, Alexander N. Gorban, Ivan Tyukin, Andrei Zinovyev

Abstract: Dealing with uncertainty in applications of machine learning to real-life data critically depends on the knowledge of intrinsic dimensionality (ID). A number of methods have been suggested for the purpose of estimating ID, but no standard package to easily apply them one by one or all at once has been implemented in Python. This technical note introduces \texttt{scikit-dimension}, an open-source P… ▽ More Dealing with uncertainty in applications of machine learning to real-life data critically depends on the knowledge of intrinsic dimensionality (ID). A number of methods have been suggested for the purpose of estimating ID, but no standard package to easily apply them one by one or all at once has been implemented in Python. This technical note introduces \texttt{scikit-dimension}, an open-source Python package for intrinsic dimension estimation. \texttt{scikit-dimension} package provides a uniform implementation of most of the known ID estimators based on scikit-learn application programming interface to evaluate global and local intrinsic dimension, as well as generators of synthetic toy and benchmark datasets widespread in the literature. The package is developed with tools assessing the code quality, coverage, unit testing and continuous integration. We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation in real-life and synthetic data. The source code is available from https://github.com/j-bac/scikit-dimension , the documentation is available from https://scikit-dimension.readthedocs.io . △ Less

Submitted 6 September, 2021; originally announced September 2021.

Comments: 12 pages, 4 figures, 1 table

Journal ref: Entropy, 2021, 23(10), 1368

arXiv:2106.15416 [pdf, other]

doi 10.3390/e23081090

High-dimensional separability for one- and few-shot learning

Authors: Alexander N. Gorban, Bogdan Grechuk, Evgeny M. Mirkes, Sergey V. Stasenko, Ivan Y. Tyukin

Abstract: This work is driven by a practical question: corrections of Artificial Intelligence (AI) errors. These corrections should be quick and non-iterative. To solve this problem without modification of a legacy AI system, we propose special `external' devices, correctors. Elementary correctors consist of two parts, a classifier that separates the situations with high risk of error from the situations in… ▽ More This work is driven by a practical question: corrections of Artificial Intelligence (AI) errors. These corrections should be quick and non-iterative. To solve this problem without modification of a legacy AI system, we propose special `external' devices, correctors. Elementary correctors consist of two parts, a classifier that separates the situations with high risk of error from the situations in which the legacy AI system works well and a new decision for situations with potential errors. Input signals for the correctors can be the inputs of the legacy AI system, its internal signals, and outputs. If the intrinsic dimensionality of data is high enough then the classifiers for correction of small number of errors can be very simple. According to the blessing of dimensionality effects, even simple and robust Fisher's discriminants can be used for one-shot learning of AI correctors. Stochastic separation theorems provide the mathematical basis for this one-short learning. However, as the number of correctors needed grows, the cluster structure of data becomes important and a new family of stochastic separation theorems is required. We refuse the classical hypothesis of the regularity of the data distribution and assume that the data can have a fine-grained structure with many clusters and peaks in the probability density. New stochastic separation theorems for data with fine-grained structure are formulated and proved. The multi-correctors for granular data are proposed. The advantages of the multi-corrector technology were demonstrated by examples of correcting errors and learning new classes of objects by a deep convolutional neural network on the CIFAR-10 dataset. The key problems of the non-classical high-dimensional data analysis are reviewed together with the basic preprocessing steps including supervised, semi-supervised and domain adaptation Principal Component Analysis. △ Less

Submitted 22 October, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

Comments: Corrected and restructured version with some extensions

Journal ref: Entropy. 2021; 23(8):1090

arXiv:2106.13997 [pdf, other]

doi 10.1093/imamat/hxad027

The Feasibility and Inevitability of Stealth Attacks

Authors: Ivan Y. Tyukin, Desmond J. Higham, Alexander Bastounis, Eliyas Woldegeorgis, Alexander N. Gorban

Abstract: We develop and study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence (AI) systems including deep learning neural networks. In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself. Such a stealth attack could be conducted by a mischievous, corrupt or disgr… ▽ More We develop and study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence (AI) systems including deep learning neural networks. In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself. Such a stealth attack could be conducted by a mischievous, corrupt or disgruntled member of a software development team. It could also be made by those wishing to exploit a ``democratization of AI'' agenda, where network architectures and trained parameter sets are shared publicly. We develop a range of new implementable attack strategies with accompanying analysis, showing that with high probability a stealth attack can be made transparent, in the sense that system performance is unchanged on a fixed validation set which is unknown to the attacker, while evoking any desired output on a trigger input of interest. The attacker only needs to have estimates of the size of the validation set and the spread of the AI's relevant latent space. In the case of deep learning neural networks, we show that a one neuron attack is possible - a modification to the weights and bias associated with a single neuron - revealing a vulnerability arising from over-parameterization. We illustrate these concepts using state of the art architectures on two standard image data sets. Guided by the theory and computational results, we also propose strategies to guard against stealth attacks. △ Less

Submitted 4 January, 2023; v1 submitted 26 June, 2021; originally announced June 2021.

MSC Class: 68T01; 68T05; 90C31

Journal ref: IMA Journal of Applied Mathematics, October 2023, hxad027

arXiv:2104.12174 [pdf, other]

doi 10.1109/IJCNN52387.2021.9534395.

Demystification of Few-shot and One-shot Learning

Authors: Ivan Y. Tyukin, Alexander N. Gorban, Muhammad H. Alkhudaydi, Qinghua Zhou

Abstract: Few-shot and one-shot learning have been the subject of active and intensive research in recent years, with mounting evidence pointing to successful implementation and exploitation of few-shot learning algorithms in practice. Classical statistical learning theories do not fully explain why few- or one-shot learning is at all possible since traditional generalisation bounds normally require large t… ▽ More Few-shot and one-shot learning have been the subject of active and intensive research in recent years, with mounting evidence pointing to successful implementation and exploitation of few-shot learning algorithms in practice. Classical statistical learning theories do not fully explain why few- or one-shot learning is at all possible since traditional generalisation bounds normally require large training and testing samples to be meaningful. This sharply contrasts with numerous examples of successful one- and few-shot learning systems and applications. In this work we present mathematical foundations for a theory of one-shot and few-shot learning and reveal conditions specifying when such learning schemes are likely to succeed. Our theory is based on intrinsic properties of high-dimensional spaces. We show that if the ambient or latent decision space of a learning machine is sufficiently high-dimensional than a large class of objects in this space can indeed be easily learned from few examples provided that certain data non-concentration conditions are met. △ Less

Submitted 29 May, 2021; v1 submitted 25 April, 2021; originally announced April 2021.

Comments: IEEE International Joint Conference on Neural Networks, IJCNN 2021

MSC Class: 68T05; 68T07

Journal ref: In2021 International Joint Conference on Neural Networks (IJCNN) 2021 Jul 18 (pp. 1-7). IEEE

arXiv:2010.05241 [pdf, other]

doi 10.1016/j.neunet.2021.01.034

General stochastic separation theorems with optimal bounds

Authors: Bogdan Grechuk, Alexander N. Gorban, Ivan Y. Tyukin

Abstract: Phenomenon of stochastic separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and analyze AI instabilities. In high-dimensional datasets under broad assumptions each point can be separated from the rest of the set by simple and robust Fisher's discriminant (is Fisher separable). Errors or clusters of errors can be separated from the rest… ▽ More Phenomenon of stochastic separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and analyze AI instabilities. In high-dimensional datasets under broad assumptions each point can be separated from the rest of the set by simple and robust Fisher's discriminant (is Fisher separable). Errors or clusters of errors can be separated from the rest of the data. The ability to correct an AI system also opens up the possibility of an attack on it, and the high dimensionality induces vulnerabilities caused by the same stochastic separability that holds the keys to understanding the fundamentals of robustness and adaptivity in high-dimensional data-driven AI. To manage errors and analyze vulnerabilities, the stochastic separation theorems should evaluate the probability that the dataset will be Fisher separable in given dimensionality and for a given class of distributions. Explicit and optimal estimates of these separation probabilities are required, and this problem is solved in present work. The general stochastic separation theorems with optimal probability estimates are obtained for important classes of distributions: log-concave distribution, their convex combinations and product distributions. The standard i.i.d. assumption was significantly relaxed. These theorems and estimates can be used both for correction of high-dimensional data driven AI systems and for analysis of their vulnerabilities. The third area of application is the emergence of memories in ensembles of neurons, the phenomena of grandmother's cells and sparse coding in the brain, and explanation of unexpected effectiveness of small neural ensembles in high-dimensional brain. △ Less

Submitted 9 January, 2021; v1 submitted 11 October, 2020; originally announced October 2020.

Comments: Numerical examples and illustrations are added, minor corrections extended discussion and the bibliography

Journal ref: Neural Networks, Volume 138, 2021, Pages 33-56

arXiv:2007.03788 [pdf, other]

doi 10.1093/gigascience/giaa128

Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data

Authors: Sergey E. Golovenkin, Jonathan Bac, Alexander Chervov, Evgeny M. Mirkes, Yuliya V. Orlova, Emmanuel Barillot, Alexander N. Gorban, Andrei Zinovyev

Abstract: Large observational clinical datasets become increasingly available for mining associations between various disease traits and administered therapy. These datasets can be considered as representations of the landscape of all possible disease conditions, in which a concrete pathology develops through a number of stereotypical routes, characterized by `points of no return' and `final states' (such a… ▽ More Large observational clinical datasets become increasingly available for mining associations between various disease traits and administered therapy. These datasets can be considered as representations of the landscape of all possible disease conditions, in which a concrete pathology develops through a number of stereotypical routes, characterized by `points of no return' and `final states' (such as lethal or recovery states). Extracting this information directly from the data remains challenging, especially in the case of synchronic (with a short-term follow up) observations. Here we suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values, through modeling the geometrical data structure as a bouquet of bifurcating clinical trajectories. The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations. The methodology allows positioning a patient on a particular clinical trajectory (pathological scenario) and characterizing the degree of progression along it with a qualitative estimate of the uncertainty of the prognosis. Overall, our pseudo-time quantification-based approach gives a possibility to apply the methods developed for dynamical disease phenoty** and illness trajectory analysis (diachronic data analysis) to synchronic observational data. We developed a tool $ClinTrajan$ for clinical trajectory analysis implemented in Python programming language. We test the methodology in two large publicly available datasets: myocardial infarction complications and readmission of diabetic patients data. △ Less

Submitted 5 October, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

ACM Class: I.2.6; J.3; J.2

Journal ref: GigaScience, Volume 9, Issue 11, 2020, giaa128,

arXiv:2005.06284 [pdf, other]

Pruning coupled with learning, ensembles of minimal neural networks, and future of XAI

Authors: Alexander N. Gorban, Evgeny M. Mirkes

Abstract: Pruning coupled with learning aims to optimize the neural network (NN) structure for solving specific problems. This optimization can be used for various purposes: to prevent overfitting, to save resources for implementation and training, to provide explainability of the trained NN, and many others. The minimal structure that cannot be pruned further is not unique. Ensemble of minimal structures c… ▽ More Pruning coupled with learning aims to optimize the neural network (NN) structure for solving specific problems. This optimization can be used for various purposes: to prevent overfitting, to save resources for implementation and training, to provide explainability of the trained NN, and many others. The minimal structure that cannot be pruned further is not unique. Ensemble of minimal structures can be used as a committee of intellectual agents that solves problems by voting. Each minimal NN presents an "empirical knowledge" about the problem and can be verbalized. The non-uniqueness of such knowledge extracted from data is an important property of data-driven Artificial Intelligence (AI). In this work, we review an approach to pruning based on the principle: What controls training should control pruning. This principle is expected to work both for artificial NN and for selection and modification of important synaptic contacts in brain. In back-propagation artificial NN learning is controlled by the gradient of loss functions. Therefore, the first order sensitivity indicators are used for pruning and the algorithms based on these indicators are reviewed. The notion of logically transparent NN was introduced. The approach was illustrated on the problem of political forecasting: predicting the results of the US presidential election. Eight minimal NN were produced that give different forecasting algorithms. The non-uniqueness of solution can be utilised by creation of expert panels (committee). Another use of NN pluralism is to identify areas of input signals where further data collection is most useful. In Conclusion, we discuss the possible future of widely advertised XAI program. △ Less

Submitted 22 January, 2023; v1 submitted 13 May, 2020; originally announced May 2020.

Comments: Significantly modified and extended version, 23 pages, 5 figures

arXiv:2004.14230 [pdf, other]

doi 10.3390/e22101105

Fractional norms and quasinorms do not help to overcome the curse of dimensionality

Authors: Evgeny M. Mirkes, Jeza Allohibi, Alexander N. Gorban

Abstract: The curse of dimensionality causes the well-known and widely discussed problems for machine learning methods. There is a hypothesis that using of the Manhattan distance and even fractional quasinorms lp (for p less than 1) can help to overcome the curse of dimensionality in classification problems. In this study, we systematically test this hypothesis. We confirm that fractional quasinorms have a… ▽ More The curse of dimensionality causes the well-known and widely discussed problems for machine learning methods. There is a hypothesis that using of the Manhattan distance and even fractional quasinorms lp (for p less than 1) can help to overcome the curse of dimensionality in classification problems. In this study, we systematically test this hypothesis. We confirm that fractional quasinorms have a greater relative contrast or coefficient of variation than the Euclidean norm l2, but we also demonstrate that the distance concentration shows qualitatively the same behaviour for all tested norms and quasinorms and the difference between them decays as dimension tends to infinity. Estimation of classification quality for kNN based on different norms and quasinorms shows that a greater relative contrast does not mean better classifier performance and the worst performance for different databases was shown by different norms (quasinorms). A systematic comparison shows that the difference of the performance of kNN based on lp for p=2, 1, and 0.5 is statistically insignificant. △ Less

Submitted 29 April, 2020; originally announced April 2020.

Journal ref: Entropy. 2020; 22(10):1105

arXiv:2004.13717 [pdf, other]

Informational Space of Meaning for Scientific Texts

Authors: Neslihan Suzen, Evgeny M. Mirkes, Alexander N. Gorban

Abstract: In Natural Language Processing, automatic extracting the meaning of texts constitutes an important problem. Our focus is the computational analysis of meaning of short scientific texts (abstracts or brief reports). In this paper, a vector space model is developed for quantifying the meaning of words and texts. We introduce the Meaning Space, in which the meaning of a word is represented by a vecto… ▽ More In Natural Language Processing, automatic extracting the meaning of texts constitutes an important problem. Our focus is the computational analysis of meaning of short scientific texts (abstracts or brief reports). In this paper, a vector space model is developed for quantifying the meaning of words and texts. We introduce the Meaning Space, in which the meaning of a word is represented by a vector of Relative Information Gain (RIG) about the subject categories that the text belongs to, which can be obtained from observing the word in the text. This new approach is applied to construct the Meaning Space based on Leicester Scientific Corpus (LSC) and Leicester Scientific Dictionary-Core (LScDC). The LSC is a scientific corpus of 1,673,350 abstracts and the LScDC is a scientific dictionary which words are extracted from the LSC. Each text in the LSC belongs to at least one of 252 subject categories of Web of Science (WoS). These categories are used in construction of vectors of information gains. The Meaning Space is described and statistically analysed for the LSC with the LScDC. The usefulness of the proposed representation model is evaluated through top-ranked words in each category. The most informative n words are ordered. We demonstrated that RIG-based word ranking is much more useful than ranking based on raw word frequency in determining the science-specific meaning and importance of a word. The proposed model based on RIG is shown to have ability to stand out topic-specific words in categories. The most informative words are presented for 252 categories. The new scientific dictionary and the 103,998 x 252 Word-Category RIG Matrix are available online. Analysis of the Meaning Space provides us with a tool to further explore quantifying the meaning of a text using more complex and context-dependent meaning models that use co-occurrence of words and their combinations. △ Less

Submitted 28 April, 2020; originally announced April 2020.

Comments: 320 pages

arXiv:2004.04479 [pdf, ps, other]

doi 10.1109/IJCNN48605.2020.9207472

On Adversarial Examples and Stealth Attacks in Artificial Intelligence Systems

Authors: Ivan Y. Tyukin, Desmond J. Higham, Alexander N. Gorban

Abstract: In this work we present a formal theoretical framework for assessing and analyzing two classes of malevolent action towards generic Artificial Intelligence (AI) systems. Our results apply to general multi-class classifiers that map from an input space into a decision space, including artificial neural networks used in deep learning applications. Two classes of attacks are considered. The first cla… ▽ More In this work we present a formal theoretical framework for assessing and analyzing two classes of malevolent action towards generic Artificial Intelligence (AI) systems. Our results apply to general multi-class classifiers that map from an input space into a decision space, including artificial neural networks used in deep learning applications. Two classes of attacks are considered. The first class involves adversarial examples and concerns the introduction of small perturbations of the input data that cause misclassification. The second class, introduced here for the first time and named stealth attacks, involves small perturbations to the AI system itself. Here the perturbed system produces whatever output is desired by the attacker on a specific small data set, perhaps even a single input, but performs as normal on a validation set (which is unknown to the attacker). We show that in both cases, i.e., in the case of an attack based on adversarial examples and in the case of a stealth attack, the dimensionality of the AI's decision-making space is a major contributor to the AI's susceptibility. For attacks based on adversarial examples, a second crucial parameter is the absence of local concentrations in the data probability distribution, a property known as Smeared Absolute Continuity. According to our findings, robustness to adversarial examples requires either (a) the data distributions in the AI's feature space to have concentrated probability density functions or (b) the dimensionality of the AI's decision variables to be sufficiently small. We also show how to construct stealth attacks on high-dimensional AI systems that are hard to spot unless the validation set is made exponentially large. △ Less

Submitted 9 April, 2020; originally announced April 2020.

MSC Class: 68T05; 68T10; 90C31

Journal ref: 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, United Kingdom, 2020

arXiv:2001.04959 [pdf, other]

doi 10.3390/e22010082

High--Dimensional Brain in a High-Dimensional World: Blessing of Dimensionality

Authors: Alexander N. Gorban, Valery A. Makarov, Ivan Y. Tyukin

Abstract: High-dimensional data and high-dimensional representations of reality are inherent features of modern Artificial Intelligence systems and applications of machine learning. The well-known phenomenon of the "curse of dimensionality" states: many problems become exponentially difficult in high dimensions. Recently, the other side of the coin, the "blessing of dimensionality", has attracted much atten… ▽ More High-dimensional data and high-dimensional representations of reality are inherent features of modern Artificial Intelligence systems and applications of machine learning. The well-known phenomenon of the "curse of dimensionality" states: many problems become exponentially difficult in high dimensions. Recently, the other side of the coin, the "blessing of dimensionality", has attracted much attention. It turns out that generic high-dimensional datasets exhibit fairly simple geometric properties. Thus, there is a fundamental tradeoff between complexity and simplicity in high dimensional spaces. Here we present a brief explanatory review of recent ideas, results and hypotheses about the blessing of dimensionality and related simplifying effects relevant to machine learning and neuroscience. △ Less

Submitted 14 January, 2020; originally announced January 2020.

Comments: 18 pages, 5 figures

Journal ref: Entropy 2020, 22(1), 82

arXiv:1912.06858 [pdf, other]

LScDC-new large scientific dictionary

Authors: Neslihan Suzen, Evgeny M. Mirkes, Alexander N. Gorban

Abstract: In this paper, we present a scientific corpus of abstracts of academic papers in English -- Leicester Scientific Corpus (LSC). The LSC contains 1,673,824 abstracts of research articles and proceeding papers indexed by Web of Science (WoS) in which publication year is 2014. Each abstract is assigned to at least one of 252 subject categories. Paper metadata include these categories and the number of… ▽ More In this paper, we present a scientific corpus of abstracts of academic papers in English -- Leicester Scientific Corpus (LSC). The LSC contains 1,673,824 abstracts of research articles and proceeding papers indexed by Web of Science (WoS) in which publication year is 2014. Each abstract is assigned to at least one of 252 subject categories. Paper metadata include these categories and the number of citations. We then develop scientific dictionaries named Leicester Scientific Dictionary (LScD) and Leicester Scientific Dictionary-Core (LScDC), where words are extracted from the LSC. The LScD is a list of 974,238 unique words (lemmas). The LScDC is a core list (sub-list) of the LScD with 104,223 lemmas. It was created by removing LScD words appearing in not greater than 10 texts in the LSC. LScD and LScDC are available online. Both the corpus and dictionaries are developed to be later used for quantification of meaning in academic texts. Finally, the core list LScDC was analysed by comparing its words and word frequencies with a classic academic word list 'New Academic Word List (NAWL)' containing 963 word families, which is also sampled from an academic corpus. The major sources of the corpus where NAWL is extracted are Cambridge English Corpus (CEC), oral sources and textbooks. We investigate whether two dictionaries are similar in terms of common words and ranking of words. Our comparison leads us to main conclusion: most of words of NAWL (99.6%) are present in the LScDC but two lists differ in word ranking. This difference is measured. △ Less

Submitted 14 December, 2019; originally announced December 2019.

Comments: 63 pages

arXiv:1910.00445 [pdf, other]

doi 10.1016/j.ins.2021.01.022

Blessing of dimensionality at the edge

Authors: Ivan Y. Tyukin, Alexander N. Gorban, Alistair A. McEwan, Sepehr Meshkinfamfard, Lixin Tang

Abstract: In this paper we present theory and algorithms enabling classes of Artificial Intelligence (AI) systems to continuously and incrementally improve with a-priori quantifiable guarantees - or more specifically remove classification errors - over time. This is distinct from state-of-the-art machine learning, AI, and software approaches. Another feature of this approach is that, in the supervised setti… ▽ More In this paper we present theory and algorithms enabling classes of Artificial Intelligence (AI) systems to continuously and incrementally improve with a-priori quantifiable guarantees - or more specifically remove classification errors - over time. This is distinct from state-of-the-art machine learning, AI, and software approaches. Another feature of this approach is that, in the supervised setting, the computational complexity of training is linear in the number of training samples. At the time of classification, the computational complexity is bounded by few inner product calculations. Moreover, the implementation is shown to be very scalable. This makes it viable for deployment in applications where computational power and memory are limited, such as embedded environments. It enables the possibility for fast on-line optimisation using improved training samples. The approach is based on the concentration of measure effects and stochastic separation theorems and is illustrated with an example on the identification faulty processes in Computer Numerical Control (CNC) milling and with a case study on adaptive removal of false positives in an industrial video surveillance and analytics system. △ Less

Submitted 10 July, 2020; v1 submitted 30 September, 2019; originally announced October 2019.

MSC Class: 68T05; 68T45; 68Q32

Journal ref: Information Sciences, 564, 124-143 (2021)

arXiv:1906.12222 [pdf, ps, other]

doi 10.1016/j.plrev.2019.06.003

Symphony of high-dimensional brain

Authors: Alexander N. Gorban, Valeri A. Makarov, Ivan Y. Tyukin

Abstract: This paper is the final part of the scientific discussion organised by the Journal "Physics of Life Rviews" about the simplicity revolution in neuroscience and AI. This discussion was initiated by the review paper "The unreasonable effectiveness of small neural ensembles in high-dimensional brain". Phys Life Rev 2019, doi 10.1016/j.plrev.2018.09.005, arXiv:1809.07656. The topics of the discussion… ▽ More This paper is the final part of the scientific discussion organised by the Journal "Physics of Life Rviews" about the simplicity revolution in neuroscience and AI. This discussion was initiated by the review paper "The unreasonable effectiveness of small neural ensembles in high-dimensional brain". Phys Life Rev 2019, doi 10.1016/j.plrev.2018.09.005, arXiv:1809.07656. The topics of the discussion varied from the necessity to take into account the difference between the theoretical random distributions and "extremely non-random" real distributions and revise the common machine learning theory, to different forms of the curse of dimensionality and high-dimensional pitfalls in neuroscience. V. K{ů}rkov{á}, A. Tozzi and J.F. Peters, R. Quian Quiroga, P. Varona, R. Barrio, G. Kreiman, L. Fortuna, C. van Leeuwen, R. Quian Quiroga, and V. Kreinovich, A.N. Gorban, V.A. Makarov, and I.Y. Tyukin participated in the discussion. In this paper we analyse the symphony of opinions and the possible outcomes of the simplicity revolution for machine learning and neuroscience. △ Less

Submitted 27 June, 2019; originally announced June 2019.

Journal ref: Physics of Life Reviews, 2019

arXiv:1811.05321 [pdf, other]

doi 10.1016/j.ins.2018.07.040

Correction of AI systems by linear discriminants: Probabilistic foundations

Authors: A. N. Gorban, A. Golubkov, B. Grechuk, E. M. Mirkes, I. Y. Tyukin

Abstract: Artificial Intelligence (AI) systems sometimes make errors and will make errors in the future, from time to time. These errors are usually unexpected, and can lead to dramatic consequences. Intensive development of AI and its practical applications makes the problem of errors more important. Total re-engineering of the systems can create new errors and is not always possible due to the resources i… ▽ More Artificial Intelligence (AI) systems sometimes make errors and will make errors in the future, from time to time. These errors are usually unexpected, and can lead to dramatic consequences. Intensive development of AI and its practical applications makes the problem of errors more important. Total re-engineering of the systems can create new errors and is not always possible due to the resources involved. The important challenge is to develop fast methods to correct errors without damaging existing skills. We formulated the technical requirements to the 'ideal' correctors. Such correctors include binary classifiers, which separate the situations with high risk of errors from the situations where the AI systems work properly. Surprisingly, for essentially high-dimensional data such methods are possible: simple linear Fisher discriminant can separate the situations with errors from correctly solved tasks even for exponentially large samples. The paper presents the probabilistic basis for fast non-destructive correction of AI systems. A series of new stochastic separation theorems is proven. These theorems provide new instruments for fast non-iterative correction of errors of legacy AI systems. The new approaches become efficient in high-dimensions, for correction of high-dimensional systems in high-dimensional world (i.e. for processing of essentially high-dimensional data by large systems). △ Less

Submitted 11 November, 2018; originally announced November 2018.

Comments: arXiv admin note: text overlap with arXiv:1809.07656 and arXiv:1802.02172

Journal ref: Information Sciences 466 (2018), 303-322

arXiv:1810.05593 [pdf, other]

doi 10.1016/j.ins.2018.11.057

Fast Construction of Correcting Ensembles for Legacy Artificial Intelligence Systems: Algorithms and a Case Study

Authors: Ivan Y. Tyukin, Alexander N. Gorban, Stephen Green, Danil Prokhorov

Abstract: This paper presents a technology for simple and computationally efficient improvements of a generic Artificial Intelligence (AI) system, including Multilayer and Deep Learning neural networks. The improvements are, in essence, small network ensembles constructed on top of the existing AI architectures. Theoretical foundations of the technology are based on Stochastic Separation Theorems and the id… ▽ More This paper presents a technology for simple and computationally efficient improvements of a generic Artificial Intelligence (AI) system, including Multilayer and Deep Learning neural networks. The improvements are, in essence, small network ensembles constructed on top of the existing AI architectures. Theoretical foundations of the technology are based on Stochastic Separation Theorems and the ideas of the concentration of measure. We show that, subject to mild technical assumptions on statistical properties of internal signals in the original AI system, the technology enables instantaneous and computationally efficient removal of spurious and systematic errors with probability close to one on the datasets which are exponentially large in dimension. The method is illustrated with numerical examples and a case study of ten digits recognition from American Sign Language. △ Less

Submitted 13 February, 2019; v1 submitted 12 October, 2018; originally announced October 2018.

Journal ref: Information Sciences, 2019

arXiv:1809.07656 [pdf, other]

doi 10.1016/j.plrev.2018.09.005

The unreasonable effectiveness of small neural ensembles in high-dimensional brain

Authors: A. N. Gorban, V. A. Makarov, I. Y. Tyukin

Abstract: Despite the widely-spread consensus on the brain complexity, sprouts of the single neuron revolution emerged in neuroscience in the 1970s. They brought many unexpected discoveries, including grandmother or concept cells and sparse coding of information in the brain. In machine learning for a long time, the famous curse of dimensionality seemed to be an unsolvable problem. Nevertheless, the idea… ▽ More Despite the widely-spread consensus on the brain complexity, sprouts of the single neuron revolution emerged in neuroscience in the 1970s. They brought many unexpected discoveries, including grandmother or concept cells and sparse coding of information in the brain. In machine learning for a long time, the famous curse of dimensionality seemed to be an unsolvable problem. Nevertheless, the idea of the blessing of dimensionality becomes gradually more and more popular. Ensembles of non-interacting or weakly interacting simple units prove to be an effective tool for solving essentially multidimensional problems. This approach is especially useful for one-shot (non-iterative) correction of errors in large legacy artificial intelligence systems. These simplicity revolutions in the era of complexity have deep fundamental reasons grounded in geometry of multidimensional data spaces. To explore and understand these reasons we revisit the background ideas of statistical physics. In the course of the 20th century they were developed into the concentration of measure theory. New stochastic separation theorems reveal the fine structure of the data clouds. We review and analyse biological, physical, and mathematical problems at the core of the fundamental question: how can high-dimensional brain organise reliable and fast learning in high-dimensional world of data by simple tools? Two critical applications are reviewed to exemplify the approach: one-shot correction of errors in intellectual systems and emergence of static and associative memories in ensembles of single neurons. △ Less

Submitted 10 November, 2018; v1 submitted 20 September, 2018; originally announced September 2018.

Comments: Review paper, accepted in Physics of Life Reviews; minor corrections

Journal ref: Physics of Life Reviews Volume 29, July 2019, Pages 55-88

arXiv:1805.01516 [pdf, ps, other]

How deep should be the depth of convolutional neural networks: a backyard dog case study

Authors: A. N. Gorban, E. M. Mirkes, I. Y. Tyukin

Abstract: The work concerns the problem of reducing a pre-trained deep neuronal network to a smaller network, with just few layers, whilst retaining the network's functionality on a given task The proposed approach is motivated by the observation that the aim to deliver the highest accuracy possible in the broadest range of operational conditions, which many deep neural networks models strive to achieve,… ▽ More The work concerns the problem of reducing a pre-trained deep neuronal network to a smaller network, with just few layers, whilst retaining the network's functionality on a given task The proposed approach is motivated by the observation that the aim to deliver the highest accuracy possible in the broadest range of operational conditions, which many deep neural networks models strive to achieve, may not necessarily be always needed, desired, or even achievable due to the lack of data or technical constraints. In relation to the face recognition problem, we formulated an example of such a usecase, the `backyard dog' problem. The `backyard dog', implemented by a lean network, should correctly identify members from a limited group of individuals, a `family', and should distinguish between them. At the same time, the network must produce an alarm to an image of an individual who is not in a member of the family. To produce such a network, we propose a shallowing algorithm. The algorithm takes an existing deep learning model on its input and outputs a shallowed version of it. The algorithm is non-iterative and is based on the Advanced Supervised Principal Component Analysis. Performance of the algorithm is assessed in exhaustive numerical experiments. In the above usecase, the `backyard dog' problem, the method is capable of drastically reducing the depth of deep learning neural networks, albeit at the cost of mild performance deterioration. We developed a simple non-iterative method for shallowing down pre-trained deep networks. The method is generic in the sense that it applies to a broad class of feed-forward networks, and is based on the Advanced Supervise Principal Component Analysis. The method enables generation of families of smaller-size shallower specialized networks tuned for specific operational conditions and tasks from a single larger and more universal legacy network. △ Less

Submitted 8 December, 2019; v1 submitted 3 May, 2018; originally announced May 2018.

Comments: Edited and extended version with more detailed description of numerical experiments

arXiv:1804.07580 [pdf]

doi 10.3390/e22030296

Robust And Scalable Learning Of Complex Dataset Topologies Via Elpigraph

Authors: Luca Albergante, Evgeny M. Mirkes, Huidong Chen, Alexis Martin, Louis Faure, Emmanuel Barillot, Luca Pinello, Alexander N. Gorban, Andrei Zinovyev

Abstract: Large datasets represented by multidimensional data point clouds often possess non-trivial distributions with branching trajectories and excluded regions, with the recent single-cell transcriptomic studies of develo** embryo being notable examples. Reducing the complexity and producing compact and interpretable representations of such data remains a challenging task. Most of the existing computa… ▽ More Large datasets represented by multidimensional data point clouds often possess non-trivial distributions with branching trajectories and excluded regions, with the recent single-cell transcriptomic studies of develo** embryo being notable examples. Reducing the complexity and producing compact and interpretable representations of such data remains a challenging task. Most of the existing computational methods are based on exploring the local data point neighbourhood relations, a step that can perform poorly in the case of multidimensional and noisy data. Here we present ElPiGraph, a scalable and robust method for approximation of datasets with complex structures which does not require computing the complete data distance matrix or the data point neighbourhood graph. This method is able to withstand high levels of noise and is capable of approximating complex topologies via principal graph ensembles that can be combined into a consensus principal graph. ElPiGraph deals efficiently with large and complex datasets in various fields from biology, where it can be used to infer gene dynamics from single-cell RNA-Seq, to astronomy, where it can be used to explore complex structures in the distribution of galaxies. △ Less

Submitted 20 June, 2018; v1 submitted 20 April, 2018; originally announced April 2018.

Comments: 32 pages, 14 figures

Journal ref: Entropy 22, no. 3: 296, 2020

arXiv:1802.02172 [pdf, other]

Augmented Artificial Intelligence: a Conceptual Framework

Authors: Alexander N. Gorban, Bogdan Grechuk, Ivan Y. Tyukin

Abstract: All artificial Intelligence (AI) systems make errors. These errors are unexpected, and differ often from the typical human mistakes ("non-human" errors). The AI errors should be corrected without damage of existing skills and, hopefully, avoiding direct human expertise. This paper presents an initial summary report of project taking new and systematic approach to improving the intellectual effecti… ▽ More All artificial Intelligence (AI) systems make errors. These errors are unexpected, and differ often from the typical human mistakes ("non-human" errors). The AI errors should be corrected without damage of existing skills and, hopefully, avoiding direct human expertise. This paper presents an initial summary report of project taking new and systematic approach to improving the intellectual effectiveness of the individual AI by communities of AIs. We combine some ideas of learning in heterogeneous multiagent systems with new and original mathematical approaches for non-iterative corrections of errors of legacy AI systems. The mathematical foundations of AI non-destructive correction are presented and a series of new stochastic separation theorems is proven. These theorems provide a new instrument for the development, analysis, and assessment of machine learning methods and algorithms in high dimension. They demonstrate that in high dimensions and even for exponentially large samples, linear classifiers in their classical Fisher's form are powerful enough to separate errors from correct responses with high probability and to provide efficient solution to the non-destructive corrector problem. In particular, we prove some hypotheses formulated in our paper `Stochastic Separation Theorems' (Neural Networks, 94, 255--259, 2017), and answer one general problem published by Donoho and Tanner in 2009. △ Less

Submitted 24 March, 2018; v1 submitted 6 February, 2018; originally announced February 2018.

Comments: The mathematical part is significantly extended. New stochastic separation theorems are proven for log-concave distributions. Some previously formulated hypotheses are confirmed

arXiv:1801.03421 [pdf, other]

doi 10.1098/rsta.2017.0237

Blessing of dimensionality: mathematical foundations of the statistical physics of data

Authors: A. N. Gorban, I. Y. Tyukin

Abstract: The concentration of measure phenomena were discovered as the mathematical background of statistical mechanics at the end of the XIX - beginning of the XX century and were then explored in mathematics of the XX-XXI centuries. At the beginning of the XXI century, it became clear that the proper utilisation of these phenomena in machine learning might transform the curse of dimensionality into the b… ▽ More The concentration of measure phenomena were discovered as the mathematical background of statistical mechanics at the end of the XIX - beginning of the XX century and were then explored in mathematics of the XX-XXI centuries. At the beginning of the XXI century, it became clear that the proper utilisation of these phenomena in machine learning might transform the curse of dimensionality into the blessing of dimensionality. This paper summarises recently discovered phenomena of measure concentration which drastically simplify some machine learning problems in high dimension, and allow us to correct legacy artificial intelligence systems. The classical concentration of measure theorems state that i.i.d. random points are concentrated in a thin layer near a surface (a sphere or equators of a sphere, an average or median level set of energy or another Lipschitz function, etc.). The new stochastic separation theorems describe the thin structure of these thin layers: the random points are not only concentrated in a thin layer but are all linearly separable from the rest of the set, even for exponentially large random sets. The linear functionals for separation of points can be selected in the form of the linear Fisher's discriminant. All artificial intelligence systems make errors. Non-destructive correction requires separation of the situations (samples) with errors from the samples corresponding to correct behaviour by a simple and robust classifier. The stochastic separation theorems provide us by such classifiers and a non-iterative (one-shot) procedure for learning. △ Less

Submitted 10 January, 2018; originally announced January 2018.

Comments: Accepted for publication in Philosophical Transactions of the Royal Society A, 2018. Comprises of 17 pages and 4 figures

Journal ref: Phil. Trans. R. Soc. A volume 376, issue 2118, 376 20170237, 2018

arXiv:1709.01547 [pdf, other]

doi 10.3389/fnbot.2018.00049

Knowledge Transfer Between Artificial Intelligence Systems

Authors: Ivan Y. Tyukin, Alexander N. Gorban, Konstantin Sofeikov, Ilya Romanenko

Abstract: We consider the fundamental question: how a legacy "student" Artificial Intelligent (AI) system could learn from a legacy "teacher" AI system or a human expert without complete re-training and, most importantly, without requiring significant computational resources. Here "learning" is understood as an ability of one system to mimic responses of the other and vice-versa. We call such learning an Ar… ▽ More We consider the fundamental question: how a legacy "student" Artificial Intelligent (AI) system could learn from a legacy "teacher" AI system or a human expert without complete re-training and, most importantly, without requiring significant computational resources. Here "learning" is understood as an ability of one system to mimic responses of the other and vice-versa. We call such learning an Artificial Intelligence knowledge transfer. We show that if internal variables of the "student" Artificial Intelligent system have the structure of an $n$-dimensional topological vector space and $n$ is sufficiently high then, with probability close to one, the required knowledge transfer can be implemented by simple cascades of linear functionals. In particular, for $n$ sufficiently large, with probability close to one, the "student" system can successfully and non-iteratively learn $k\ll n$ new examples from the "teacher" (or correct the same number of mistakes) at the cost of two additional inner products. The concept is illustrated with an example of knowledge transfer from a pre-trained convolutional neural network to a simple linear classifier with HOG features. △ Less

Submitted 14 November, 2017; v1 submitted 5 September, 2017; originally announced September 2017.

MSC Class: 68T05; 68T30

Journal ref: Front Neurorobot. 2018; 12: 49

arXiv:1703.01203 [pdf, ps, other]

doi 10.1016/j.neunet.2017.07.014

Stochastic Separation Theorems

Authors: A. N. Gorban, I. Y. Tyukin

Abstract: The problem of non-iterative one-shot and non-destructive correction of unavoidable mistakes arises in all Artificial Intelligence applications in the real world. Its solution requires robust separation of samples with errors from samples where the system works properly. We demonstrate that in (moderately) high dimension this separation could be achieved with probability close to one by linear dis… ▽ More The problem of non-iterative one-shot and non-destructive correction of unavoidable mistakes arises in all Artificial Intelligence applications in the real world. Its solution requires robust separation of samples with errors from samples where the system works properly. We demonstrate that in (moderately) high dimension this separation could be achieved with probability close to one by linear discriminants. Surprisingly, separation of a new image from a very large set of known images is almost always possible even in moderately high dimensions by linear functionals, and coefficients of these functionals can be found explicitly. Based on fundamental properties of measure concentration, we show that for $M<a\exp(b{n})$ random $M$-element sets in $\mathbb{R}^n$ are linearly separable with probability $p$, $p>1-\vartheta$, where $1>\vartheta>0$ is a given small constant. Exact values of $a,b>0$ depend on the probability distribution that determines how the random $M$-element sets are drawn, and on the constant $\vartheta$. These {\em stochastic separation theorems} provide a new instrument for the development, analysis, and assessment of machine learning methods and algorithms in high dimension. Theoretical statements are illustrated with numerical examples. △ Less

Submitted 3 August, 2017; v1 submitted 3 March, 2017; originally announced March 2017.

Comments: 6 pages, accepted for publication in Neural Networks (Letter section)

MSC Class: 68T10 ACM Class: I.2.6

Journal ref: Neural Networks 94 (2017), 255-259

arXiv:1610.00494 [pdf, ps, other]

doi 10.1016/j.ins.2019.02.001

One-Trial Correction of Legacy AI Systems and Stochastic Separation Theorems

Authors: Alexander N. Gorban, Ilya Romanenko, Richard Burton, Ivan Y. Tyukin

Abstract: We consider the problem of efficient "on the fly" tuning of existing, or {\it legacy}, Artificial Intelligence (AI) systems. The legacy AI systems are allowed to be of arbitrary class, albeit the data they are using for computing interim or final decision responses should posses an underlying structure of a high-dimensional topological real vector space. The tuning method that we propose enables d… ▽ More We consider the problem of efficient "on the fly" tuning of existing, or {\it legacy}, Artificial Intelligence (AI) systems. The legacy AI systems are allowed to be of arbitrary class, albeit the data they are using for computing interim or final decision responses should posses an underlying structure of a high-dimensional topological real vector space. The tuning method that we propose enables dealing with errors without the need to re-train the system. Instead of re-training a simple cascade of perceptron nodes is added to the legacy system. The added cascade modulates the AI legacy system's decisions. If applied repeatedly, the process results in a network of modulating rules "dressing up" and improving performance of existing AI systems. Mathematical rationale behind the method is based on the fundamental property of measure concentration in high dimensional spaces. The method is illustrated with an example of fine-tuning a deep convolutional network that has been pre-trained to detect pedestrians in images. △ Less

Submitted 13 February, 2019; v1 submitted 3 October, 2016; originally announced October 2016.

Journal ref: Information Sciences, 484, 237-254, 2019

arXiv:1605.06276 [pdf, ps, other]

doi 10.1016/j.neunet.2016.08.007

Piece-wise quadratic approximations of arbitrary error functions for fast and robust machine learning

Authors: A. N. Gorban, E. M. Mirkes, A. Zinovyev

Abstract: Most of machine learning approaches have stemmed from the application of minimizing the mean squared distance principle, based on the computationally efficient quadratic optimization methods. However, when faced with high-dimensional and noisy data, the quadratic error functionals demonstrated many weaknesses including high sensitivity to contaminating factors and dimensionality curse. Therefore,… ▽ More Most of machine learning approaches have stemmed from the application of minimizing the mean squared distance principle, based on the computationally efficient quadratic optimization methods. However, when faced with high-dimensional and noisy data, the quadratic error functionals demonstrated many weaknesses including high sensitivity to contaminating factors and dimensionality curse. Therefore, a lot of recent applications in machine learning exploited properties of non-quadratic error functionals based on $L_1$ norm or even sub-linear potentials corresponding to quasinorms $L_p$ ($0<p<1$). The back side of these approaches is increase in computational cost for optimization. Till so far, no approaches have been suggested to deal with {\it arbitrary} error functionals, in a flexible and computationally efficient framework. In this paper, we develop a theory and basic universal data approximation algorithms ($k$-means, principal components, principal manifolds and graphs, regularized and sparse regression), based on piece-wise quadratic error potentials of subquadratic growth (PQSQ potentials). We develop a new and universal framework to minimize {\it arbitrary sub-quadratic error potentials} using an algorithm with guaranteed fast convergence to the local or global error minimum. The theory of PQSQ potentials is based on the notion of the cone of minorant functions, and represents a natural approximation formalism based on the application of min-plus algebra. The approach can be applied in most of existing machine learning methods, including methods of data approximation and regularized and sparse regression, leading to the improvement in the computational cost/accuracy trade-off. We demonstrate that on synthetic and real-life datasets PQSQ-based machine learning methods achieve orders of magnitude faster computational performance than the corresponding state-of-the-art methods. △ Less

Submitted 21 August, 2016; v1 submitted 20 May, 2016; originally announced May 2016.

Comments: Edited and extended version with algortihms of regularized regression

Journal ref: Neural Networks, Volume 84, December 2016, 28-38

arXiv:1603.06828 [pdf, other]

doi 10.5445/KSP/1000058749/11

Robust principal graphs for data approximation

Authors: A. N. Gorban, E. M. Mirkes, A. Zinovyev

Abstract: Revealing hidden geometry and topology in noisy data sets is a challenging task. Elastic principal graph is a computationally efficient and flexible data approximator based on embedding a graph into the data space and minimizing the energy functional penalizing the deviation of graph nodes both from data points and from pluri-harmonic configuration (generalization of linearity). The structure of p… ▽ More Revealing hidden geometry and topology in noisy data sets is a challenging task. Elastic principal graph is a computationally efficient and flexible data approximator based on embedding a graph into the data space and minimizing the energy functional penalizing the deviation of graph nodes both from data points and from pluri-harmonic configuration (generalization of linearity). The structure of principal graph is learned from data by application of a topological grammar which in the simplest case leads to the construction of principal curves or trees. In order to more efficiently cope with noise and outliers, here we suggest using a trimmed data approximation term to increase the robustness of the method. The modification of the method that we suggest does not affect either computational efficiency or general convergence properties of the original elastic graph method. The trimmed elastic energy functional remains a Lyapunov function for the optimization algorithm. On several examples of complex data distributions we demonstrate how the robust principal graphs learn the global data structure and show the advantage of using the trimmed data approximation term for the construction of principal graphs and other popular data approximators. △ Less

Submitted 24 November, 2016; v1 submitted 22 March, 2016; originally announced March 2016.

Comments: A talk given at ECDA2015 (European Conference on Data Analysis, September 2nd to 4th 2015, University of Essex, Colchester, UK), to be published in Archives of Data Science

Journal ref: Archives of Data Science, Series A, Vol. 2, No. 1, 2017

arXiv:1506.04631 [pdf, ps, other]

doi 10.1016/j.ins.2015.09.021

Approximation with Random Bases: Pro et Contra

Authors: Alexander N. Gorban, Ivan Yu. Tyukin, Danil V. Prokhorov, Konstantin I. Sofeikov

Abstract: In this work we discuss the problem of selecting suitable approximators from families of parameterized elementary functions that are known to be dense in a Hilbert space of functions. We consider and analyze published procedures, both randomized and deterministic, for selecting elements from these families that have been shown to ensure the rate of convergence in $L_2$ norm of order $O(1/N)$, wher… ▽ More In this work we discuss the problem of selecting suitable approximators from families of parameterized elementary functions that are known to be dense in a Hilbert space of functions. We consider and analyze published procedures, both randomized and deterministic, for selecting elements from these families that have been shown to ensure the rate of convergence in $L_2$ norm of order $O(1/N)$, where $N$ is the number of elements. We show that both randomized and deterministic procedures are successful if additional information about the families of functions to be approximated is provided. In the absence of such additional information one may observe exponential growth of the number of terms needed to approximate the function and/or extreme sensitivity of the outcome of the approximation to parameters. Implications of our analysis for applications of neural networks in modeling and control are illustrated with examples. △ Less

Submitted 24 October, 2015; v1 submitted 15 June, 2015; originally announced June 2015.

Comments: arXiv admin note: text overlap with arXiv:0905.0677

MSC Class: 41A45; 41A45; 90C59; 92B20; 68W20

Journal ref: Information Sciences 364-365, 10 October 2016, Pages 129-145

arXiv:1406.5550 [pdf, other]

ViDaExpert: user-friendly tool for nonlinear visualization and analysis of multidimensional vectorial data

Authors: Alexander N. Gorban, Alexander Pitenko, Andrei Zinovyev

Abstract: ViDaExpert is a tool for visualization and analysis of multidimensional vectorial data. ViDaExpert is able to work with data tables of "object-feature" type that might contain numerical feature values as well as textual labels for rows (objects) and columns (features). ViDaExpert implements several statistical methods such as standard and weighted Principal Component Analysis (PCA) and the method… ▽ More ViDaExpert is a tool for visualization and analysis of multidimensional vectorial data. ViDaExpert is able to work with data tables of "object-feature" type that might contain numerical feature values as well as textual labels for rows (objects) and columns (features). ViDaExpert implements several statistical methods such as standard and weighted Principal Component Analysis (PCA) and the method of elastic maps (non-linear version of PCA), Linear Discriminant Analysis (LDA), multilinear regression, K-Means clustering, a variant of decision tree construction algorithm. Equipped with several user-friendly dialogs for configuring data point representations (size, shape, color) and fast 3D viewer, ViDaExpert is a handy tool allowing to construct an interactive 3D-scene representing a table of data in multidimensional space and perform its quick and insightfull statistical analysis, from basic to advanced methods. △ Less

Submitted 27 June, 2014; v1 submitted 20 June, 2014; originally announced June 2014.

MSC Class: 68N01; 68W25 ACM Class: G.3; G.4

arXiv:1303.3855 [pdf, ps, other]

doi 10.1016/j.camwa.2013.04.023

Gras** Complexity

Authors: A. N. Gorban, G. S. Yablonsky

Abstract: The century of complexity has come. The face of science has changed. Surprisingly, when we start asking about the essence of these changes and then critically analyse the answers, the result are mostly discouraging. Most of the answers are related to the properties that have been in the focus of scientific research already for more than a century (like non-linearity). This paper is Preface to the… ▽ More The century of complexity has come. The face of science has changed. Surprisingly, when we start asking about the essence of these changes and then critically analyse the answers, the result are mostly discouraging. Most of the answers are related to the properties that have been in the focus of scientific research already for more than a century (like non-linearity). This paper is Preface to the special issue "Gras** Complexity" of the journal "Computers and Mathematics with Applications". We analyse the change of era in science, its reasons and main changes in scientific activity and give a brief review of the papers in the issue. △ Less

Submitted 15 March, 2013; originally announced March 2013.

Comments: 8 pages, 3 figures, bibliography 52 items

Journal ref: Computers and Mathematics with Applications 65 (2013) 1421-1426

arXiv:1302.2645 [pdf]

doi 10.1007/978-3-642-38679-4_50

Geometrical complexity of data approximators

Authors: E. M. Mirkes, A. Zinovyev, A. N. Gorban

Abstract: There are many methods developed to approximate a cloud of vectors embedded in high-dimensional space by simpler objects: starting from principal points and linear manifolds to self-organizing maps, neural gas, elastic maps, various types of principal curves and principal trees, and so on. For each type of approximators the measure of the approximator complexity was developed too. These measures a… ▽ More There are many methods developed to approximate a cloud of vectors embedded in high-dimensional space by simpler objects: starting from principal points and linear manifolds to self-organizing maps, neural gas, elastic maps, various types of principal curves and principal trees, and so on. For each type of approximators the measure of the approximator complexity was developed too. These measures are necessary to find the balance between accuracy and complexity and to define the optimal approximations of a given type. We propose a measure of complexity (geometrical complexity) which is applicable to approximators of several types and which allows comparing data approximations of different types. △ Less

Submitted 3 May, 2013; v1 submitted 11 February, 2013; originally announced February 2013.

Comments: 10 pages, 3 figures, minor correction and extension

Journal ref: IWANN 2013, Advances in Computation Intelligence, Springer LNCS 7902, pp. 500-509, 2013

arXiv:1008.4063 [pdf]

Nonlinear Quality of Life Index

Authors: A. Zinovyev, A. N. Gorban

Abstract: We present details of the analysis of the nonlinear quality of life index for 171 countries. This index is based on four indicators: GDP per capita by Purchasing Power Parities, Life expectancy at birth, Infant mortality rate, and Tuberculosis incidence. We analyze the structure of the data in order to find the optimal and independent on expert's opinion way to map several numerical indicators fro… ▽ More We present details of the analysis of the nonlinear quality of life index for 171 countries. This index is based on four indicators: GDP per capita by Purchasing Power Parities, Life expectancy at birth, Infant mortality rate, and Tuberculosis incidence. We analyze the structure of the data in order to find the optimal and independent on expert's opinion way to map several numerical indicators from a multidimensional space onto the one-dimensional space of the quality of life. In the 4D space we found a principal curve that goes "through the middle" of the dataset and project the data points on this curve. The order along this principal curve gives us the ranking of countries. Projection onto the principal curve provides a solution to the classical problem of unsupervised ranking of objects. It allows us to find the independent on expert's opinion way to project several numerical indicators from a multidimensional space onto the one-dimensional space of the index values. This projection is, in some sense, optimal and preserves as much information as possible. For computation we used ViDaExpert, a tool for visualization and analysis of multidimensional vectorial data (arXiv:1406.5550). △ Less

Submitted 24 July, 2014; v1 submitted 24 August, 2010; originally announced August 2010.

Comments: 9 pages, 1 figure, 1 table with data for 171 countries. In this case study we use only publicly available data taken from GAPMINDER online data base for 2005

arXiv:1001.1122 [pdf, other]

doi 10.1142/S0129065710002383

Principal manifolds and graphs in practice: from molecular biology to dynamical systems

Authors: A. N. Gorban, A. Zinovyev

Abstract: We present several applications of non-linear data modeling, using principal manifolds and principal graphs constructed using the metaphor of elasticity (elastic principal graph approach). These approaches are generalizations of the Kohonen's self-organizing maps, a class of artificial neural networks. On several examples we show advantages of using non-linear objects for data approximation in com… ▽ More We present several applications of non-linear data modeling, using principal manifolds and principal graphs constructed using the metaphor of elasticity (elastic principal graph approach). These approaches are generalizations of the Kohonen's self-organizing maps, a class of artificial neural networks. On several examples we show advantages of using non-linear objects for data approximation in comparison to the linear ones. We propose four numerical criteria for comparing linear and non-linear map**s of datasets into the spaces of lower dimension. The examples are taken from comparative political science, from analysis of high-throughput data in molecular biology, from analysis of dynamical systems. △ Less

Submitted 25 July, 2010; v1 submitted 7 January, 2010; originally announced January 2010.

Comments: 12 pages, 9 figures

Journal ref: International Journal of Neural Systems, Vol. 20, No. 3 (2010) 219-232

arXiv:0809.0490 [pdf]

doi 10.4018/978-1-60566-766-9

Principal Graphs and Manifolds

Authors: A. N. Gorban, A. Y. Zinovyev

Abstract: In many physical, statistical, biological and other investigations it is desirable to approximate a system of points by objects of lower dimension and/or complexity. For this purpose, Karl Pearson invented principal component analysis in 1901 and found 'lines and planes of closest fit to system of points'. The famous k-means algorithm solves the approximation problem too, but by finite sets instea… ▽ More In many physical, statistical, biological and other investigations it is desirable to approximate a system of points by objects of lower dimension and/or complexity. For this purpose, Karl Pearson invented principal component analysis in 1901 and found 'lines and planes of closest fit to system of points'. The famous k-means algorithm solves the approximation problem too, but by finite sets instead of lines and planes. This chapter gives a brief practical introduction into the methods of construction of general principal objects, i.e. objects embedded in the 'middle' of the multidimensional data set. As a basis, the unifying framework of mean squared distance approximation of finite datasets is selected. Principal graphs and manifolds are constructed as generalisations of principal components and k-means principal points. For this purpose, the family of expectation/maximisation algorithms with nearest generalisations is presented. Construction of principal graphs with controlled complexity is based on the graph grammar approach. △ Less

Submitted 9 May, 2011; v1 submitted 2 September, 2008; originally announced September 2008.

Comments: 36 pages, 6 figures, minor corrections

Journal ref: Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods and Techniques, Ch. 2, Information Science Reference, 2009. 28-59

arXiv:cs/0603090 [pdf, ps, other]

doi 10.1016/j.aml.2006.04.022

Topological Grammars for Data Approximation

Authors: A. N. Gorban, N. R. Sumner, A. Y. Zinovyev

Abstract: A method of {\it topological grammars} is proposed for multidimensional data approximation. For data with complex topology we define a {\it principal cubic complex} of low dimension and given complexity that gives the best approximation for the dataset. This complex is a generalization of linear and non-linear principal manifolds and includes them as particular cases. The problem of optimal prin… ▽ More A method of {\it topological grammars} is proposed for multidimensional data approximation. For data with complex topology we define a {\it principal cubic complex} of low dimension and given complexity that gives the best approximation for the dataset. This complex is a generalization of linear and non-linear principal manifolds and includes them as particular cases. The problem of optimal principal complex construction is transformed into a series of minimization problems for quadratic functionals. These quadratic functionals have a physically transparent interpretation in terms of elastic energy. For the energy computation, the whole complex is represented as a system of nodes and springs. Topologically, the principal complex is a product of one-dimensional continuums (represented by graphs), and the grammars describe how these continuums transform during the process of optimal complex construction. This factorization of the whole process onto one-dimensional transformations using minimization of quadratic energy functionals allow us to construct efficient algorithms. △ Less

Submitted 28 July, 2006; v1 submitted 22 March, 2006; originally announced March 2006.

Comments: Corrected Journal version, Appl. Math. Lett., in press. 7 pgs., 2 figs

Journal ref: Applied Mathematics Letters 20 (2007) 382--386

arXiv:cond-mat/0307083 [pdf]

Generation of Explicit Knowledge from Empirical Data through Pruning of Trainable Neural Networks

Authors: A. N. Gorban, Eu. M. Mirkes, V. G. Tsaregorodtsev

Abstract: This paper presents a generalized technology of extraction of explicit knowledge from data. The main ideas are 1) maximal reduction of network complexity (not only removal of neurons or synapses, but removal all the unnecessary elements and signals and reduction of the complexity of elements), 2) using of adjustable and flexible pruning process (the pruning sequence shouldn't be predetermined -… ▽ More This paper presents a generalized technology of extraction of explicit knowledge from data. The main ideas are 1) maximal reduction of network complexity (not only removal of neurons or synapses, but removal all the unnecessary elements and signals and reduction of the complexity of elements), 2) using of adjustable and flexible pruning process (the pruning sequence shouldn't be predetermined - the user should have a possibility to prune network on his own way in order to achieve a desired network structure for the purpose of extraction of rules of desired type and form), and 3) extraction of rules not in predetermined but any desired form. Some considerations and notes about network architecture and training process and applicability of currently developed pruning techniques and rule extraction algorithms are discussed. This technology, being developed by us for more than 10 years, allowed us to create dozens of knowledge-based expert systems. In this paper we present a generalized three-step technology of extraction of explicit knowledge from empirical data. △ Less

Submitted 3 July, 2003; originally announced July 2003.

Comments: 9 pages, The talk was given at the IJCNN '99 (Washington DC, July 1999)

arXiv:cond-mat/0305681 [pdf]

Seven clusters in genomic triplet distributions

Authors: A. N. Gorban, A. Yu. Zinovyev, T. G. Popova

Abstract: In several recent papers new gene-detection algorithms were proposed for detecting protein-coding regions without requiring learning dataset of already known genes. The fact that unsupervised gene-detection is possible closely connected to existence of a cluster structure in oligomer frequency distributions. In this paper we study cluster structure of several genomes in the space of their triple… ▽ More In several recent papers new gene-detection algorithms were proposed for detecting protein-coding regions without requiring learning dataset of already known genes. The fact that unsupervised gene-detection is possible closely connected to existence of a cluster structure in oligomer frequency distributions. In this paper we study cluster structure of several genomes in the space of their triplet frequencies, using pure data exploration strategy. Several complete genomic sequences were analyzed, using visualization of tables of triplet frequencies in a sliding window. The distribution of 64-dimensional vectors of triplet frequencies displays a well-detectable cluster structure. The structure was found to consist of seven clusters, corresponding to protein-coding information in three possible phases in one of the two complementary strands and in the non-coding regions with high accuracy (higher than 90% on the nucleotide level). Visualizing and understanding the structure allows to analyze effectively performance of different gene-prediction tools. Since the method does not require extraction of ORFs, it can be applied even for unassembled genomes. The information content of the triplet distributions and the validity of the mean-field models are analysed. △ Less

Submitted 23 November, 2004; v1 submitted 29 May, 2003; originally announced May 2003.

Comments: Correction of URL. 16 pages, 5 figures. The software and datasets are available at http://www.ihes.fr/~zinovyev/bullet and http://www.ihes.fr/~zinovyev/7clusters Paper also available at http://www.bioinfo.de/isb/2003/03/0039

Journal ref: In Silico Biology, 3 (2003), 0039, 471-482

arXiv:cond-mat/0305527 [pdf]

doi 10.1109/ICNN.1997.614206

Back-propagation of accuracy

Authors: M. Yu. Senashova, A. N. Gorban, D. C. Wunsch II

Abstract: In this paper we solve the problem: how to determine maximal allowable errors, possible for signals and parameters of each element of a network proceeding from the condition that the vector of output signals of the network should be calculated with given accuracy? "Back-propagation of accuracy" is developed to solve this problem. The calculation of allowable errors for each element of network by… ▽ More In this paper we solve the problem: how to determine maximal allowable errors, possible for signals and parameters of each element of a network proceeding from the condition that the vector of output signals of the network should be calculated with given accuracy? "Back-propagation of accuracy" is developed to solve this problem. The calculation of allowable errors for each element of network by back-propagation of accuracy is surprisingly similar to a back-propagation of error, because it is the backward signals motion, but at the same time it is very different because the new rules of signals transformation in the passing back through the elements are different. The method allows us to formulate the requirements to the accuracy of calculations and to the realization of technical devices, if the requirements to the accuracy of output signals of the network are known. △ Less

Submitted 15 November, 2004; v1 submitted 22 May, 2003; originally announced May 2003.

Comments: 4 pages, 5 figures, The talk given on ICNN97 (The 1997 IEEE International Conference on Neural Networks, Houston, USA)

Journal ref: Proceedings of International Conference on Neural Networks (ICNN'97), 1997, pp. 1998-2001 vol.3

arXiv:cond-mat/0305508 [pdf]

Neural network modeling of data with gaps: method of principal curves, Carleman's formula, and other

Authors: A. N. Gorban, A. A. Rossiev, D. C. Wunsch II

Abstract: A method of modeling data with gaps by a sequence of curves has been developed. The new method is a generalization of iterative construction of singular expansion of matrices with gaps. Under discussion are three versions of the method featuring clear physical interpretation: linear - modeling the data by a sequence of linear manifolds of small dimension; quasilinear - constructing "principal cu… ▽ More A method of modeling data with gaps by a sequence of curves has been developed. The new method is a generalization of iterative construction of singular expansion of matrices with gaps. Under discussion are three versions of the method featuring clear physical interpretation: linear - modeling the data by a sequence of linear manifolds of small dimension; quasilinear - constructing "principal curves: (or "principal surfaces"), univalently projected on the linear principal components; essentially non-linear - based on constructing "principal curves": (principal strings and beams) employing the variation principle; the iteration implementation of this method is close to Kohonen self-organizing maps. The derived dependencies are extrapolated by Carleman's formulas. The method is interpreted as a construction of neural network conveyor designed to solve the following problems: to fill gaps in data; to repair data - to correct initial data values in such a way as to make the constructed models work best; to construct a calculator to fill gaps in the data line fed to the input. △ Less

Submitted 21 May, 2003; originally announced May 2003.

Comments: 28 pages, 7 figures,The talk was given at the USA-NIS Neurocomputing opportunities workshop, Washington DC, July 1999 (Associated with IJCNN'99)

Showing 1–50 of 50 results for author: Gorban, A N