-
Stealth edits for provably fixing or attacking large language models
Authors:
Oliver J. Sutton,
Qinghua Zhou,
Wei Wang,
Desmond J. Higham,
Alexander N. Gorban,
Alexander Bastounis,
Ivan Y. Tyukin
Abstract:
We reveal new methods and the theoretical foundations of techniques for editing large language models. We also show how the new theory can be used to assess the editability of models and to expose their susceptibility to previously unknown malicious attacks. Our theoretical approach shows that a single metric (a specific measure of the intrinsic dimensionality of the model's features) is fundament…
▽ More
We reveal new methods and the theoretical foundations of techniques for editing large language models. We also show how the new theory can be used to assess the editability of models and to expose their susceptibility to previously unknown malicious attacks. Our theoretical approach shows that a single metric (a specific measure of the intrinsic dimensionality of the model's features) is fundamental to predicting the success of popular editing approaches, and reveals new bridges between disparate families of editing methods. We collectively refer to these approaches as stealth editing methods, because they aim to directly and inexpensively update a model's weights to correct the model's responses to known hallucinating prompts without otherwise affecting the model's behaviour, without requiring retraining. By carefully applying the insight gleaned from our theoretical investigation, we are able to introduce a new network block -- named a jet-pack block -- which is optimised for highly selective model editing, uses only standard network operations, and can be inserted into existing networks. The intrinsic dimensionality metric also determines the vulnerability of a language model to a stealth attack: a small change to a model's weights which changes its response to a single attacker-chosen prompt. Stealth attacks do not require access to or knowledge of the model's training data, therefore representing a potent yet previously unrecognised threat to redistributed foundation models. They are computationally simple enough to be implemented in malware in many cases. Extensive experimental results illustrate and support the method and its theoretical underpinnings. Demos and source code for editing language models are available at https://github.com/qinghua-zhou/stealth-edits.
△ Less
Submitted 18 June, 2024;
originally announced June 2024.
-
Agile gesture recognition for low-power applications: customisation for generalisation
Authors:
Ying Liu,
Liucheng Guo,
Valeri A. Makarovc,
Alexander Gorbana,
Evgeny Mirkesa,
Ivan Y. Tyukin
Abstract:
Automated hand gesture recognition has long been a focal point in the AI community. Traditionally, research in this field has predominantly focused on scenarios with access to a continuous flow of hand's images. This focus has been driven by the widespread use of cameras and the abundant availability of image data. However, there is an increasing demand for gesture recognition technologies that op…
▽ More
Automated hand gesture recognition has long been a focal point in the AI community. Traditionally, research in this field has predominantly focused on scenarios with access to a continuous flow of hand's images. This focus has been driven by the widespread use of cameras and the abundant availability of image data. However, there is an increasing demand for gesture recognition technologies that operate on low-power sensor devices. This is due to the rising concerns for data leakage and end-user privacy, as well as the limited battery capacity and the computing power in low-cost devices. Moreover, the challenge in data collection for individually designed hardware also hinders the generalisation of a gesture recognition model.
In this study, we unveil a novel methodology for pattern recognition systems using adaptive and agile error correction, designed to enhance the performance of legacy gesture recognition models on devices with limited battery capacity and computing power. This system comprises a compact Support Vector Machine as the base model for live gesture recognition. Additionally, it features an adaptive agile error corrector that employs few-shot learning within the feature space induced by high-dimensional kernel map**s. The error corrector can be customised for each user, allowing for dynamic adjustments to the gesture prediction based on their movement patterns while maintaining the agile performance of its base model on a low-cost and low-power micro-controller. This proposed system is distinguished by its compact size, rapid processing speed, and low power consumption, making it ideal for a wide range of embedded systems.
△ Less
Submitted 12 March, 2024;
originally announced March 2024.
-
Weakly Supervised Learners for Correction of AI Errors with Provable Performance Guarantees
Authors:
Ivan Y. Tyukin,
Tatiana Tyukina,
Daniel van Helden,
Zedong Zheng,
Evgeny M. Mirkes,
Oliver J. Sutton,
Qinghua Zhou,
Alexander N. Gorban,
Penelope Allison
Abstract:
We present a new methodology for handling AI errors by introducing weakly supervised AI error correctors with a priori performance guarantees. These AI correctors are auxiliary maps whose role is to moderate the decisions of some previously constructed underlying classifier by either approving or rejecting its decisions. The rejection of a decision can be used as a signal to suggest abstaining fro…
▽ More
We present a new methodology for handling AI errors by introducing weakly supervised AI error correctors with a priori performance guarantees. These AI correctors are auxiliary maps whose role is to moderate the decisions of some previously constructed underlying classifier by either approving or rejecting its decisions. The rejection of a decision can be used as a signal to suggest abstaining from making a decision. A key technical focus of the work is in providing performance guarantees for these new AI correctors through bounds on the probabilities of incorrect decisions. These bounds are distribution agnostic and do not rely on assumptions on the data dimension. Our empirical example illustrates how the framework can be applied to improve the performance of an image classifier in a challenging real-world task where training data are scarce.
△ Less
Submitted 13 February, 2024; v1 submitted 31 January, 2024;
originally announced February 2024.
-
Relative intrinsic dimensionality is intrinsic to learning
Authors:
Oliver J. Sutton,
Qinghua Zhou,
Alexander N. Gorban,
Ivan Y. Tyukin
Abstract:
High dimensional data can have a surprising property: pairs of data points may be easily separated from each other, or even from arbitrary subsets, with high probability using just simple linear classifiers. However, this is more of a rule of thumb than a reliable property as high dimensionality alone is neither necessary nor sufficient for successful learning. Here, we introduce a new notion of t…
▽ More
High dimensional data can have a surprising property: pairs of data points may be easily separated from each other, or even from arbitrary subsets, with high probability using just simple linear classifiers. However, this is more of a rule of thumb than a reliable property as high dimensionality alone is neither necessary nor sufficient for successful learning. Here, we introduce a new notion of the intrinsic dimension of a data distribution, which precisely captures the separability properties of the data. For this intrinsic dimension, the rule of thumb above becomes a law: high intrinsic dimension guarantees highly separable data. We extend this notion to that of the relative intrinsic dimension of two data distributions, which we show provides both upper and lower bounds on the probability of successfully learning and generalising in a binary classification problem
△ Less
Submitted 10 October, 2023;
originally announced November 2023.
-
The Boundaries of Verifiable Accuracy, Robustness, and Generalisation in Deep Learning
Authors:
Alexander Bastounis,
Alexander N. Gorban,
Anders C. Hansen,
Desmond J. Higham,
Danil Prokhorov,
Oliver Sutton,
Ivan Y. Tyukin,
Qinghua Zhou
Abstract:
In this work, we assess the theoretical limitations of determining guaranteed stability and accuracy of neural networks in classification tasks. We consider classical distribution-agnostic framework and algorithms minimising empirical risks and potentially subjected to some weights regularisation. We show that there is a large family of tasks for which computing and verifying ideal stable and accu…
▽ More
In this work, we assess the theoretical limitations of determining guaranteed stability and accuracy of neural networks in classification tasks. We consider classical distribution-agnostic framework and algorithms minimising empirical risks and potentially subjected to some weights regularisation. We show that there is a large family of tasks for which computing and verifying ideal stable and accurate neural networks in the above settings is extremely challenging, if at all possible, even when such ideal solutions exist within the given class of neural architectures.
△ Less
Submitted 13 September, 2023;
originally announced September 2023.
-
How adversarial attacks can disrupt seemingly stable accurate classifiers
Authors:
Oliver J. Sutton,
Qinghua Zhou,
Ivan Y. Tyukin,
Alexander N. Gorban,
Alexander Bastounis,
Desmond J. Higham
Abstract:
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data. Paradoxically, empirical evidence indicates that even systems which are robust to large random perturbations of the input data remain susceptible to small, easily constructed, adversarial perturbations of their inputs. Here, we show th…
▽ More
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data. Paradoxically, empirical evidence indicates that even systems which are robust to large random perturbations of the input data remain susceptible to small, easily constructed, adversarial perturbations of their inputs. Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data. We introduce a simple generic and generalisable framework for which key behaviours observed in practical systems arise with high probability -- notably the simultaneous susceptibility of the (otherwise accurate) model to easily constructed adversarial attacks, and robustness to random perturbations of the input data. We confirm that the same phenomena are directly observed in practical neural networks trained on standard image classification problems, where even large additive random noise fails to trigger the adversarial instability of the network. A surprising takeaway is that even small margins separating a classifier's decision surface from training and testing data can hide adversarial susceptibility from being detected using randomly sampled perturbations. Counterintuitively, using additive noise during training or testing is therefore inefficient for eradicating or detecting adversarial examples, and more demanding adversarial training is required.
△ Less
Submitted 7 September, 2023;
originally announced September 2023.
-
Agile gesture recognition for capacitive sensing devices: adapting on-the-job
Authors:
Ying Liu,
Liucheng Guo,
Valeri A. Makarov,
Yuxiang Huang,
Alexander Gorban,
Evgeny Mirkes,
Ivan Y. Tyukin
Abstract:
Automated hand gesture recognition has been a focus of the AI community for decades. Traditionally, work in this domain revolved largely around scenarios assuming the availability of the flow of images of the user hands. This has partly been due to the prevalence of camera-based devices and the wide availability of image data. However, there is growing demand for gesture recognition technology tha…
▽ More
Automated hand gesture recognition has been a focus of the AI community for decades. Traditionally, work in this domain revolved largely around scenarios assuming the availability of the flow of images of the user hands. This has partly been due to the prevalence of camera-based devices and the wide availability of image data. However, there is growing demand for gesture recognition technology that can be implemented on low-power devices using limited sensor data instead of high-dimensional inputs like hand images. In this work, we demonstrate a hand gesture recognition system and method that uses signals from capacitive sensors embedded into the etee hand controller. The controller generates real-time signals from each of the wearer five fingers. We use a machine learning technique to analyse the time series signals and identify three features that can represent 5 fingers within 500 ms. The analysis is composed of a two stage training strategy, including dimension reduction through principal component analysis and classification with K nearest neighbour. Remarkably, we found that this combination showed a level of performance which was comparable to more advanced methods such as supervised variational autoencoder. The base system can also be equipped with the capability to learn from occasional errors by providing it with an additional adaptive error correction mechanism. The results showed that the error corrector improve the classification performance in the base system without compromising its performance. The system requires no more than 1 ms of computing time per input sample, and is smaller than deep neural networks, demonstrating the feasibility of agile gesture recognition systems based on this technology.
△ Less
Submitted 12 May, 2023;
originally announced May 2023.
-
MyI-Net: Fully Automatic Detection and Quantification of Myocardial Infarction from Cardiovascular MRI Images
Authors:
Shuihua Wang,
Ahmed M. S. E. K Abdelaty,
Kelly Parke,
J Ranjit Arnold,
Gerry P McCann,
Ivan Y Tyukin
Abstract:
A "heart attack" or myocardial infarction (MI), occurs when an artery supplying blood to the heart is abruptly occluded. The "gold standard" method for imaging MI is Cardiovascular Magnetic Resonance Imaging (MRI), with intravenously administered gadolinium-based contrast (late gadolinium enhancement). However, no "gold standard" fully automated method for the quantification of MI exists. In this…
▽ More
A "heart attack" or myocardial infarction (MI), occurs when an artery supplying blood to the heart is abruptly occluded. The "gold standard" method for imaging MI is Cardiovascular Magnetic Resonance Imaging (MRI), with intravenously administered gadolinium-based contrast (late gadolinium enhancement). However, no "gold standard" fully automated method for the quantification of MI exists. In this work, we propose an end-to-end fully automatic system (MyI-Net) for the detection and quantification of MI in MRI images. This has the potential to reduce the uncertainty due to the technical variability across labs and inherent problems of the data and labels. Our system consists of four processing stages designed to maintain the flow of information across scales. First, features from raw MRI images are generated using feature extractors built on ResNet and MoblieNet architectures. This is followed by the Atrous Spatial Pyramid Pooling (ASPP) to produce spatial information at different scales to preserve more image context. High-level features from ASPP and initial low-level features are concatenated at the third stage and then passed to the fourth stage where spatial information is recovered via up-sampling to produce final image segmentation output into: i) background, ii) heart muscle, iii) blood and iv) scar areas. New models were compared with state-of-art models and manual quantification. Our models showed favorable performance in global segmentation and scar tissue detection relative to state-of-the-art work, including a four-fold better performance in matching scar pixels to contours produced by clinicians.
△ Less
Submitted 28 December, 2022;
originally announced December 2022.
-
Towards a mathematical understanding of learning from few examples with nonlinear feature maps
Authors:
Oliver J. Sutton,
Alexander N. Gorban,
Ivan Y. Tyukin
Abstract:
We consider the problem of data classification where the training set consists of just a few data points. We explore this phenomenon mathematically and reveal key relationships between the geometry of an AI model's feature space, the structure of the underlying data distributions, and the model's generalisation capabilities. The main thrust of our analysis is to reveal the influence on the model's…
▽ More
We consider the problem of data classification where the training set consists of just a few data points. We explore this phenomenon mathematically and reveal key relationships between the geometry of an AI model's feature space, the structure of the underlying data distributions, and the model's generalisation capabilities. The main thrust of our analysis is to reveal the influence on the model's generalisation capabilities of nonlinear feature transformations map** the original data into high, and possibly infinite, dimensional spaces.
△ Less
Submitted 7 November, 2022;
originally announced November 2022.
-
Learning from few examples with nonlinear feature maps
Authors:
Ivan Y. Tyukin,
Oliver Sutton,
Alexander N. Gorban
Abstract:
In this work we consider the problem of data classification in post-classical settings were the number of training examples consists of mere few data points. We explore the phenomenon and reveal key relationships between dimensionality of AI model's feature space, non-degeneracy of data distributions, and the model's generalisation capabilities. The main thrust of our present analysis is on the in…
▽ More
In this work we consider the problem of data classification in post-classical settings were the number of training examples consists of mere few data points. We explore the phenomenon and reveal key relationships between dimensionality of AI model's feature space, non-degeneracy of data distributions, and the model's generalisation capabilities. The main thrust of our present analysis is on the influence of nonlinear feature transformations map** original data into higher- and possibly infinite-dimensional spaces on the resulting model's generalisation capabilities. Subject to appropriate assumptions, we establish new relationships between intrinsic dimensions of the transformed data and the probabilities to learn successfully from few presentations.
△ Less
Submitted 31 March, 2022;
originally announced March 2022.
-
Quasi-orthogonality and intrinsic dimensions as measures of learning and generalisation
Authors:
Qinghua Zhou,
Alexander N. Gorban,
Evgeny M. Mirkes,
Jonathan Bac,
Andrei Zinovyev,
Ivan Y. Tyukin
Abstract:
Finding best architectures of learning machines, such as deep neural networks, is a well-known technical and theoretical challenge. Recent work by Mellor et al (2021) showed that there may exist correlations between the accuracies of trained networks and the values of some easily computable measures defined on randomly initialised networks which may enable to search tens of thousands of neural arc…
▽ More
Finding best architectures of learning machines, such as deep neural networks, is a well-known technical and theoretical challenge. Recent work by Mellor et al (2021) showed that there may exist correlations between the accuracies of trained networks and the values of some easily computable measures defined on randomly initialised networks which may enable to search tens of thousands of neural architectures without training. Mellor et al used the Hamming distance evaluated over all ReLU neurons as such a measure. Motivated by these findings, in our work, we ask the question of the existence of other and perhaps more principled measures which could be used as determinants of success of a given neural architecture. In particular, we examine, if the dimensionality and quasi-orthogonality of neural networks' feature space could be correlated with the network's performance after training. We showed, using the setup as in Mellor et al, that dimensionality and quasi-orthogonality may jointly serve as network's performance discriminants. In addition to offering new opportunities to accelerate neural architecture search, our findings suggest important relationships between the networks' final performance and properties of their randomly initialised feature spaces: data dimension and quasi-orthogonality.
△ Less
Submitted 30 March, 2022;
originally announced March 2022.
-
Learning from scarce information: using synthetic data to classify Roman fine ware pottery
Authors:
Santos J. Núñez Jareño,
Daniël P. van Helden,
Evgeny M. Mirkes,
Ivan Y. Tyukin,
Penelope M. Allison
Abstract:
In this article we consider a version of the challenging problem of learning from datasets whose size is too limited to allow generalisation beyond the training set. To address the challenge we propose to use a transfer learning approach whereby the model is first trained on a synthetic dataset replicating features of the original objects. In this study the objects were smartphone photographs of n…
▽ More
In this article we consider a version of the challenging problem of learning from datasets whose size is too limited to allow generalisation beyond the training set. To address the challenge we propose to use a transfer learning approach whereby the model is first trained on a synthetic dataset replicating features of the original objects. In this study the objects were smartphone photographs of near-complete Roman terra sigillata pottery vessels from the collection of the Museum of London. Taking the replicated features from published profile drawings of pottery forms allowed the integration of expert knowledge into the process through our synthetic data generator. After this first initial training the model was fine-tuned with data from photographs of real vessels. We show, through exhaustive experiments across several popular deep learning architectures, different test priors, and considering the impact of the photograph viewpoint and excessive damage to the vessels, that the proposed hybrid approach enables the creation of classifiers with appropriate generalisation performance. This performance is significantly better than that of classifiers trained exclusively on the original data which shows the promise of the approach to alleviate the fundamental issue of learning from small datasets.
△ Less
Submitted 3 July, 2021;
originally announced July 2021.
-
High-dimensional separability for one- and few-shot learning
Authors:
Alexander N. Gorban,
Bogdan Grechuk,
Evgeny M. Mirkes,
Sergey V. Stasenko,
Ivan Y. Tyukin
Abstract:
This work is driven by a practical question: corrections of Artificial Intelligence (AI) errors. These corrections should be quick and non-iterative. To solve this problem without modification of a legacy AI system, we propose special `external' devices, correctors. Elementary correctors consist of two parts, a classifier that separates the situations with high risk of error from the situations in…
▽ More
This work is driven by a practical question: corrections of Artificial Intelligence (AI) errors. These corrections should be quick and non-iterative. To solve this problem without modification of a legacy AI system, we propose special `external' devices, correctors. Elementary correctors consist of two parts, a classifier that separates the situations with high risk of error from the situations in which the legacy AI system works well and a new decision for situations with potential errors. Input signals for the correctors can be the inputs of the legacy AI system, its internal signals, and outputs. If the intrinsic dimensionality of data is high enough then the classifiers for correction of small number of errors can be very simple. According to the blessing of dimensionality effects, even simple and robust Fisher's discriminants can be used for one-shot learning of AI correctors. Stochastic separation theorems provide the mathematical basis for this one-short learning. However, as the number of correctors needed grows, the cluster structure of data becomes important and a new family of stochastic separation theorems is required. We refuse the classical hypothesis of the regularity of the data distribution and assume that the data can have a fine-grained structure with many clusters and peaks in the probability density. New stochastic separation theorems for data with fine-grained structure are formulated and proved. The multi-correctors for granular data are proposed. The advantages of the multi-corrector technology were demonstrated by examples of correcting errors and learning new classes of objects by a deep convolutional neural network on the CIFAR-10 dataset. The key problems of the non-classical high-dimensional data analysis are reviewed together with the basic preprocessing steps including supervised, semi-supervised and domain adaptation Principal Component Analysis.
△ Less
Submitted 22 October, 2021; v1 submitted 28 June, 2021;
originally announced June 2021.
-
The Feasibility and Inevitability of Stealth Attacks
Authors:
Ivan Y. Tyukin,
Desmond J. Higham,
Alexander Bastounis,
Eliyas Woldegeorgis,
Alexander N. Gorban
Abstract:
We develop and study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence (AI) systems including deep learning neural networks. In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself. Such a stealth attack could be conducted by a mischievous, corrupt or disgr…
▽ More
We develop and study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence (AI) systems including deep learning neural networks. In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself. Such a stealth attack could be conducted by a mischievous, corrupt or disgruntled member of a software development team. It could also be made by those wishing to exploit a ``democratization of AI'' agenda, where network architectures and trained parameter sets are shared publicly. We develop a range of new implementable attack strategies with accompanying analysis, showing that with high probability a stealth attack can be made transparent, in the sense that system performance is unchanged on a fixed validation set which is unknown to the attacker, while evoking any desired output on a trigger input of interest. The attacker only needs to have estimates of the size of the validation set and the spread of the AI's relevant latent space. In the case of deep learning neural networks, we show that a one neuron attack is possible - a modification to the weights and bias associated with a single neuron - revealing a vulnerability arising from over-parameterization. We illustrate these concepts using state of the art architectures on two standard image data sets. Guided by the theory and computational results, we also propose strategies to guard against stealth attacks.
△ Less
Submitted 4 January, 2023; v1 submitted 26 June, 2021;
originally announced June 2021.
-
Demystification of Few-shot and One-shot Learning
Authors:
Ivan Y. Tyukin,
Alexander N. Gorban,
Muhammad H. Alkhudaydi,
Qinghua Zhou
Abstract:
Few-shot and one-shot learning have been the subject of active and intensive research in recent years, with mounting evidence pointing to successful implementation and exploitation of few-shot learning algorithms in practice. Classical statistical learning theories do not fully explain why few- or one-shot learning is at all possible since traditional generalisation bounds normally require large t…
▽ More
Few-shot and one-shot learning have been the subject of active and intensive research in recent years, with mounting evidence pointing to successful implementation and exploitation of few-shot learning algorithms in practice. Classical statistical learning theories do not fully explain why few- or one-shot learning is at all possible since traditional generalisation bounds normally require large training and testing samples to be meaningful. This sharply contrasts with numerous examples of successful one- and few-shot learning systems and applications.
In this work we present mathematical foundations for a theory of one-shot and few-shot learning and reveal conditions specifying when such learning schemes are likely to succeed. Our theory is based on intrinsic properties of high-dimensional spaces. We show that if the ambient or latent decision space of a learning machine is sufficiently high-dimensional than a large class of objects in this space can indeed be easily learned from few examples provided that certain data non-concentration conditions are met.
△ Less
Submitted 29 May, 2021; v1 submitted 25 April, 2021;
originally announced April 2021.
-
General stochastic separation theorems with optimal bounds
Authors:
Bogdan Grechuk,
Alexander N. Gorban,
Ivan Y. Tyukin
Abstract:
Phenomenon of stochastic separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and analyze AI instabilities. In high-dimensional datasets under broad assumptions each point can be separated from the rest of the set by simple and robust Fisher's discriminant (is Fisher separable). Errors or clusters of errors can be separated from the rest…
▽ More
Phenomenon of stochastic separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and analyze AI instabilities. In high-dimensional datasets under broad assumptions each point can be separated from the rest of the set by simple and robust Fisher's discriminant (is Fisher separable). Errors or clusters of errors can be separated from the rest of the data. The ability to correct an AI system also opens up the possibility of an attack on it, and the high dimensionality induces vulnerabilities caused by the same stochastic separability that holds the keys to understanding the fundamentals of robustness and adaptivity in high-dimensional data-driven AI. To manage errors and analyze vulnerabilities, the stochastic separation theorems should evaluate the probability that the dataset will be Fisher separable in given dimensionality and for a given class of distributions. Explicit and optimal estimates of these separation probabilities are required, and this problem is solved in present work. The general stochastic separation theorems with optimal probability estimates are obtained for important classes of distributions: log-concave distribution, their convex combinations and product distributions. The standard i.i.d. assumption was significantly relaxed. These theorems and estimates can be used both for correction of high-dimensional data driven AI systems and for analysis of their vulnerabilities. The third area of application is the emergence of memories in ensembles of neurons, the phenomena of grandmother's cells and sparse coding in the brain, and explanation of unexpected effectiveness of small neural ensembles in high-dimensional brain.
△ Less
Submitted 9 January, 2021; v1 submitted 11 October, 2020;
originally announced October 2020.
-
On Adversarial Examples and Stealth Attacks in Artificial Intelligence Systems
Authors:
Ivan Y. Tyukin,
Desmond J. Higham,
Alexander N. Gorban
Abstract:
In this work we present a formal theoretical framework for assessing and analyzing two classes of malevolent action towards generic Artificial Intelligence (AI) systems. Our results apply to general multi-class classifiers that map from an input space into a decision space, including artificial neural networks used in deep learning applications. Two classes of attacks are considered. The first cla…
▽ More
In this work we present a formal theoretical framework for assessing and analyzing two classes of malevolent action towards generic Artificial Intelligence (AI) systems. Our results apply to general multi-class classifiers that map from an input space into a decision space, including artificial neural networks used in deep learning applications. Two classes of attacks are considered. The first class involves adversarial examples and concerns the introduction of small perturbations of the input data that cause misclassification. The second class, introduced here for the first time and named stealth attacks, involves small perturbations to the AI system itself. Here the perturbed system produces whatever output is desired by the attacker on a specific small data set, perhaps even a single input, but performs as normal on a validation set (which is unknown to the attacker). We show that in both cases, i.e., in the case of an attack based on adversarial examples and in the case of a stealth attack, the dimensionality of the AI's decision-making space is a major contributor to the AI's susceptibility. For attacks based on adversarial examples, a second crucial parameter is the absence of local concentrations in the data probability distribution, a property known as Smeared Absolute Continuity. According to our findings, robustness to adversarial examples requires either (a) the data distributions in the AI's feature space to have concentrated probability density functions or (b) the dimensionality of the AI's decision variables to be sufficiently small. We also show how to construct stealth attacks on high-dimensional AI systems that are hard to spot unless the validation set is made exponentially large.
△ Less
Submitted 9 April, 2020;
originally announced April 2020.
-
High--Dimensional Brain in a High-Dimensional World: Blessing of Dimensionality
Authors:
Alexander N. Gorban,
Valery A. Makarov,
Ivan Y. Tyukin
Abstract:
High-dimensional data and high-dimensional representations of reality are inherent features of modern Artificial Intelligence systems and applications of machine learning. The well-known phenomenon of the "curse of dimensionality" states: many problems become exponentially difficult in high dimensions. Recently, the other side of the coin, the "blessing of dimensionality", has attracted much atten…
▽ More
High-dimensional data and high-dimensional representations of reality are inherent features of modern Artificial Intelligence systems and applications of machine learning. The well-known phenomenon of the "curse of dimensionality" states: many problems become exponentially difficult in high dimensions. Recently, the other side of the coin, the "blessing of dimensionality", has attracted much attention. It turns out that generic high-dimensional datasets exhibit fairly simple geometric properties. Thus, there is a fundamental tradeoff between complexity and simplicity in high dimensional spaces. Here we present a brief explanatory review of recent ideas, results and hypotheses about the blessing of dimensionality and related simplifying effects relevant to machine learning and neuroscience.
△ Less
Submitted 14 January, 2020;
originally announced January 2020.
-
Blessing of dimensionality at the edge
Authors:
Ivan Y. Tyukin,
Alexander N. Gorban,
Alistair A. McEwan,
Sepehr Meshkinfamfard,
Lixin Tang
Abstract:
In this paper we present theory and algorithms enabling classes of Artificial Intelligence (AI) systems to continuously and incrementally improve with a-priori quantifiable guarantees - or more specifically remove classification errors - over time. This is distinct from state-of-the-art machine learning, AI, and software approaches. Another feature of this approach is that, in the supervised setti…
▽ More
In this paper we present theory and algorithms enabling classes of Artificial Intelligence (AI) systems to continuously and incrementally improve with a-priori quantifiable guarantees - or more specifically remove classification errors - over time. This is distinct from state-of-the-art machine learning, AI, and software approaches. Another feature of this approach is that, in the supervised setting, the computational complexity of training is linear in the number of training samples. At the time of classification, the computational complexity is bounded by few inner product calculations. Moreover, the implementation is shown to be very scalable. This makes it viable for deployment in applications where computational power and memory are limited, such as embedded environments. It enables the possibility for fast on-line optimisation using improved training samples. The approach is based on the concentration of measure effects and stochastic separation theorems and is illustrated with an example on the identification faulty processes in Computer Numerical Control (CNC) milling and with a case study on adaptive removal of false positives in an industrial video surveillance and analytics system.
△ Less
Submitted 10 July, 2020; v1 submitted 30 September, 2019;
originally announced October 2019.
-
Symphony of high-dimensional brain
Authors:
Alexander N. Gorban,
Valeri A. Makarov,
Ivan Y. Tyukin
Abstract:
This paper is the final part of the scientific discussion organised by the Journal "Physics of Life Rviews" about the simplicity revolution in neuroscience and AI. This discussion was initiated by the review paper "The unreasonable effectiveness of small neural ensembles in high-dimensional brain". Phys Life Rev 2019, doi 10.1016/j.plrev.2018.09.005, arXiv:1809.07656. The topics of the discussion…
▽ More
This paper is the final part of the scientific discussion organised by the Journal "Physics of Life Rviews" about the simplicity revolution in neuroscience and AI. This discussion was initiated by the review paper "The unreasonable effectiveness of small neural ensembles in high-dimensional brain". Phys Life Rev 2019, doi 10.1016/j.plrev.2018.09.005, arXiv:1809.07656. The topics of the discussion varied from the necessity to take into account the difference between the theoretical random distributions and "extremely non-random" real distributions and revise the common machine learning theory, to different forms of the curse of dimensionality and high-dimensional pitfalls in neuroscience. V. K{ů}rkov{á}, A. Tozzi and J.F. Peters, R. Quian Quiroga, P. Varona, R. Barrio, G. Kreiman, L. Fortuna, C. van Leeuwen, R. Quian Quiroga, and V. Kreinovich, A.N. Gorban, V.A. Makarov, and I.Y. Tyukin participated in the discussion. In this paper we analyse the symphony of opinions and the possible outcomes of the simplicity revolution for machine learning and neuroscience.
△ Less
Submitted 27 June, 2019;
originally announced June 2019.
-
Correction of AI systems by linear discriminants: Probabilistic foundations
Authors:
A. N. Gorban,
A. Golubkov,
B. Grechuk,
E. M. Mirkes,
I. Y. Tyukin
Abstract:
Artificial Intelligence (AI) systems sometimes make errors and will make errors in the future, from time to time. These errors are usually unexpected, and can lead to dramatic consequences. Intensive development of AI and its practical applications makes the problem of errors more important. Total re-engineering of the systems can create new errors and is not always possible due to the resources i…
▽ More
Artificial Intelligence (AI) systems sometimes make errors and will make errors in the future, from time to time. These errors are usually unexpected, and can lead to dramatic consequences. Intensive development of AI and its practical applications makes the problem of errors more important. Total re-engineering of the systems can create new errors and is not always possible due to the resources involved. The important challenge is to develop fast methods to correct errors without damaging existing skills. We formulated the technical requirements to the 'ideal' correctors. Such correctors include binary classifiers, which separate the situations with high risk of errors from the situations where the AI systems work properly. Surprisingly, for essentially high-dimensional data such methods are possible: simple linear Fisher discriminant can separate the situations with errors from correctly solved tasks even for exponentially large samples. The paper presents the probabilistic basis for fast non-destructive correction of AI systems. A series of new stochastic separation theorems is proven. These theorems provide new instruments for fast non-iterative correction of errors of legacy AI systems. The new approaches become efficient in high-dimensions, for correction of high-dimensional systems in high-dimensional world (i.e. for processing of essentially high-dimensional data by large systems).
△ Less
Submitted 11 November, 2018;
originally announced November 2018.
-
Fast Construction of Correcting Ensembles for Legacy Artificial Intelligence Systems: Algorithms and a Case Study
Authors:
Ivan Y. Tyukin,
Alexander N. Gorban,
Stephen Green,
Danil Prokhorov
Abstract:
This paper presents a technology for simple and computationally efficient improvements of a generic Artificial Intelligence (AI) system, including Multilayer and Deep Learning neural networks. The improvements are, in essence, small network ensembles constructed on top of the existing AI architectures. Theoretical foundations of the technology are based on Stochastic Separation Theorems and the id…
▽ More
This paper presents a technology for simple and computationally efficient improvements of a generic Artificial Intelligence (AI) system, including Multilayer and Deep Learning neural networks. The improvements are, in essence, small network ensembles constructed on top of the existing AI architectures. Theoretical foundations of the technology are based on Stochastic Separation Theorems and the ideas of the concentration of measure. We show that, subject to mild technical assumptions on statistical properties of internal signals in the original AI system, the technology enables instantaneous and computationally efficient removal of spurious and systematic errors with probability close to one on the datasets which are exponentially large in dimension. The method is illustrated with numerical examples and a case study of ten digits recognition from American Sign Language.
△ Less
Submitted 13 February, 2019; v1 submitted 12 October, 2018;
originally announced October 2018.
-
The unreasonable effectiveness of small neural ensembles in high-dimensional brain
Authors:
A. N. Gorban,
V. A. Makarov,
I. Y. Tyukin
Abstract:
Despite the widely-spread consensus on the brain complexity, sprouts of the single neuron revolution emerged in neuroscience in the 1970s. They brought many unexpected discoveries, including grandmother or concept cells and sparse coding of information in the brain.
In machine learning for a long time, the famous curse of dimensionality seemed to be an unsolvable problem. Nevertheless, the idea…
▽ More
Despite the widely-spread consensus on the brain complexity, sprouts of the single neuron revolution emerged in neuroscience in the 1970s. They brought many unexpected discoveries, including grandmother or concept cells and sparse coding of information in the brain.
In machine learning for a long time, the famous curse of dimensionality seemed to be an unsolvable problem. Nevertheless, the idea of the blessing of dimensionality becomes gradually more and more popular. Ensembles of non-interacting or weakly interacting simple units prove to be an effective tool for solving essentially multidimensional problems. This approach is especially useful for one-shot (non-iterative) correction of errors in large legacy artificial intelligence systems.
These simplicity revolutions in the era of complexity have deep fundamental reasons grounded in geometry of multidimensional data spaces. To explore and understand these reasons we revisit the background ideas of statistical physics. In the course of the 20th century they were developed into the concentration of measure theory. New stochastic separation theorems reveal the fine structure of the data clouds.
We review and analyse biological, physical, and mathematical problems at the core of the fundamental question: how can high-dimensional brain organise reliable and fast learning in high-dimensional world of data by simple tools?
Two critical applications are reviewed to exemplify the approach: one-shot correction of errors in intellectual systems and emergence of static and associative memories in ensembles of single neurons.
△ Less
Submitted 10 November, 2018; v1 submitted 20 September, 2018;
originally announced September 2018.
-
How deep should be the depth of convolutional neural networks: a backyard dog case study
Authors:
A. N. Gorban,
E. M. Mirkes,
I. Y. Tyukin
Abstract:
The work concerns the problem of reducing a pre-trained deep neuronal network to a smaller network, with just few layers, whilst retaining the network's functionality on a given task
The proposed approach is motivated by the observation that the aim to deliver the highest accuracy possible in the broadest range of operational conditions, which many deep neural networks models strive to achieve,…
▽ More
The work concerns the problem of reducing a pre-trained deep neuronal network to a smaller network, with just few layers, whilst retaining the network's functionality on a given task
The proposed approach is motivated by the observation that the aim to deliver the highest accuracy possible in the broadest range of operational conditions, which many deep neural networks models strive to achieve, may not necessarily be always needed, desired, or even achievable due to the lack of data or technical constraints. In relation to the face recognition problem, we formulated an example of such a usecase, the `backyard dog' problem. The `backyard dog', implemented by a lean network, should correctly identify members from a limited group of individuals, a `family', and should distinguish between them. At the same time, the network must produce an alarm to an image of an individual who is not in a member of the family. To produce such a network, we propose a shallowing algorithm. The algorithm takes an existing deep learning model on its input and outputs a shallowed version of it. The algorithm is non-iterative and is based on the Advanced Supervised Principal Component Analysis. Performance of the algorithm is assessed in exhaustive numerical experiments. In the above usecase, the `backyard dog' problem, the method is capable of drastically reducing the depth of deep learning neural networks, albeit at the cost of mild performance deterioration.
We developed a simple non-iterative method for shallowing down pre-trained deep networks. The method is generic in the sense that it applies to a broad class of feed-forward networks, and is based on the Advanced Supervise Principal Component Analysis. The method enables generation of families of smaller-size shallower specialized networks tuned for specific operational conditions and tasks from a single larger and more universal legacy network.
△ Less
Submitted 8 December, 2019; v1 submitted 3 May, 2018;
originally announced May 2018.
-
Augmented Artificial Intelligence: a Conceptual Framework
Authors:
Alexander N. Gorban,
Bogdan Grechuk,
Ivan Y. Tyukin
Abstract:
All artificial Intelligence (AI) systems make errors. These errors are unexpected, and differ often from the typical human mistakes ("non-human" errors). The AI errors should be corrected without damage of existing skills and, hopefully, avoiding direct human expertise. This paper presents an initial summary report of project taking new and systematic approach to improving the intellectual effecti…
▽ More
All artificial Intelligence (AI) systems make errors. These errors are unexpected, and differ often from the typical human mistakes ("non-human" errors). The AI errors should be corrected without damage of existing skills and, hopefully, avoiding direct human expertise. This paper presents an initial summary report of project taking new and systematic approach to improving the intellectual effectiveness of the individual AI by communities of AIs. We combine some ideas of learning in heterogeneous multiagent systems with new and original mathematical approaches for non-iterative corrections of errors of legacy AI systems. The mathematical foundations of AI non-destructive correction are presented and a series of new stochastic separation theorems is proven. These theorems provide a new instrument for the development, analysis, and assessment of machine learning methods and algorithms in high dimension. They demonstrate that in high dimensions and even for exponentially large samples, linear classifiers in their classical Fisher's form are powerful enough to separate errors from correct responses with high probability and to provide efficient solution to the non-destructive corrector problem. In particular, we prove some hypotheses formulated in our paper `Stochastic Separation Theorems' (Neural Networks, 94, 255--259, 2017), and answer one general problem published by Donoho and Tanner in 2009.
△ Less
Submitted 24 March, 2018; v1 submitted 6 February, 2018;
originally announced February 2018.
-
Blessing of dimensionality: mathematical foundations of the statistical physics of data
Authors:
A. N. Gorban,
I. Y. Tyukin
Abstract:
The concentration of measure phenomena were discovered as the mathematical background of statistical mechanics at the end of the XIX - beginning of the XX century and were then explored in mathematics of the XX-XXI centuries. At the beginning of the XXI century, it became clear that the proper utilisation of these phenomena in machine learning might transform the curse of dimensionality into the b…
▽ More
The concentration of measure phenomena were discovered as the mathematical background of statistical mechanics at the end of the XIX - beginning of the XX century and were then explored in mathematics of the XX-XXI centuries. At the beginning of the XXI century, it became clear that the proper utilisation of these phenomena in machine learning might transform the curse of dimensionality into the blessing of dimensionality.
This paper summarises recently discovered phenomena of measure concentration which drastically simplify some machine learning problems in high dimension, and allow us to correct legacy artificial intelligence systems. The classical concentration of measure theorems state that i.i.d. random points are concentrated in a thin layer near a surface (a sphere or equators of a sphere, an average or median level set of energy or another Lipschitz function, etc.).
The new stochastic separation theorems describe the thin structure of these thin layers: the random points are not only concentrated in a thin layer but are all linearly separable from the rest of the set, even for exponentially large random sets. The linear functionals for separation of points can be selected in the form of the linear Fisher's discriminant.
All artificial intelligence systems make errors. Non-destructive correction requires separation of the situations (samples) with errors from the samples corresponding to correct behaviour by a simple and robust classifier. The stochastic separation theorems provide us by such classifiers and a non-iterative (one-shot) procedure for learning.
△ Less
Submitted 10 January, 2018;
originally announced January 2018.
-
Knowledge Transfer Between Artificial Intelligence Systems
Authors:
Ivan Y. Tyukin,
Alexander N. Gorban,
Konstantin Sofeikov,
Ilya Romanenko
Abstract:
We consider the fundamental question: how a legacy "student" Artificial Intelligent (AI) system could learn from a legacy "teacher" AI system or a human expert without complete re-training and, most importantly, without requiring significant computational resources. Here "learning" is understood as an ability of one system to mimic responses of the other and vice-versa. We call such learning an Ar…
▽ More
We consider the fundamental question: how a legacy "student" Artificial Intelligent (AI) system could learn from a legacy "teacher" AI system or a human expert without complete re-training and, most importantly, without requiring significant computational resources. Here "learning" is understood as an ability of one system to mimic responses of the other and vice-versa. We call such learning an Artificial Intelligence knowledge transfer. We show that if internal variables of the "student" Artificial Intelligent system have the structure of an $n$-dimensional topological vector space and $n$ is sufficiently high then, with probability close to one, the required knowledge transfer can be implemented by simple cascades of linear functionals. In particular, for $n$ sufficiently large, with probability close to one, the "student" system can successfully and non-iteratively learn $k\ll n$ new examples from the "teacher" (or correct the same number of mistakes) at the cost of two additional inner products. The concept is illustrated with an example of knowledge transfer from a pre-trained convolutional neural network to a simple linear classifier with HOG features.
△ Less
Submitted 14 November, 2017; v1 submitted 5 September, 2017;
originally announced September 2017.
-
Stochastic Separation Theorems
Authors:
A. N. Gorban,
I. Y. Tyukin
Abstract:
The problem of non-iterative one-shot and non-destructive correction of unavoidable mistakes arises in all Artificial Intelligence applications in the real world. Its solution requires robust separation of samples with errors from samples where the system works properly. We demonstrate that in (moderately) high dimension this separation could be achieved with probability close to one by linear dis…
▽ More
The problem of non-iterative one-shot and non-destructive correction of unavoidable mistakes arises in all Artificial Intelligence applications in the real world. Its solution requires robust separation of samples with errors from samples where the system works properly. We demonstrate that in (moderately) high dimension this separation could be achieved with probability close to one by linear discriminants. Surprisingly, separation of a new image from a very large set of known images is almost always possible even in moderately high dimensions by linear functionals, and coefficients of these functionals can be found explicitly. Based on fundamental properties of measure concentration, we show that for $M<a\exp(b{n})$ random $M$-element sets in $\mathbb{R}^n$ are linearly separable with probability $p$, $p>1-\vartheta$, where $1>\vartheta>0$ is a given small constant. Exact values of $a,b>0$ depend on the probability distribution that determines how the random $M$-element sets are drawn, and on the constant $\vartheta$. These {\em stochastic separation theorems} provide a new instrument for the development, analysis, and assessment of machine learning methods and algorithms in high dimension. Theoretical statements are illustrated with numerical examples.
△ Less
Submitted 3 August, 2017; v1 submitted 3 March, 2017;
originally announced March 2017.
-
One-Trial Correction of Legacy AI Systems and Stochastic Separation Theorems
Authors:
Alexander N. Gorban,
Ilya Romanenko,
Richard Burton,
Ivan Y. Tyukin
Abstract:
We consider the problem of efficient "on the fly" tuning of existing, or {\it legacy}, Artificial Intelligence (AI) systems. The legacy AI systems are allowed to be of arbitrary class, albeit the data they are using for computing interim or final decision responses should posses an underlying structure of a high-dimensional topological real vector space. The tuning method that we propose enables d…
▽ More
We consider the problem of efficient "on the fly" tuning of existing, or {\it legacy}, Artificial Intelligence (AI) systems. The legacy AI systems are allowed to be of arbitrary class, albeit the data they are using for computing interim or final decision responses should posses an underlying structure of a high-dimensional topological real vector space. The tuning method that we propose enables dealing with errors without the need to re-train the system. Instead of re-training a simple cascade of perceptron nodes is added to the legacy system. The added cascade modulates the AI legacy system's decisions. If applied repeatedly, the process results in a network of modulating rules "dressing up" and improving performance of existing AI systems. Mathematical rationale behind the method is based on the fundamental property of measure concentration in high dimensional spaces. The method is illustrated with an example of fine-tuning a deep convolutional network that has been pre-trained to detect pedestrians in images.
△ Less
Submitted 13 February, 2019; v1 submitted 3 October, 2016;
originally announced October 2016.
-
Approximation with Random Bases: Pro et Contra
Authors:
Alexander N. Gorban,
Ivan Yu. Tyukin,
Danil V. Prokhorov,
Konstantin I. Sofeikov
Abstract:
In this work we discuss the problem of selecting suitable approximators from families of parameterized elementary functions that are known to be dense in a Hilbert space of functions. We consider and analyze published procedures, both randomized and deterministic, for selecting elements from these families that have been shown to ensure the rate of convergence in $L_2$ norm of order $O(1/N)$, wher…
▽ More
In this work we discuss the problem of selecting suitable approximators from families of parameterized elementary functions that are known to be dense in a Hilbert space of functions. We consider and analyze published procedures, both randomized and deterministic, for selecting elements from these families that have been shown to ensure the rate of convergence in $L_2$ norm of order $O(1/N)$, where $N$ is the number of elements. We show that both randomized and deterministic procedures are successful if additional information about the families of functions to be approximated is provided. In the absence of such additional information one may observe exponential growth of the number of terms needed to approximate the function and/or extreme sensitivity of the outcome of the approximation to parameters. Implications of our analysis for applications of neural networks in modeling and control are illustrated with examples.
△ Less
Submitted 24 October, 2015; v1 submitted 15 June, 2015;
originally announced June 2015.