
Stealth edits for provably fixing or attacking large language models

Authors: Oliver J. Sutton, Qinghua Zhou, Wei Wang, Desmond J. Higham, Alexander N. Gorban, Alexander Bastounis, Ivan Y. Tyukin

Abstract: We reveal new methods and the theoretical foundations of techniques for editing large language models. We also show how the new theory can be used to assess the editability of models and to expose their susceptibility to previously unknown malicious attacks. Our theoretical approach shows that a single metric (a specific measure of the intrinsic dimensionality of the model's features) is fundamental to predicting the success of popular editing approaches, and reveals new bridges between disparate families of editing methods. We collectively refer to these approaches as stealth editing methods, because they aim to directly and inexpensively update a model's weights to correct the model's responses to known hallucinating prompts without otherwise affecting the model's behaviour, without requiring retraining. By carefully applying the insight gleaned from our theoretical investigation, we are able to introduce a new network block -- named a jet-pack block -- which is optimised for highly selective model editing, uses only standard network operations, and can be inserted into existing networks. The intrinsic dimensionality metric also determines the vulnerability of a language model to a stealth attack: a small change to a model's weights which changes its response to a single attacker-chosen prompt. Stealth attacks do not require access to or knowledge of the model's training data, therefore representing a potent yet previously unrecognised threat to redistributed foundation models. They are computationally simple enough to be implemented in malware in many cases. Extensive experimental results illustrate and support the method and its theoretical underpinnings. Demos and source code for editing language models are available at https://github.com/qinghua-zhou/stealth-edits.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 24 pages, 9 figures. Open source implementation: https://github.com/qinghua-zhou/stealth-edits

MSC Class: 68T07; 68T50; 68W40 ACM Class: I.2.7; F.2.0

arXiv:2402.06563 [pdf]

doi 10.1109/BigData59044.2023.10386194

What is Hiding in Medicine's Dark Matter? Learning with Missing Data in Medical Practices

Authors: Neslihan Suzen, Evgeny M. Mirkes, Damian Roland, Jeremy Levesley, Alexander N. Gorban, Tim J. Coats

Abstract: Electronic patient records (EPRs) produce a wealth of data but contain significant missing information. Understanding and handling this missing data is an important part of clinical data analysis and if left unaddressed could result in bias in analysis and distortion in critical conclusions. Missing data may be linked to health care professional practice patterns and imputation of missing data can increase the validity of clinical decisions. This study focuses on statistical approaches for understanding and interpreting the missing data and machine learning based clinical data imputation using a single centre's paediatric emergency data and the data from UK's largest clinical audit for traumatic injury database (TARN). In the study of 56,961 data points related to initial vital signs and observations taken on children presenting to an Emergency Department, we have shown that missing data are likely to be non-random and how these are linked to health care professional practice patterns. We have then examined 79 TARN fields with missing values for 5,791 trauma cases. Singular Value Decomposition (SVD) and k-Nearest Neighbour (kNN) based missing data imputation methods are used and imputation results against the original dataset are compared and statistically tested. We have concluded that the 1NN imputer is the best imputation which indicates a usual pattern of clinical decision making: find the most similar patients and take their attributes as imputation.

Submitted 9 February, 2024; originally announced February 2024.

Comments: 8 pages

Journal ref: 2023 IEEE International Conference on Big Data (BigData), 4979-4986

arXiv:2402.00899 [pdf, other]

Weakly Supervised Learners for Correction of AI Errors with Provable Performance Guarantees

Authors: Ivan Y. Tyukin, Tatiana Tyukina, Daniel van Helden, Zedong Zheng, Evgeny M. Mirkes, Oliver J. Sutton, Qinghua Zhou, Alexander N. Gorban, Penelope Allison

Abstract: We present a new methodology for handling AI errors by introducing weakly supervised AI error correctors with a priori performance guarantees. These AI correctors are auxiliary maps whose role is to moderate the decisions of some previously constructed underlying classifier by either approving or rejecting its decisions. The rejection of a decision can be used as a signal to suggest abstaining from making a decision. A key technical focus of the work is in providing performance guarantees for these new AI correctors through bounds on the probabilities of incorrect decisions. These bounds are distribution agnostic and do not rely on assumptions on the data dimension. Our empirical example illustrates how the framework can be applied to improve the performance of an image classifier in a challenging real-world task where training data are scarce.

Submitted 13 February, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

MSC Class: 68T05; 68T37

arXiv:2311.13917 [pdf]

doi 10.1016/j.cnsns.2024.107906

Exploring the impact of social stress on the adaptive dynamics of COVID-19: Typing the behavior of naïve populations faced with epidemics

Authors: Innokentiy Kastalskiy, Andrei Zinovyev, Evgeny Mirkes, Victor Kazantsev, Alexander N. Gorban

Abstract: In the context of natural disasters, human responses inevitably intertwine with natural factors. The COVID-19 pandemic, as a significant stress factor, has brought to light profound variations among different countries in terms of their adaptive dynamics in addressing the spread of infection outbreaks across different regions. This emphasizes the crucial role of cultural characteristics in natural disaster analysis. The theoretical understanding of large-scale epidemics primarily relies on mean-field kinetic models. However, conventional SIR-like models failed to fully explain the observed phenomena at the onset of the COVID-19 outbreak. These phenomena included the unexpected cessation of exponential growth, the reaching of plateaus, and the occurrence of multi-wave dynamics. In situations where an outbreak of a highly virulent and unfamiliar infection arises, it becomes crucial to respond swiftly at a non-medical level to mitigate the negative socio-economic impact. Here we present a theoretical examination of the first wave of the epidemic based on a simple SIRSS model (SIR with Social Stress). We conduct an analysis of the socio-cultural features of naïve population behaviors across various countries worldwide. The unique characteristics of each country/territory are encapsulated in only a few constants within our model, derived from the fitted COVID-19 statistics. These constants also reflect the societal response dynamics to the external stress factor, underscoring the importance of studying the mutual behavior of humanity and natural factors during global social disasters. Based on these distinctive characteristics of specific regions, local authorities can optimize their strategies to effectively combat epidemics until vaccines are developed.

Submitted 12 February, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

Comments: 29 pages, 16 figures, 1 table, 2 appendices

Journal ref: Communications in Nonlinear Science and Numerical Simulation, Volume 132, May 2024, 107906

arXiv:2311.07579 [pdf, other]

doi 10.1007/978-3-031-44207-0_43

Relative intrinsic dimensionality is intrinsic to learning

Authors: Oliver J. Sutton, Qinghua Zhou, Alexander N. Gorban, Ivan Y. Tyukin

Abstract: High dimensional data can have a surprising property: pairs of data points may be easily separated from each other, or even from arbitrary subsets, with high probability using just simple linear classifiers. However, this is more of a rule of thumb than a reliable property as high dimensionality alone is neither necessary nor sufficient for successful learning. Here, we introduce a new notion of the intrinsic dimension of a data distribution, which precisely captures the separability properties of the data. For this intrinsic dimension, the rule of thumb above becomes a law: high intrinsic dimension guarantees highly separable data. We extend this notion to that of the relative intrinsic dimension of two data distributions, which we show provides both upper and lower bounds on the probability of successfully learning and generalising in a binary classification problem.

Submitted 10 October, 2023; originally announced November 2023.

Comments: 12 pages, 5 figures

MSC Class: 68T09; 68T10

Journal ref: Artificial Neural Networks and Machine Learning ICANN 2023. Lecture Notes in Computer Science, vol 14254, pp 516-529. Springer, Cham

arXiv:2309.07072 [pdf, ps, other]

The Boundaries of Verifiable Accuracy, Robustness, and Generalisation in Deep Learning

Authors: Alexander Bastounis, Alexander N. Gorban, Anders C. Hansen, Desmond J. Higham, Danil Prokhorov, Oliver Sutton, Ivan Y. Tyukin, Qinghua Zhou

Abstract: In this work, we assess the theoretical limitations of determining guaranteed stability and accuracy of neural networks in classification tasks. We consider the classical distribution-agnostic framework and algorithms minimising empirical risks, potentially subject to some weight regularisation. We show that there is a large family of tasks for which computing and verifying ideal stable and accurate neural networks in the above settings is extremely challenging, if at all possible, even when such ideal solutions exist within the given class of neural architectures.

Submitted 13 September, 2023; originally announced September 2023.

MSC Class: 68T07; 68T05

arXiv:2309.03665 [pdf, other]

How adversarial attacks can disrupt seemingly stable accurate classifiers

Authors: Oliver J. Sutton, Qinghua Zhou, Ivan Y. Tyukin, Alexander N. Gorban, Alexander Bastounis, Desmond J. Higham

Abstract: Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data. Paradoxically, empirical evidence indicates that even systems which are robust to large random perturbations of the input data remain susceptible to small, easily constructed, adversarial perturbations of their inputs. Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data. We introduce a simple generic and generalisable framework for which key behaviours observed in practical systems arise with high probability -- notably the simultaneous susceptibility of the (otherwise accurate) model to easily constructed adversarial attacks, and robustness to random perturbations of the input data. We confirm that the same phenomena are directly observed in practical neural networks trained on standard image classification problems, where even large additive random noise fails to trigger the adversarial instability of the network. A surprising takeaway is that even small margins separating a classifier's decision surface from training and testing data can hide adversarial susceptibility from being detected using randomly sampled perturbations. Counterintuitively, using additive noise during training or testing is therefore inefficient for eradicating or detecting adversarial examples, and more demanding adversarial training is required.

Submitted 7 September, 2023; originally announced September 2023.

Comments: 11 pages, 8 figures, additional supplementary materials

arXiv:2306.04745 [pdf, other]

3D Human Keypoints Estimation From Point Clouds in the Wild Without Human Labels

Authors: Zhenzhen Weng, Alexander S. Gorban, Jingwei Ji, Mahyar Najibi, Yin Zhou, Dragomir Anguelov

Abstract: Training a 3D human keypoint detector from point clouds in a supervised manner requires large volumes of high quality labels. While it is relatively easy to capture large amounts of human point clouds, annotating 3D keypoints is expensive, subjective, error prone and especially difficult for long-tail cases (pedestrians with rare poses, scooterists, etc.). In this work, we propose GC-KPL - Geometry Consistency inspired Key Point Learning, an approach for learning 3D human joint locations from point clouds without human labels. We achieve this by our novel unsupervised loss formulations that account for the structure and movement of the human body. We show that by training on a large training set from Waymo Open Dataset without any human annotated keypoints, we are able to achieve reasonable performance as compared to the fully supervised approach. Further, the backbone benefits from the unsupervised training and is useful in downstream few-shot learning of keypoints, where fine-tuning on only 10 percent of the labeled training data gives comparable performance to fine-tuning on the entire set. We demonstrated that GC-KPL outperforms SoTA by a large margin when trained on the entire dataset and efficiently leverages large volumes of unlabeled data.

Submitted 7 June, 2023; originally announced June 2023.

Comments: CVPR 2023

arXiv:2305.07624 [pdf, other]

Agile gesture recognition for capacitive sensing devices: adapting on-the-job

Authors: Ying Liu, Liucheng Guo, Valeri A. Makarov, Yuxiang Huang, Alexander Gorban, Evgeny Mirkes, Ivan Y. Tyukin

Abstract: Automated hand gesture recognition has been a focus of the AI community for decades. Traditionally, work in this domain revolved largely around scenarios assuming the availability of the flow of images of the user's hands. This has partly been due to the prevalence of camera-based devices and the wide availability of image data. However, there is growing demand for gesture recognition technology that can be implemented on low-power devices using limited sensor data instead of high-dimensional inputs like hand images. In this work, we demonstrate a hand gesture recognition system and method that uses signals from capacitive sensors embedded into the etee hand controller. The controller generates real-time signals from each of the wearer's five fingers. We use a machine learning technique to analyse the time series signals and identify three features that can represent 5 fingers within 500 ms. The analysis is composed of a two stage training strategy, including dimension reduction through principal component analysis and classification with K nearest neighbour. Remarkably, we found that this combination showed a level of performance which was comparable to more advanced methods such as supervised variational autoencoder. The base system can also be equipped with the capability to learn from occasional errors by providing it with an additional adaptive error correction mechanism. The results showed that the error corrector improves the classification performance in the base system without compromising its performance. The system requires no more than 1 ms of computing time per input sample, and is smaller than deep neural networks, demonstrating the feasibility of agile gesture recognition systems based on this technology.

Submitted 12 May, 2023; originally announced May 2023.

arXiv:2303.16645 [pdf, other]

doi 10.1134/S106377372211007X

Study of the X-ray Pulsar IGR J21343+4738 based on NuSTAR, Swift, and SRG data

Authors: A. S. Gorban, S. V. Molkov, A. A. Lutovinov, A. N. Semena

Abstract: We present the results of our study of the X-ray pulsar IGR J21343+4738 based on NuSTAR, Swift, and SRG observations in the wide energy range 0.3 - 79 keV. The absence of absorption features in the energy spectra of the source, both averaged and phase-resolved ones, has allowed us to estimate the upper and lower limits on the magnetic field of the neutron star in the binary system, $B<2.5\times10^{11}$G and $B>3.4 \times 10^{12}$G, respectively. The spectral and timing analyses have shown that IGR J21343+4738 has all properties of a quasi-persistent X-ray pulsar with a pulsation period of $322.71\pm{0.04}$s and a luminosity $L_{x} \simeq3.3$ $\times10^{35}$erg s$^{-1}$. The analysis of the long-term variability of the object in X-rays has confirmed the possible orbital period of the binary system $\sim 34.3$ days previously detected in the optical range.

Submitted 29 March, 2023; originally announced March 2023.

Comments: 8 pages, 4 figures, 1 table

Journal ref: Astronomy Letters, 2022, Vol. 48, No. 12, pp. 798-805

arXiv:2212.07729 [pdf, other]

HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving

Authors: Andrei Zanfir, Mihai Zanfir, Alexander Gorban, Jingwei Ji, Yin Zhou, Dragomir Anguelov, Cristian Sminchisescu

Abstract: Autonomous driving is an exciting new industry, posing important research questions. Within the perception module, 3D human pose estimation is an emerging technology, which can enable the autonomous vehicle to perceive and understand the subtle and complex behaviors of pedestrians. While hardware systems and sensors have dramatically improved over the decades -- with cars potentially boasting complex LiDAR and vision systems and with a growing expansion of the available body of dedicated datasets for this newly available information -- not much work has been done to harness these novel signals for the core problem of 3D human pose estimation. Our method, which we coin HUM3DIL (HUMan 3D from Images and LiDAR), efficiently makes use of these complementary signals, in a semi-supervised fashion and outperforms existing methods with a large margin. It is a fast and compact model for onboard deployment. Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages. Quantitative experiments on the Waymo Open Dataset support these claims, where we achieve state-of-the-art results on the task of 3D pose estimation.

Submitted 15 December, 2022; originally announced December 2022.

Comments: Published at the 6th Conference on Robot Learning (CoRL 2022), Auckland, New Zealand

arXiv:2211.03607 [pdf, other]

Towards a mathematical understanding of learning from few examples with nonlinear feature maps

Authors: Oliver J. Sutton, Alexander N. Gorban, Ivan Y. Tyukin

Abstract: We consider the problem of data classification where the training set consists of just a few data points. We explore this phenomenon mathematically and reveal key relationships between the geometry of an AI model's feature space, the structure of the underlying data distributions, and the model's generalisation capabilities. The main thrust of our analysis is to reveal the influence on the model's generalisation capabilities of nonlinear feature transformations mapping the original data into high, and possibly infinite, dimensional spaces.

Submitted 7 November, 2022; originally announced November 2022.

Comments: 18 pages, 8 figures

MSC Class: 68Q32; 68T05

arXiv:2208.13290 [pdf, other]

doi 10.3390/e25010033

Domain Adaptation Principal Component Analysis: base linear method for learning with out-of-distribution data

Authors: Evgeny M Mirkes, Jonathan Bac, Aziz Fouché, Sergey V. Stasenko, Andrei Zinovyev, Alexander N. Gorban

Abstract: Domain adaptation is a popular paradigm in modern machine learning which aims at tackling the problem of divergence (or shift) between the labeled training and validation datasets (source domain) and a potentially large unlabeled dataset (target domain). The task is to embed both datasets into a common space in which the source dataset is informative for training while the divergence between source and target is minimized. The most popular domain adaptation solutions are based on training neural networks that combine classification and adversarial learning modules, frequently making them both data-hungry and difficult to train. We present a method called Domain Adaptation Principal Component Analysis (DAPCA) that identifies a linear reduced data representation useful for solving the domain adaptation task. The DAPCA algorithm introduces positive and negative weights between pairs of data points, and generalizes the supervised extension of principal component analysis. DAPCA is an iterative algorithm that solves a simple quadratic optimization problem at each iteration. The convergence of the algorithm is guaranteed, and the number of iterations is small in practice. We validate the suggested algorithm on previously proposed benchmarks for solving the domain adaptation task. We also show the benefit of using DAPCA in analyzing the single-cell omics datasets in biomedical applications. Overall, DAPCA can serve as a practical preprocessing step in many machine learning applications leading to reduced dataset representations, taking into account possible divergence between source and target domains.

Submitted 15 December, 2022; v1 submitted 28 August, 2022; originally announced August 2022.

Journal ref: Entropy, 25(1), 33, 2023

arXiv:2207.00330 [pdf, ps, other]

doi 10.1016/j.plrev.2021.10.002

It is useful to analyze correlation graphs

Authors: A. N. Gorban, T. A. Tyukina, L. I. Pokidysheva, E. V. Smirnova

Abstract: In 1987, we analyzed the changes in correlation graphs between various features of the organism during stress and adaptation. After 33 years of research of many authors, discoveries and rediscoveries, we can say with complete confidence: It is useful to analyze correlation graphs. In addition, we should add that the concept of adaptability ('adaptation energy') introduced by Selye is useful, especially if it is supplemented by 'adaptation entropy' and free energy, as well as an analysis of limiting factors. Our review of these topics, "Dynamic and Thermodynamic Adaptation Models" (Phys Life Rev, 2021, arXiv:2103.01959 [q-bio.OT]), attracted many comments from leading experts, with new ideas and new problems, from the dynamics of aging and the training of athletes to single-cell omics. Methodological backgrounds, like free energy analysis, were also discussed in depth. In this article, we provide an analytical overview of twelve commenting papers and some related publications.

Submitted 1 July, 2022; originally announced July 2022.

Comments: Mini-review, 9 pages, 62 bibliography

Journal ref: Physics of Life Reviews, Volume 40, March 2022, Pages 15-23

arXiv:2205.15696 [pdf]

An Informational Space Based Semantic Analysis for Scientific Texts

Authors: Neslihan Suzen, Alexander N. Gorban, Jeremy Levesley, Evgeny M. Mirkes

Abstract: One major problem in Natural Language Processing is the automatic analysis and representation of human language. Human language is ambiguous, and deeper understanding of semantics and creating human-to-machine interaction have required an effort in creating schemes for acts of communication and building common-sense knowledge bases for the 'meaning' in texts. This paper introduces computational methods for semantic analysis and for quantifying the meaning of short scientific texts. Computational methods extracting semantic features are used to analyse the relations between texts of messages and 'representations of situations' for a newly created large collection of scientific texts, Leicester Scientific Corpus. The representation of scientific-specific meaning is standardised by replacing the situation representations, rather than psychological properties, with the vectors of some attributes: a list of scientific subject categories that the text belongs to. First, this paper introduces 'Meaning Space' in which the informational representation of the meaning is extracted from the occurrence of the word in texts across the scientific categories, i.e., the meaning of a word is represented by a vector of Relative Information Gain about the subject categories. Then, the meaning space is statistically analysed for Leicester Scientific Dictionary-Core and we investigate 'Principal Components of the Meaning' to describe the adequate dimensions of the meaning. The research in this paper lays the basis for the geometric representation of the meaning of texts.

Submitted 31 May, 2022; originally announced May 2022.

Comments: 19 pages. arXiv admin note: substantial text overlap with arXiv:2009.08859, arXiv:2004.13717

Journal ref: Computer Science & Information Technology, volume 12, number 08, pp. 81-99, 2022. CS & IT - CSCP 2022

arXiv:2203.16935 [pdf, other]

Learning from few examples with nonlinear feature maps

Authors: Ivan Y. Tyukin, Oliver Sutton, Alexander N. Gorban

Abstract: In this work we consider the problem of data classification in post-classical settings where the training set consists of merely a few data points. We explore the phenomenon and reveal key relationships between dimensionality of AI model's feature space, non-degeneracy of data distributions, and the model's generalisation capabilities. The main thrust of our present analysis is on the influence of nonlinear feature transformations mapping original data into higher- and possibly infinite-dimensional spaces on the resulting model's generalisation capabilities. Subject to appropriate assumptions, we establish new relationships between intrinsic dimensions of the transformed data and the probabilities to learn successfully from few presentations.

Submitted 31 March, 2022; originally announced March 2022.

MSC Class: 68T05; 68Q32

arXiv:2203.16687 [pdf, other]

Quasi-orthogonality and intrinsic dimensions as measures of learning and generalisation

Authors: Qinghua Zhou, Alexander N. Gorban, Evgeny M. Mirkes, Jonathan Bac, Andrei Zinovyev, Ivan Y. Tyukin

Abstract: Finding the best architectures of learning machines, such as deep neural networks, is a well-known technical and theoretical challenge. Recent work by Mellor et al (2021) showed that there may exist correlations between the accuracies of trained networks and the values of some easily computable measures defined on randomly initialised networks, which may make it possible to search tens of thousands of neural architectures without training. Mellor et al used the Hamming distance evaluated over all ReLU neurons as such a measure. Motivated by these findings, in our work, we ask whether there exist other and perhaps more principled measures which could be used as determinants of success of a given neural architecture. In particular, we examine whether the dimensionality and quasi-orthogonality of neural networks' feature space could be correlated with the network's performance after training. We showed, using the setup as in Mellor et al, that dimensionality and quasi-orthogonality may jointly serve as discriminants of a network's performance. In addition to offering new opportunities to accelerate neural architecture search, our findings suggest important relationships between the networks' final performance and properties of their randomly initialised feature spaces: data dimension and quasi-orthogonality.

Submitted 30 March, 2022; originally announced March 2022.

MSC Class: 68T05; 68Q32

arXiv:2203.02995 [pdf, other]

Changes in the nature of the spectral continuum and the stability of the cyclotron line in the X-ray pulsar GRO J2058+42

Authors: A. S. Gorban, S. V. Molkov, S. S. Tsygankov, A. A. Mushtukov, A. A. Lutovinov

Abstract: The results of the broadband spectral and timing study of the transient X-ray pulsar GRO J2058+42 in a wide energy range at a low luminosity $L_{x} \simeq 2.5\times 10^{36}$ erg s$^{-1}$ are reported. The data revealed that the pulse profile and pulse fraction of the source are significantly changed in comparison with previous NuSTAR observations, when the source was ten times brighter. The cyclotron absorption line at $\sim10$ keV in the narrow phase interval is consistent with the high state observations. Spectral analysis showed that at high luminosities $L_{x}\simeq (2.7-3.2)\times 10^{37}$ erg s$^{-1}$ the spectrum has a shape typical of accreting pulsars, while when the luminosity drops by about an order of magnitude, to $2.5\times 10^{36}$ erg s$^{-1}$, a two-component model is necessary to describe it. This behavior fits into a model in which the low-energy part of the spectrum is formed in a hot spot, and the high-energy part is formed as a result of resonant Compton scattering by incident matter in an accretion channel above the surface of a neutron star.

Submitted 6 March, 2022; originally announced March 2022.

Comments: 16 pages, 6 figures, 3 tables

Journal ref: Astronomy Letters, 2022, Vol. 48, No. 4

arXiv:2202.07218 [pdf]

doi 10.1109/TNNLS.2023.33354

Situation-based memory in spiking neuron-astrocyte network

Authors: Susanna Gordleeva, Yuliya A. Tsybina, Mikhail I. Krivonosov, Ivan Y. Tyukin, Victor B. Kazantsev, Alexey A. Zaikin, Alexander N. Gorban

Abstract: Mammalian brains operate in a very special surrounding: to survive they have to react quickly and effectively to the pool of stimuli patterns previously recognized as danger. Many learning tasks often encountered by living organisms involve a specific set-up centered around a relatively small set of patterns presented in a particular environment. For example, at a party, people recognize friends immediately, without deep analysis, just by seeing a fragment of their clothes. This set-up with reduced "ontology" is referred to as a "situation". Situations are usually local in space and time. In this work, we propose that neuron-astrocyte networks provide a network topology that is effectively adapted to accommodate situation-based memory. In order to illustrate this, we numerically simulate and analyze a well-established model of a neuron-astrocyte network, which is subjected to stimuli conforming to the situation-driven environment. Three pools of stimuli patterns are considered: external patterns, patterns from the situation associative pool regularly presented to the network and learned by the network, and patterns already learned and remembered by astrocytes. Patterns from the external world are added to and removed from the associative pool. Then we show that astrocytes are structurally necessary for an effective function in such a learning and testing set-up. To demonstrate this we present a novel neuromorphic model for short-term memory implemented by a two-net spiking neural-astrocytic network. Our results show that such a system tested on synthesized data with selective astrocyte-induced modulation of neuronal activity provides an enhancement of retrieval quality in comparison to standard spiking neural networks trained via Hebbian plasticity only. We argue that the proposed set-up may offer a new way to analyze, model, and understand neuromorphic artificial intelligence systems.

Submitted 15 February, 2022; originally announced February 2022.

Comments: 38 pages, 11 figures, 4 tables

Journal ref: IEEE Transactions on Neural Networks and Learning Systems, 04 December 2023, Early Access

arXiv:2112.12141 [pdf, other]

Multi-modal 3D Human Pose Estimation with 2D Weak Supervision in Autonomous Driving

Authors: Jingxiao Zheng, Xinwei Shi, Alexander Gorban, Junhua Mao, Yang Song, Charles R. Qi, Ting Liu, Visesh Chari, Andre Cornman, Yin Zhou, Congcong Li, Dragomir Anguelov

Abstract: 3D human pose estimation (HPE) in autonomous vehicles (AV) differs from other use cases in many factors, including the 3D resolution and range of data, absence of dense depth maps, failure modes for LiDAR, relative location between the camera and LiDAR, and a high bar for estimation accuracy. Data collected for other use cases (such as virtual reality, gaming, and animation) may therefore not be usable for AV applications. This necessitates the collection and annotation of a large amount of 3D data for HPE in AV, which is time-consuming and expensive. In this paper, we propose one of the first approaches to alleviate this problem in the AV setting. Specifically, we propose a multi-modal approach which uses 2D labels on RGB images as weak supervision to perform 3D HPE. The proposed multi-modal architecture incorporates LiDAR and camera inputs with an auxiliary segmentation branch. On the Waymo Open Dataset, our approach achieves a 22% relative improvement over camera-only 2D HPE baseline, and 6% improvement over LiDAR-only model. Finally, careful ablation studies and parts based analysis illustrate the advantages of each of our contributions.

Submitted 22 December, 2021; originally announced December 2021.

arXiv:2110.05337 [pdf]

doi 10.1134/S1063773721060049

Study of the X-ray Pulsar XTE J1946+274 with NuSTAR

Authors: A. S. Gorban, S. V. Molkov, S. S. Tsygankov, A. A. Lutovinov

Abstract: We present the results of our spectral and timing analysis of the emission from the transient X-ray pulsar XTE J1946+274 based on the simultaneous NuSTAR and Swift/XRT observations in the broad energy range 0.3-79 keV carried out in June 2018 during a bright outburst. Our spectral analysis has confirmed the presence of a cyclotron absorption line at an energy $\sim38$ keV in both averaged and phase-resolved spectra of the source. Phase-resolved spectroscopy has also allowed the variation in spectral parameters with neutron star rotation phase, whose period is $\simeq15.755$ s, to be studied. The energy of the cyclotron line is shown to change significantly (from $\simeq34$ to $\simeq39$ keV) on the scale of a pulse, with the line width and optical depth also exhibiting variability. The observed behavior of the cyclotron line parameters can be interpreted in terms of the model of the reflection of emission from a small accretion column (the source's luminosity at the time of its observations was $\sim 3 \times 10^{37}$ erg s$^{-1}$) off the neutron star surface. The equivalent width of the iron line has been found to also change significantly with pulse phase. The time delay between the pulse and equivalent width profiles can be explained by the reflection of neutron star emission from the outer accretion disk regions.

Submitted 11 October, 2021; originally announced October 2021.

Comments: 19 pages, 7 figures, 1 table

Journal ref: Astronomy Letters, 2021, Vol. 47, No. 6, pp. 390-401

arXiv:2109.02596 [pdf, other]

doi 10.3390/e23101368

Scikit-dimension: a Python package for intrinsic dimension estimation

Authors: Jonathan Bac, Evgeny M. Mirkes, Alexander N. Gorban, Ivan Tyukin, Andrei Zinovyev

Abstract: Dealing with uncertainty in applications of machine learning to real-life data critically depends on the knowledge of intrinsic dimensionality (ID). A number of methods have been suggested for the purpose of estimating ID, but no standard package to easily apply them one by one or all at once has been implemented in Python. This technical note introduces \texttt{scikit-dimension}, an open-source Python package for intrinsic dimension estimation. \texttt{scikit-dimension} package provides a uniform implementation of most of the known ID estimators based on scikit-learn application programming interface to evaluate global and local intrinsic dimension, as well as generators of synthetic toy and benchmark datasets widespread in the literature. The package is developed with tools assessing the code quality, coverage, unit testing and continuous integration. We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation in real-life and synthetic data. The source code is available from https://github.com/j-bac/scikit-dimension , the documentation is available from https://scikit-dimension.readthedocs.io .

Submitted 6 September, 2021; originally announced September 2021.

Comments: 12 pages, 4 figures, 1 table

Journal ref: Entropy, 2021, 23(10), 1368

arXiv:2108.13414 [pdf, other]

Astrocytes mediate analogous memory in a multi-layer neuron-astrocytic network

Authors: Yuliya Tsybina, Innokentiy Kastalskiy, Mikhail Krivonosov, Alexey Zaikin, Victor Kazantsev, Alexander Gorban, Susanna Gordleeva

Abstract: Modeling the neuronal processes underlying short-term working memory remains the focus of many theoretical studies in neuroscience. Here we propose a mathematical model of spiking neuron network (SNN) demonstrating how a piece of information can be maintained as a robust activity pattern for several seconds then completely disappear if no other stimuli come. Such short-term memory traces are preserved due to the activation of astrocytes accompanying the SNN. The astrocytes exhibit calcium transients at a time scale of seconds. These transients further modulate the efficiency of synaptic transmission and, hence, the firing rate of neighboring neurons at diverse timescales through gliotransmitter release. We show how such transients continuously encode frequencies of neuronal discharges and provide robust short-term storage of analogous information. This kind of short-term memory can keep operative information for seconds, then completely forget it to avoid overlapping with forthcoming patterns. The SNN is inter-connected with the astrocytic layer by local inter-cellular diffusive connections. The astrocytes are activated only when the neighboring neurons fire quite synchronously, e.g. when an information pattern is loaded. For illustration, we took greyscale photos of people's faces where the grey level encoded the level of applied current stimulating the neurons. The astrocyte feedback modulates (facilitates) synaptic transmission by varying the frequency of neuronal firing. We show how arbitrary patterns can be loaded, then stored for a certain interval of time, and retrieved if the appropriate clue pattern is applied to the input.

Submitted 31 August, 2021; originally announced August 2021.

Comments: 18 pages, 6 figures, 1 table, Appendix

arXiv:2106.15416 [pdf, other]

doi 10.3390/e23081090

High-dimensional separability for one- and few-shot learning

Authors: Alexander N. Gorban, Bogdan Grechuk, Evgeny M. Mirkes, Sergey V. Stasenko, Ivan Y. Tyukin

Abstract: This work is driven by a practical question: corrections of Artificial Intelligence (AI) errors. These corrections should be quick and non-iterative. To solve this problem without modification of a legacy AI system, we propose special `external' devices, correctors. Elementary correctors consist of two parts, a classifier that separates the situations with high risk of error from the situations in which the legacy AI system works well and a new decision for situations with potential errors. Input signals for the correctors can be the inputs of the legacy AI system, its internal signals, and outputs. If the intrinsic dimensionality of data is high enough then the classifiers for correction of a small number of errors can be very simple. According to the blessing of dimensionality effects, even simple and robust Fisher's discriminants can be used for one-shot learning of AI correctors. Stochastic separation theorems provide the mathematical basis for this one-shot learning. However, as the number of correctors needed grows, the cluster structure of data becomes important and a new family of stochastic separation theorems is required. We reject the classical hypothesis of the regularity of the data distribution and assume that the data can have a fine-grained structure with many clusters and peaks in the probability density. New stochastic separation theorems for data with fine-grained structure are formulated and proved. The multi-correctors for granular data are proposed. The advantages of the multi-corrector technology were demonstrated by examples of correcting errors and learning new classes of objects by a deep convolutional neural network on the CIFAR-10 dataset. The key problems of the non-classical high-dimensional data analysis are reviewed together with the basic preprocessing steps including supervised, semi-supervised and domain adaptation Principal Component Analysis.

Submitted 22 October, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

Comments: Corrected and restructured version with some extensions

Journal ref: Entropy. 2021; 23(8):1090
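
Illustrative sketch (not part of the listing): the abstract above relies on Fisher's discriminants becoming effective separators when the intrinsic dimension is high. A minimal way to see this, assuming the usual definition from the stochastic separation literature (a point x is Fisher-separable from y with threshold alpha if <x, y> < alpha <x, x> for centred data), is to count how many points of a random sample are separable from all the others. The uniform-cube distribution, the sample size, the threshold alpha = 0.8, and all function names are assumptions chosen for this example, not values taken from the paper.

```python
import random


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def is_fisher_separable(x, others, alpha=0.8):
    """True if <x, y> < alpha * <x, x> holds for every y in others."""
    threshold = alpha * dot(x, x)
    return all(dot(x, y) < threshold for y in others)


def separable_fraction(points, alpha=0.8):
    """Fraction of points that are Fisher-separable from all the rest."""
    count = 0
    for i, x in enumerate(points):
        others = points[:i] + points[i + 1:]
        if is_fisher_separable(x, others, alpha):
            count += 1
    return count / len(points)


def sample_cube(n, dim, rng):
    """n points drawn uniformly from the centred cube [-1, 1]^dim."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable
low_dim = separable_fraction(sample_cube(100, 2, rng))
high_dim = separable_fraction(sample_cube(100, 100, rng))
print(f"separable fraction, dim=2:   {low_dim:.2f}")
print(f"separable fraction, dim=100: {high_dim:.2f}")
```

In two dimensions most points have some other point lying further out in roughly the same direction, so few are separable. In one hundred dimensions inner products between independent points are small compared with squared norms, so nearly every point passes the test with a single linear functional.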

arXiv:2106.13997 [pdf, other]

doi 10.1093/imamat/hxad027

The Feasibility and Inevitability of Stealth Attacks

Authors: Ivan Y. Tyukin, Desmond J. Higham, Alexander Bastounis, Eliyas Woldegeorgis, Alexander N. Gorban

Abstract: We develop and study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence (AI) systems including deep learning neural networks. In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself. Such a stealth attack could be conducted by a mischievous, corrupt or disgruntled member of a software development team. It could also be made by those wishing to exploit a ``democratization of AI'' agenda, where network architectures and trained parameter sets are shared publicly. We develop a range of new implementable attack strategies with accompanying analysis, showing that with high probability a stealth attack can be made transparent, in the sense that system performance is unchanged on a fixed validation set which is unknown to the attacker, while evoking any desired output on a trigger input of interest. The attacker only needs to have estimates of the size of the validation set and the spread of the AI's relevant latent space. In the case of deep learning neural networks, we show that a one neuron attack is possible - a modification to the weights and bias associated with a single neuron - revealing a vulnerability arising from over-parameterization. We illustrate these concepts using state of the art architectures on two standard image data sets. Guided by the theory and computational results, we also propose strategies to guard against stealth attacks.

Submitted 4 January, 2023; v1 submitted 26 June, 2021; originally announced June 2021.

MSC Class: 68T01; 68T05; 90C31

Journal ref: IMA Journal of Applied Mathematics, October 2023, hxad027

arXiv:2106.08966 [pdf]

doi 10.1038/s41598-021-01317-z

Social stress drives the multi-wave dynamics of COVID-19 outbreaks

Authors: I. A. Kastalskiy, E. V. Pankratova, E. M. Mirkes, V. B. Kazantsev, A. N. Gorban

Abstract: The dynamics of epidemics depend on how people's behavior changes during an outbreak. At the beginning of the epidemic, people do not know about the virus, then, after the outbreak of epidemics and alarm, they begin to comply with the restrictions and the spreading of epidemics may decline. Over time, some people get tired/frustrated by the restrictions and stop following them (exhaustion), especially if the number of new cases drops down. After resting for a while, they can follow the restrictions again. But during this pause the second wave can come and become even stronger than the first one. Studies based on SIR models do not predict the observed quick exit from the first wave of epidemics. Social dynamics should be considered. The appearance of the second wave also depends on social factors. Many generalizations of the SIR model have been developed that take into account the weakening of immunity over time, the evolution of the virus, vaccination and other medical and biological details. However, these more sophisticated models do not explain the apparent differences in outbreak profiles between countries with different intrinsic socio-cultural features. In our work, a system of models of the COVID-19 pandemic is proposed, combining the dynamics of social stress with classical epidemic models. Social stress is described by the tools of sociophysics. The combination of a dynamic SIR-type model with the classical triad of stages of the general adaptation syndrome, alarm-resistance-exhaustion, makes it possible to describe with high accuracy the available statistical data for 13 countries. The sets of kinetic constants corresponding to optimal fit of model to data were found. They characterize the ability of society to mobilize efforts against epidemics and maintain this concentration over time, and can further help in the development of strategies specific to a particular society.

Submitted 19 October, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

Comments: Minor corrections, enriched discussion and extended bibliography

Journal ref: Sci Rep 11, 22497 (2021)
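
Illustrative sketch (not part of the listing): the abstract above contrasts its social-stress model with the classical SIR model. As a point of reference, the snippet below integrates the plain SIR equations (dS/dt = -beta S I, dI/dt = beta S I - gamma I, dR/dt = gamma I) with a forward Euler step and counts the infection peaks. This is only the classical baseline, not the authors' alarm-resistance-exhaustion model. The parameter values, the step size, and the function names are assumptions picked for illustration.

```python
def simulate_sir(beta, gamma, s0, i0, days, dt=0.1):
    """Forward-Euler integration of the classical SIR model.

    Population fractions are used, so S + I + R stays equal to 1.
    Returns a list of (S, I, R) tuples, one per time step.
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    history = [(s, i, r)]
    for _ in range(int(round(days / dt))):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


def count_peaks(values):
    """Number of local maxima in a sequence (a plateau counts once)."""
    return sum(
        1
        for k in range(1, len(values) - 1)
        if values[k - 1] < values[k] >= values[k + 1]
    )


# Hypothetical parameters: basic reproduction number beta / gamma = 3.
history = simulate_sir(beta=0.3, gamma=0.1, s0=0.99, i0=0.01, days=200)
infected = [i for _, i, _ in history]
print("peak infected fraction:", round(max(infected), 3))
print("number of waves:", count_peaks(infected))
```

With constant beta the susceptible fraction only decreases, so the infected fraction rises and falls once. Reproducing several waves requires a contact rate that changes over time, which is the role the abstract assigns to social stress.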

arXiv:2104.12869 [pdf, other]

Semantic Analysis for Automated Evaluation of the Potential Impact of Research Articles

Authors: Neslihan Suzen, Alexander Gorban, Jeremy Levesley, Evgeny Mirkes

Abstract: Can the analysis of the semantics of words used in the text of a scientific paper predict its future impact measured by citations? This study details examples of automated text classification that achieved 80% success rate in distinguishing between highly-cited and little-cited articles. Automated intelligent systems allow the identification of promising works that could become influential in the scientific community. The problems of quantifying the meaning of texts and representation of human language have been clear since the inception of Natural Language Processing. This paper presents a novel method for vector representation of text meaning based on information theory and shows how this informational semantics is used for text classification on the basis of the Leicester Scientific Corpus. We describe the experimental framework used to evaluate the impact of scientific articles through their informational semantics. Our interest is in citation classification to discover how important semantics of texts are in predicting the citation count. We propose the semantics of texts as an important factor for citation prediction. For each article, our system extracts the abstract of the paper, represents the words of the abstract as vectors in Meaning Space, automatically analyses the distribution of scientific categories (Web of Science categories) within the text of the abstract, and then classifies papers according to citation counts (highly-cited, little-cited). We show that an informational approach to representing the meaning of a text has offered a way to effectively predict the scientific impact of research papers.

Submitted 26 April, 2021; originally announced April 2021.

Comments: 36 pages

arXiv:2104.12174 [pdf, other]

doi 10.1109/IJCNN52387.2021.9534395

Demystification of Few-shot and One-shot Learning

Authors: Ivan Y. Tyukin, Alexander N. Gorban, Muhammad H. Alkhudaydi, Qinghua Zhou

Abstract: Few-shot and one-shot learning have been the subject of active and intensive research in recent years, with mounting evidence pointing to successful implementation and exploitation of few-shot learning algorithms in practice. Classical statistical learning theories do not fully explain why few- or one-shot learning is at all possible since traditional generalisation bounds normally require large training and testing samples to be meaningful. This sharply contrasts with numerous examples of successful one- and few-shot learning systems and applications. In this work we present mathematical foundations for a theory of one-shot and few-shot learning and reveal conditions specifying when such learning schemes are likely to succeed. Our theory is based on intrinsic properties of high-dimensional spaces. We show that if the ambient or latent decision space of a learning machine is sufficiently high-dimensional then a large class of objects in this space can indeed be easily learned from few examples provided that certain data non-concentration conditions are met.

Submitted 29 May, 2021; v1 submitted 25 April, 2021; originally announced April 2021.

Comments: IEEE International Joint Conference on Neural Networks, IJCNN 2021

MSC Class: 68T05; 68T07

Journal ref: In 2021 International Joint Conference on Neural Networks (IJCNN) 2021 Jul 18 (pp. 1-7). IEEE

arXiv:2103.01959 [pdf, other]

doi 10.1016/j.plrev.2021.03.001

Dynamic and Thermodynamic Models of Adaptation

Authors: A. N. Gorban, T. A. Tyukina, L. I. Pokidysheva, E. V. Smirnova

Abstract: The concept of biological adaptation was closely connected to some mathematical, engineering and physical ideas from the very beginning. Cannon in his "The wisdom of the body" (1932) used the engineering vision of regulation. In 1938, Selye enriched this approach by the notion of adaptation energy. This term causes much debate when one takes it literally, i.e. as a sort of energy. Selye did not use the language of mathematics, but the formalization of his phenomenological theory in the spirit of thermodynamics was simple and led to verifiable predictions. In the 1980s, the dynamics of correlation and variance in systems under adaptation to a load of environmental factors were studied and the universal effect in ensembles of systems under a load of similar factors was discovered: in a crisis, as a rule, even before the onset of obvious symptoms of stress, the correlation increases together with variance (and volatility). Over 30 years, this effect has been supported by many observations of groups of humans, mice, trees, grassy plants, and on financial time series. In the last ten years, these results were supplemented by many new experiments, from gene networks in cardiology and oncology to dynamics of depression and clinical psychotherapy. Several systems of models were developed: the thermodynamic-like theory of adaptation of ensembles and several families of models of individual adaptation. Historically, the first group of models was based on Selye's concept of adaptation energy and used fitness estimates. Two other groups of models are based on the idea of hidden attractor bifurcation and on the advection--diffusion model for distribution of population in the space of physiological attributes. We explore this world of models and experiments, starting with classic works, with particular attention to the results of the last ten years and open questions.

Submitted 17 March, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

Comments: Review paper, 48 pages, 29 figures, 183 bibliography, the final version accepted in Phys Life Rev

Journal ref: Physics of Life Reviews, 2021;37:17--64

arXiv:2011.01750 [pdf, other]

doi 10.3389/fncel.2021.631485

Formation of working memory in a spiking neuron network accompanied by astrocytes

Authors: Susanna Yu. Gordleeva, Yulia A. Tsybina, Mikhail I. Krivonosov, Mikhail V. Ivanchenko, Alexey A. Zaikin, Victor B. Kazantsev, Alexander N. Gorban

Abstract: We propose a biologically plausible computational model of working memory (WM) implemented by the spiking neuron network (SNN) interacting with a network of astrocytes. SNN is modelled by the synaptically coupled Izhikevich neurons with a non-specific architecture connection topology. Astrocytes generating calcium signals are connected by local gap junction diffusive couplings and interact with neurons by chemicals diffused in the extracellular space. Calcium elevations occur in response to the increase of concentration of a neurotransmitter released by spiking neurons when a group of them fire coherently. In turn, gliotransmitters are released by activated astrocytes modulating the strengths of synaptic connections in the corresponding neuronal group. Input information is encoded as two-dimensional patterns of short applied current pulses stimulating neurons. The output is taken from frequencies of transient discharges of corresponding neurons. We show how a set of information patterns with quite significant overlapping areas can be uploaded into the neuron-astrocyte network and stored for several seconds. Information retrieval is organised by the application of a cue pattern representing the one from the memory set distorted by noise. We found that successful retrieval with level of the correlation between recalled pattern and ideal pattern more than 90% is possible for multi-item WM task. Having analysed the dynamical mechanism of WM formation, we discovered that astrocytes operating at a time scale of a dozen seconds can successfully store traces of neuronal activations corresponding to information patterns. In the retrieval stage, the astrocytic network selectively modulates synaptic connections in SNN leading to the successful recall. Information and dynamical characteristics of the proposed WM model agree with classical concepts and other WM models.

Submitted 3 November, 2020; originally announced November 2020.

Journal ref: Frontiers in Cellular Neuroscience, 15, 2021, Article 631485

arXiv:2010.05241 [pdf, other]

doi 10.1016/j.neunet.2021.01.034

General stochastic separation theorems with optimal bounds

Authors: Bogdan Grechuk, Alexander N. Gorban, Ivan Y. Tyukin

Abstract: Phenomenon of stochastic separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and analyze AI instabilities. In high-dimensional datasets under broad assumptions each point can be separated from the rest of the set by simple and robust Fisher's discriminant (is Fisher separable). Errors or clusters of errors can be separated from the rest of the data. The ability to correct an AI system also opens up the possibility of an attack on it, and the high dimensionality induces vulnerabilities caused by the same stochastic separability that holds the keys to understanding the fundamentals of robustness and adaptivity in high-dimensional data-driven AI. To manage errors and analyze vulnerabilities, the stochastic separation theorems should evaluate the probability that the dataset will be Fisher separable in given dimensionality and for a given class of distributions. Explicit and optimal estimates of these separation probabilities are required, and this problem is solved in present work. The general stochastic separation theorems with optimal probability estimates are obtained for important classes of distributions: log-concave distribution, their convex combinations and product distributions. The standard i.i.d. assumption was significantly relaxed. These theorems and estimates can be used both for correction of high-dimensional data driven AI systems and for analysis of their vulnerabilities.
The third area of application is the emergence of memories in ensembles of neurons, the phenomena of grandmother's cells and sparse coding in the brain, and explanation of unexpected effectiveness of small neural ensembles in high-dimensional brain.

Submitted 9 January, 2021; v1 submitted 11 October, 2020; originally announced October 2020.

Comments: Numerical examples and illustrations are added, minor corrections extended discussion and the bibliography

Journal ref: Neural Networks, Volume 138, 2021, Pages 33-56

arXiv:2009.08859 [pdf, other]

Principal Components of the Meaning

Authors: Neslihan Suzen, Alexander Gorban, Jeremy Levesley, Evgeny Mirkes

Abstract: In this paper we argue that (lexical) meaning in science can be represented in a 13 dimension Meaning Space. This space is constructed using principal component analysis (singular decomposition) on the matrix of word category relative information gains, where the categories are those used by the Web of Science, and the words are taken from a reduced word set from texts in the Web of Science. We show that this reduced word set plausibly represents all texts in the corpus, so that the principal component analysis has some objective meaning with respect to the corpus. We argue that 13 dimensions is adequate to describe the meaning of scientific texts, and hypothesise about the qualitative meaning of the principal components.

Submitted 18 September, 2020; originally announced September 2020.

arXiv:2008.09303 [pdf]

doi 10.1109/TGRS.2021.3076011

Coloring Panchromatic Nighttime Satellite Images: Comparing the Performance of Several Machine Learning Methods

Authors: N. Rybnikova, B. A. Portnov, E. M. Mirkes, A. Zinovyev, A. Brook, A. N. Gorban

Abstract: Artificial light-at-night (ALAN), emitted from the ground and visible from space, marks human presence on Earth. Since the launch of the Suomi National Polar Partnership satellite with the Visible Infrared Imaging Radiometer Suite Day/Night Band (VIIRS/DNB) onboard, global nighttime images have significantly improved; however, they remained panchromatic. Although multispectral images are also available, they are either commercial or free of charge, but sporadic. In this paper, we use several machine learning techniques, such as linear, kernel, random forest regressions, and elastic map approach, to transform panchromatic VIIRS/DNB into Red Green Blue (RGB) images. To validate the proposed approach, we analyze RGB images for eight urban areas worldwide. We link RGB values, obtained from ISS photographs, to panchromatic ALAN intensities, their pixel-wise differences, and several land-use type proxies. Each dataset is used for model training, while other datasets are used for the model validation. The analysis shows that model-estimated RGB images demonstrate a high degree of correspondence with the original RGB images from the ISS database. Yet, estimates, based on linear, kernel and random forest regressions, provide better correlations, contrast similarity and lower WMSEs levels, while RGB images, generated using elastic map approach, provide higher consistency of predictions.

Submitted 10 April, 2021; v1 submitted 21 August, 2020; originally announced August 2020.

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, 60, Art no. 4702715. 2022

arXiv:2007.16163 [pdf, ps, other]

doi 10.1134/S1063772918110069

The Stellar Population and Orbit of the Galactic Globular Cluster Palomar 3

Authors: M. E. Sharina, M. V. Ryabova, M. I. Maricheva, A. S. Gorban

Abstract: Deep stellar photometry of one of the most distant Galactic globular clusters, Palomar 3, based on frames taken with the VLT in Johnson-Cousins broadband V and I filters is presented, together with medium-resolution stellar spectroscopy in the central region of the cluster obtained with the CARELEC spectrograph of the Observatoire de Haute Provence and measurements of the Lick spectral indices for the integrated spectrum. Computations of the orbital parameters of Palomar 3 and nine Galactic globular clusters with similar metallicities and ages are also presented. The orbital parameters, age, metallicity, and distance of Palomar 3 are estimated. The interstellar absorption is consistent with and supplements values from the literature. The need to obtain more accurate data on the proper motions, ages, and chemical compositions of the cluster stars to elucidate the origin of this globular cluster is emphasized.

Submitted 31 July, 2020; originally announced July 2020.

Comments: 20 pages, 8 tables, 7 figures

Journal ref: Astronomy Reports, 2018, Volume 62, Issue 11, pp.733-746

arXiv:2007.08225 [pdf, other]

doi 10.1016/j.rinp.2021.103922

Transition states and entangled mass action law

Authors: Alexander N. Gorban

Abstract: The classical approaches to the derivation of the (generalized) Mass Action Law (MAL) assume that the intermediate transition state (i) has short life time and (ii) is in partial equilibrium with the initial reagents of the elementary reaction. The partial equilibrium assumption (ii) means that the reverse decomposition of the intermediates is much faster than its transition through other channels to the products. In this work we demonstrate how avoiding this partial equilibrium assumption modifies the reaction rates. This kinetic revision of transition state theory results in an effective `entanglement' of reaction rates, which become linear combinations of different MAL expressions.

Submitted 17 January, 2021; v1 submitted 16 July, 2020; originally announced July 2020.

Comments: Significantly extended version with more explanation, illustrations, and references

Journal ref: Results in Physics, Volume 22, March 2021, Article Number 103922

arXiv:2007.03788 [pdf, other]

doi 10.1093/gigascience/giaa128

Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data

Authors: Sergey E. Golovenkin, Jonathan Bac, Alexander Chervov, Evgeny M. Mirkes, Yuliya V. Orlova, Emmanuel Barillot, Alexander N. Gorban, Andrei Zinovyev

Abstract: Large observational clinical datasets become increasingly available for mining associations between various disease traits and administered therapy. These datasets can be considered as representations of the landscape of all possible disease conditions, in which a concrete pathology develops through a number of stereotypical routes, characterized by `points of no return' and `final states' (such as lethal or recovery states). Extracting this information directly from the data remains challenging, especially in the case of synchronic (with a short-term follow up) observations. Here we suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values, through modeling the geometrical data structure as a bouquet of bifurcating clinical trajectories. The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations. The methodology allows positioning a patient on a particular clinical trajectory (pathological scenario) and characterizing the degree of progression along it with a qualitative estimate of the uncertainty of the prognosis. Overall, our pseudo-time quantification-based approach gives a possibility to apply the methods developed for dynamical disease phenotyping and illness trajectory analysis (diachronic data analysis) to synchronic observational data.
We developed a tool $ClinTrajan$ for clinical trajectory analysis implemented in Python programming language. We test the methodology in two large publicly available datasets: myocardial infarction complications and readmission of diabetic patients data.

Submitted 5 October, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

ACM Class: I.2.6; J.3; J.2

Journal ref: GigaScience, Volume 9, Issue 11, 2020, giaa128

arXiv:2005.06284 [pdf, other]

Pruning coupled with learning, ensembles of minimal neural networks, and future of XAI

Authors: Alexander N. Gorban, Evgeny M. Mirkes

Abstract: Pruning coupled with learning aims to optimize the neural network (NN) structure for solving specific problems. This optimization can be used for various purposes: to prevent overfitting, to save resources for implementation and training, to provide explainability of the trained NN, and many others. The minimal structure that cannot be pruned further is not unique. Ensemble of minimal structures can be used as a committee of intellectual agents that solves problems by voting. Each minimal NN presents an "empirical knowledge" about the problem and can be verbalized. The non-uniqueness of such knowledge extracted from data is an important property of data-driven Artificial Intelligence (AI). In this work, we review an approach to pruning based on the principle: What controls training should control pruning. This principle is expected to work both for artificial NN and for selection and modification of important synaptic contacts in brain. In back-propagation artificial NN learning is controlled by the gradient of loss functions. Therefore, the first order sensitivity indicators are used for pruning and the algorithms based on these indicators are reviewed. The notion of logically transparent NN was introduced. The approach was illustrated on the problem of political forecasting: predicting the results of the US presidential election. Eight minimal NN were produced that give different forecasting algorithms. The non-uniqueness of solution can be utilised by creation of expert panels (committee).
Another use of NN pluralism is to identify areas of input signals where further data collection is most useful. In Conclusion, we discuss the possible future of widely advertised XAI program.

Submitted 22 January, 2023; v1 submitted 13 May, 2020; originally announced May 2020.

Comments: Significantly modified and extended version, 23 pages, 5 figures

arXiv:2004.14230 [pdf, other]

doi 10.3390/e22101105

Fractional norms and quasinorms do not help to overcome the curse of dimensionality

Authors: Evgeny M. Mirkes, Jeza Allohibi, Alexander N. Gorban

Abstract: The curse of dimensionality causes the well-known and widely discussed problems for machine learning methods. There is a hypothesis that the use of the Manhattan distance and even fractional quasinorms lp (for p less than 1) can help to overcome the curse of dimensionality in classification problems. In this study, we systematically test this hypothesis. We confirm that fractional quasinorms have a greater relative contrast or coefficient of variation than the Euclidean norm l2, but we also demonstrate that the distance concentration shows qualitatively the same behaviour for all tested norms and quasinorms and the difference between them decays as dimension tends to infinity. Estimation of classification quality for kNN based on different norms and quasinorms shows that a greater relative contrast does not mean better classifier performance and the worst performance for different databases was shown by different norms (quasinorms). A systematic comparison shows that the difference of the performance of kNN based on lp for p=2, 1, and 0.5 is statistically insignificant.

Submitted 29 April, 2020; originally announced April 2020.

Journal ref: Entropy. 2020; 22(10):1105

arXiv:2004.13717 [pdf, other]

Informational Space of Meaning for Scientific Texts

Authors: Neslihan Suzen, Evgeny M. Mirkes, Alexander N. Gorban

Abstract: In Natural Language Processing, automatic extracting the meaning of texts constitutes an important problem. Our focus is the computational analysis of meaning of short scientific texts (abstracts or brief reports). In this paper, a vector space model is developed for quantifying the meaning of words and texts. We introduce the Meaning Space, in which the meaning of a word is represented by a vector of Relative Information Gain (RIG) about the subject categories that the text belongs to, which can be obtained from observing the word in the text. This new approach is applied to construct the Meaning Space based on Leicester Scientific Corpus (LSC) and Leicester Scientific Dictionary-Core (LScDC). The LSC is a scientific corpus of 1,673,350 abstracts and the LScDC is a scientific dictionary which words are extracted from the LSC. Each text in the LSC belongs to at least one of 252 subject categories of Web of Science (WoS). These categories are used in construction of vectors of information gains. The Meaning Space is described and statistically analysed for the LSC with the LScDC. The usefulness of the proposed representation model is evaluated through top-ranked words in each category. The most informative n words are ordered. We demonstrated that RIG-based word ranking is much more useful than ranking based on raw word frequency in determining the science-specific meaning and importance of a word. The proposed model based on RIG is shown to have ability to stand out topic-specific words in categories.
The most informative words are presented for 252 categories. The new scientific dictionary and the 103,998 x 252 Word-Category RIG Matrix are available online. Analysis of the Meaning Space provides us with a tool to further explore quantifying the meaning of a text using more complex and context-dependent meaning models that use co-occurrence of words and their combinations.

Submitted 28 April, 2020; originally announced April 2020.

Comments: 320 pages

arXiv:2004.04479 [pdf, ps, other]

doi 10.1109/IJCNN48605.2020.9207472

On Adversarial Examples and Stealth Attacks in Artificial Intelligence Systems

Authors: Ivan Y. Tyukin, Desmond J. Higham, Alexander N. Gorban

Abstract: In this work we present a formal theoretical framework for assessing and analyzing two classes of malevolent action towards generic Artificial Intelligence (AI) systems. Our results apply to general multi-class classifiers that map from an input space into a decision space, including artificial neural networks used in deep learning applications. Two classes of attacks are considered. The first class involves adversarial examples and concerns the introduction of small perturbations of the input data that cause misclassification. The second class, introduced here for the first time and named stealth attacks, involves small perturbations to the AI system itself. Here the perturbed system produces whatever output is desired by the attacker on a specific small data set, perhaps even a single input, but performs as normal on a validation set (which is unknown to the attacker). We show that in both cases, i.e., in the case of an attack based on adversarial examples and in the case of a stealth attack, the dimensionality of the AI's decision-making space is a major contributor to the AI's susceptibility. For attacks based on adversarial examples, a second crucial parameter is the absence of local concentrations in the data probability distribution, a property known as Smeared Absolute Continuity. According to our findings, robustness to adversarial examples requires either (a) the data distributions in the AI's feature space to have concentrated probability density functions or (b) the dimensionality of the AI's decision variables to be sufficiently small.
We also show how to construct stealth attacks on high-dimensional AI systems that are hard to spot unless the validation set is made exponentially large.

Submitted 9 April, 2020; originally announced April 2020.

MSC Class: 68T05; 68T10; 90C31

Journal ref: 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, United Kingdom, 2020

arXiv:2002.02532 [pdf, other]

doi 10.1016/j.plrev.2019.12.002

Singularities of transient processes in dynamics and beyond

Authors: Alexander N. Gorban

Abstract: This note is a brief review of the analysis of long transients in dynamical systems. The problem of long transients arose in many disciplines, from physical and chemical kinetic to biology and even social sciences. Detailed analysis of singularities of various `relaxation times' associated long transients with bifurcations of $ω$-limit sets, homoclinic structures (intersections of $α$- and $ω$-limit sets) and other peculiarities of dynamics. This review was stimulated by the analysis of anomalously long transients in ecology published recently by A. Morozov and S. Petrovskii with co-authors.

Submitted 3 February, 2020; originally announced February 2020.

Journal ref: Physics of Life Reviews Volume 32, March 2020, Pages 46-49

arXiv:2001.06520 [pdf, other]

doi 10.1007/978-3-030-10442-9

Personality Traits and Drug Consumption. A Story Told by Data

Authors: Elaine Fehrman, Vincent Egan, Alexander N. Gorban, Jeremy Levesley, Evgeny M. Mirkes, Awaz K. Muhammad

Abstract: This is a preprint version of the first book from the series: "Stories told by data". In this book a story is told about the psychological traits associated with drug consumption. The book includes: - A review of published works on the psychological profiles of drug users. - Analysis of a new original database with information on 1885 respondents and usage of 18 drugs. (Database is available online.) - An introductory description of the data mining and machine learning methods used for the analysis of this dataset. - The demonstration that the personality traits (five factor model, impulsivity, and sensation seeking), together with simple demographic data, give the possibility of predicting the risk of consumption of individual drugs with sensitivity and specificity above 70% for most drugs. - The analysis of correlations of use of different substances and the description of the groups of drugs with correlated use (correlation pleiades). - Proof of significant differences of personality profiles for users of different drugs. This is explicitly proved for benzodiazepines, ecstasy, and heroin. - Tables of personality profiles for users and non-users of 18 substances. The book is aimed at advanced undergraduates or first-year PhD students, as well as researchers and practitioners. No previous knowledge of machine learning, advanced data mining concepts or modern psychology of personality is assumed. For more detailed introduction into statistical methods we recommend several undergraduate textbooks.
Familiarity with basic statistics and some experience in the use of probabilities would be helpful as well as some basic technical understanding of psychology.

Submitted 17 January, 2020; originally announced January 2020.

Comments: A preprint version prepared by the authors before the Springer editorial work. 124 pages, 27 figures, 63 tables, bibl. 244

Journal ref: Springer, Cham, Research Monograph, 2019, ISBN 978-3-030-10441-2

arXiv:2001.04959 [pdf, other]

doi 10.3390/e22010082

High-Dimensional Brain in a High-Dimensional World: Blessing of Dimensionality

Authors: Alexander N. Gorban, Valery A. Makarov, Ivan Y. Tyukin

Abstract: High-dimensional data and high-dimensional representations of reality are inherent features of modern Artificial Intelligence systems and applications of machine learning. The well-known phenomenon of the "curse of dimensionality" states: many problems become exponentially difficult in high dimensions. Recently, the other side of the coin, the "blessing of dimensionality", has attracted much attention. It turns out that generic high-dimensional datasets exhibit fairly simple geometric properties. Thus, there is a fundamental tradeoff between complexity and simplicity in high dimensional spaces. Here we present a brief explanatory review of recent ideas, results and hypotheses about the blessing of dimensionality and related simplifying effects relevant to machine learning and neuroscience.

Submitted 14 January, 2020; originally announced January 2020.

Comments: 18 pages, 5 figures

Journal ref: Entropy 2020, 22(1), 82

arXiv:1912.06858 [pdf, other]

LScDC-new large scientific dictionary

Authors: Neslihan Suzen, Evgeny M. Mirkes, Alexander N. Gorban

Abstract: In this paper, we present a scientific corpus of abstracts of academic papers in English -- Leicester Scientific Corpus (LSC). The LSC contains 1,673,824 abstracts of research articles and proceeding papers indexed by Web of Science (WoS) in which publication year is 2014. Each abstract is assigned to at least one of 252 subject categories. Paper metadata include these categories and the number of citations. We then develop scientific dictionaries named Leicester Scientific Dictionary (LScD) and Leicester Scientific Dictionary-Core (LScDC), where words are extracted from the LSC. The LScD is a list of 974,238 unique words (lemmas). The LScDC is a core list (sub-list) of the LScD with 104,223 lemmas. It was created by removing LScD words appearing in not greater than 10 texts in the LSC. LScD and LScDC are available online. Both the corpus and dictionaries are developed to be later used for quantification of meaning in academic texts. Finally, the core list LScDC was analysed by comparing its words and word frequencies with a classic academic word list 'New Academic Word List (NAWL)' containing 963 word families, which is also sampled from an academic corpus. The major sources of the corpus where NAWL is extracted are Cambridge English Corpus (CEC), oral sources and textbooks. We investigate whether two dictionaries are similar in terms of common words and ranking of words. Our comparison leads us to main conclusion: most of words of NAWL (99.6%) are present in the LScDC but two lists differ in word ranking. This difference is measured.

Submitted 14 December, 2019; originally announced December 2019.

Comments: 63 pages

arXiv:1910.00445 [pdf, other]

doi 10.1016/j.ins.2021.01.022

Blessing of dimensionality at the edge

Authors: Ivan Y. Tyukin, Alexander N. Gorban, Alistair A. McEwan, Sepehr Meshkinfamfard, Lixin Tang

Abstract: In this paper we present theory and algorithms enabling classes of Artificial Intelligence (AI) systems to continuously and incrementally improve with a-priori quantifiable guarantees - or more specifically remove classification errors - over time. This is distinct from state-of-the-art machine learning, AI, and software approaches. Another feature of this approach is that, in the supervised setting, the computational complexity of training is linear in the number of training samples. At the time of classification, the computational complexity is bounded by few inner product calculations. Moreover, the implementation is shown to be very scalable. This makes it viable for deployment in applications where computational power and memory are limited, such as embedded environments. It enables the possibility for fast on-line optimisation using improved training samples. The approach is based on the concentration of measure effects and stochastic separation theorems and is illustrated with an example on the identification of faulty processes in Computer Numerical Control (CNC) milling and with a case study on adaptive removal of false positives in an industrial video surveillance and analytics system.

Submitted 10 July, 2020; v1 submitted 30 September, 2019; originally announced October 2019.

MSC Class: 68T05; 68T45; 68Q32

Journal ref: Information Sciences, 564, 124-143 (2021)

arXiv:1906.12222 [pdf, ps, other]

doi 10.1016/j.plrev.2019.06.003

Symphony of high-dimensional brain

Authors: Alexander N. Gorban, Valeri A. Makarov, Ivan Y. Tyukin

Abstract: This paper is the final part of the scientific discussion organised by the Journal "Physics of Life Reviews" about the simplicity revolution in neuroscience and AI. This discussion was initiated by the review paper "The unreasonable effectiveness of small neural ensembles in high-dimensional brain". Phys Life Rev 2019, doi 10.1016/j.plrev.2018.09.005, arXiv:1809.07656. The topics of the discussion varied from the necessity to take into account the difference between the theoretical random distributions and "extremely non-random" real distributions and revise the common machine learning theory, to different forms of the curse of dimensionality and high-dimensional pitfalls in neuroscience. V. Kůrková, A. Tozzi and J.F. Peters, R. Quian Quiroga, P. Varona, R. Barrio, G. Kreiman, L. Fortuna, C. van Leeuwen, R. Quian Quiroga, and V. Kreinovich, A.N. Gorban, V.A. Makarov, and I.Y. Tyukin participated in the discussion. In this paper we analyse the symphony of opinions and the possible outcomes of the simplicity revolution for machine learning and neuroscience.

Submitted 27 June, 2019; originally announced June 2019.

Journal ref: Physics of Life Reviews, 2019

arXiv:1906.01073 [pdf, ps, other]

Three waves of chemical dynamics

Authors: A. N. Gorban, G. S. Yablonsky

Abstract: Three epochs in the development of chemical dynamics are presented. We try to understand the modern research programs in the light of classical works. Three eras (or waves) of chemical dynamics can be revealed in the flux of research and publications. These waves may be associated with leaders: the first is the van't Hoff wave, the second may be called the Semenov--Hinshelwood wave and the third is definitely the Aris wave. Of course, the whole building was impossible without the efforts of hundreds of other researchers. Some of them are mentioned in our brief review.

Submitted 3 June, 2019; originally announced June 2019.

Comments: A brief review of chemical dynamics with 45 references

arXiv:1902.05351 [pdf, other]

doi 10.1016/j.cnsns.2019.104910

New universal Lyapunov functions for non-linear reaction networks

Authors: A. N. Gorban

Abstract: In 1961, Rényi discovered a rich family of non-classical Lyapunov functions for kinetics of the Markov chains, or, what is the same, for the linear kinetic equations. This family was parameterised by convex functions on the positive semi-axis. After works of Csiszár and Morimoto, these functions became widely known as $f$-divergences or the Csiszár--Morimoto divergences. These Lyapunov functions are universal in the following sense: they depend only on the state of equilibrium, not on the kinetic parameters themselves. Despite many years of research, no such wide family of universal Lyapunov functions has been found for nonlinear reaction networks. For general non-linear networks with detailed or complex balance, the classical thermodynamics potentials remain the only universal Lyapunov functions. We constructed a rich family of new universal Lyapunov functions for any non-linear reaction network with detailed or complex balance. These functions are parameterised by compact subsets of the projective space. They are universal in the same sense: they depend only on the state of equilibrium and on the network structure, but not on the kinetic parameters themselves. The main elements and operations in the construction of the new Lyapunov functions are partial equilibria of reactions and convex envelopes of families of functions.

Submitted 17 July, 2019; v1 submitted 14 February, 2019; originally announced February 2019.

Comments: Corrections of misprints

Journal ref: Communications in Nonlinear Science and Numerical Simulation Volume 79, December 2019, paper number 104910

arXiv:1812.09611 [pdf, other]

doi 10.1371/journal.pone.0218304

Simple model of complex dynamics of activity patterns in developing networks of neuronal cultures

Authors: I. Y. Tyukin, D. Iudin, F. Iudin, T. Tyukina, V. Kazantsev, I. Mukhina, A. N. Gorban

Abstract: Living neuronal networks in dissociated neuronal cultures are widely known for their ability to generate highly robust spatiotemporal activity patterns in various experimental conditions. These include neuronal avalanches satisfying the power scaling law and thereby exemplifying self-organized criticality in living systems. A crucial question is how these patterns can be explained and modeled in a way that is biologically meaningful, mathematically tractable and yet broad enough to account for neuronal heterogeneity and complexity. Here we propose a simple model which may offer an answer to this question. Our derivations are based on just a few phenomenological observations concerning input-output behavior of an isolated neuron. A distinctive feature of the model is that at the simplest level of description it comprises only two variables, a network activity variable and an exogenous variable corresponding to energy needed to sustain the activity and modulate the efficacy of signal transmission. Strikingly, this simple model is already capable of explaining emergence of network spikes and bursts in developing neuronal cultures. The model behavior and predictions are supported by empirical observations and published experimental evidence on the behavior of cultured neurons exposed to oxygen and energy deprivation. At the larger, network scale, introduction of the energy-dependent regulatory mechanism enables the network to balance on the edge of the network percolation transition. Network activity in this state shows population bursts satisfying the scaling avalanche conditions. This network state is self-sustainable and represents a balance between global network-wide processes and spontaneous activity of individual elements.

Submitted 22 December, 2018; originally announced December 2018.

Journal ref: PLoS ONE 14(6): e0218304. 2019

arXiv:1811.05321 [pdf, other]

doi 10.1016/j.ins.2018.07.040

Correction of AI systems by linear discriminants: Probabilistic foundations

Authors: A. N. Gorban, A. Golubkov, B. Grechuk, E. M. Mirkes, I. Y. Tyukin

Abstract: Artificial Intelligence (AI) systems sometimes make errors and will make errors in the future, from time to time. These errors are usually unexpected, and can lead to dramatic consequences. Intensive development of AI and its practical applications makes the problem of errors more important. Total re-engineering of the systems can create new errors and is not always possible due to the resources involved. The important challenge is to develop fast methods to correct errors without damaging existing skills. We formulated the technical requirements for the 'ideal' correctors. Such correctors include binary classifiers, which separate the situations with high risk of errors from the situations where the AI systems work properly. Surprisingly, for essentially high-dimensional data such methods are possible: a simple linear Fisher discriminant can separate the situations with errors from correctly solved tasks even for exponentially large samples. The paper presents the probabilistic basis for fast non-destructive correction of AI systems. A series of new stochastic separation theorems is proven. These theorems provide new instruments for fast non-iterative correction of errors of legacy AI systems. The new approaches become efficient in high dimensions, for correction of high-dimensional systems in a high-dimensional world (i.e. for processing of essentially high-dimensional data by large systems).

Submitted 11 November, 2018; originally announced November 2018.

Comments: arXiv admin note: text overlap with arXiv:1809.07656 and arXiv:1802.02172

Journal ref: Information Sciences 466 (2018), 303-322

Showing 1–50 of 165 results for author: Gorban, A