
doi 10.1016/j.ins.2018.11.057

Fast Construction of Correcting Ensembles for Legacy Artificial Intelligence Systems: Algorithms and a Case Study

Authors: Ivan Y. Tyukin, Alexander N. Gorban, Stephen Green, Danil Prokhorov

Abstract: This paper presents a technology for simple and computationally efficient improvements of a generic Artificial Intelligence (AI) system, including Multilayer and Deep Learning neural networks. The improvements are, in essence, small network ensembles constructed on top of the existing AI architectures. Theoretical foundations of the technology are based on Stochastic Separation Theorems and the ideas of the concentration of measure. We show that, subject to mild technical assumptions on statistical properties of internal signals in the original AI system, the technology enables instantaneous and computationally efficient removal of spurious and systematic errors with probability close to one on datasets that are exponentially large in dimension. The method is illustrated with numerical examples and a case study of recognising ten digits from American Sign Language.

Submitted 13 February, 2019; v1 submitted 12 October, 2018; originally announced October 2018.

Journal ref: Information Sciences, 2019
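The correctors described in this abstract operate on the legacy system's internal signals. As a minimal sketch, assuming those signals are available as real feature vectors and that a few of them have been labelled as errors, a single Fisher linear discriminant can flag error-like inputs. This is a simplified illustration, not the authors' ensemble construction; `fit_corrector` and its regularisation parameter `reg` are hypothetical.

```python
import numpy as np


def fit_corrector(correct, errors, reg=1e-3):
    """Fit a Fisher linear discriminant that separates error samples from correct ones.

    correct: (m, n) internal signals on which the legacy system was right
    errors:  (k, n) internal signals on which it made mistakes
    Returns (w, theta); an input x is flagged as a likely error when w @ x > theta.
    """
    mu_correct = correct.mean(axis=0)
    mu_errors = errors.mean(axis=0)
    # Within-class scatter, regularised so it stays invertible when k is tiny.
    scatter = np.cov(correct, rowvar=False)
    if len(errors) > 1:
        scatter = scatter + np.cov(errors, rowvar=False)
    scatter = scatter + reg * np.eye(correct.shape[1])
    # Fisher direction: S_W^{-1} (mu_errors - mu_correct)
    w = np.linalg.solve(scatter, mu_errors - mu_correct)
    # Threshold halfway between the highest correct score and the lowest error score.
    theta = 0.5 * ((correct @ w).max() + (errors @ w).min())
    return w, theta


rng = np.random.default_rng(0)
correct = rng.standard_normal((1000, 50))      # synthetic "correct" signals
errors = rng.standard_normal((5, 50)) + 3.0     # a handful of shifted "error" signals
w, theta = fit_corrector(correct, errors)
print((errors @ w > theta).all(), (correct @ w > theta).sum())
```

Placing the threshold between the two classes' extreme scores leaves every training sample on which the legacy system was right unflagged whenever the two sets are linearly separable, which mirrors the goal of removing errors without disturbing correct behaviour.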

arXiv:1809.07656 [pdf, other]

doi 10.1016/j.plrev.2018.09.005

The unreasonable effectiveness of small neural ensembles in high-dimensional brain

Authors: A. N. Gorban, V. A. Makarov, I. Y. Tyukin

Abstract: Despite the widespread consensus on the complexity of the brain, sprouts of the single neuron revolution emerged in neuroscience in the 1970s. They brought many unexpected discoveries, including grandmother or concept cells and sparse coding of information in the brain. In machine learning, the famous curse of dimensionality long seemed to be an unsolvable problem. Nevertheless, the idea of the blessing of dimensionality is gradually becoming more popular. Ensembles of non-interacting or weakly interacting simple units prove to be an effective tool for solving essentially multidimensional problems. This approach is especially useful for one-shot (non-iterative) correction of errors in large legacy artificial intelligence systems. These simplicity revolutions in the era of complexity have deep fundamental reasons grounded in the geometry of multidimensional data spaces. To explore and understand these reasons we revisit the background ideas of statistical physics. In the course of the 20th century they were developed into the concentration of measure theory. New stochastic separation theorems reveal the fine structure of the data clouds. We review and analyse biological, physical, and mathematical problems at the core of the fundamental question: how can a high-dimensional brain organise reliable and fast learning in a high-dimensional world of data by simple tools?
Two critical applications are reviewed to exemplify the approach: one-shot correction of errors in intellectual systems and emergence of static and associative memories in ensembles of single neurons.

Submitted 10 November, 2018; v1 submitted 20 September, 2018; originally announced September 2018.

Comments: Review paper, accepted in Physics of Life Reviews; minor corrections

Journal ref: Physics of Life Reviews Volume 29, July 2019, Pages 55-88

arXiv:1807.10543 [pdf, other]

doi 10.1016/j.procs.2020.02.171

Automatic Short Answer Grading and Feedback Using Text Mining Methods

Authors: Neslihan Suzen, Alexander Gorban, Jeremy Levesley, Evgeny Mirkes

Abstract: Automatic grading is not a new approach, but the need to adapt the latest technology to automatic grading has become very important. As the technology for scoring exams and essays has rapidly become more powerful, especially from the 1990s onwards, partially or wholly automated grading systems using computational methods have evolved and have become a major area of research. In particular, the demand for scoring natural language responses has created a need for tools that can grade these responses automatically. In this paper, we focus on the automatic grading of short answer questions, such as are typical in the UK GCSE system, and on providing students with useful feedback on their answers. We present experimental results on a dataset from the introductory computer science class at the University of North Texas. We first apply standard data mining techniques to the corpus of student answers to measure the similarity between each student answer and the model answer. This similarity is based on the number of common words. We then evaluate the relation between these similarities and the marks awarded by scorers. We then consider an approach that groups student answers into clusters. Each cluster would be awarded the same mark, and the same feedback would be given to each answer in a cluster. In this manner, we demonstrate that clusters indicate the groups of students who are awarded the same or similar scores.
Words in each cluster are compared to show that clusters are constructed based on how many and which words of the model answer have been used. The main novelty in this paper is that we design a model to predict marks based on the similarities between the student answers and the model answer.

Submitted 19 December, 2019; v1 submitted 27 July, 2018; originally announced July 2018.

Comments: 27 pages; added questions for section 6; correction of typos

Journal ref: Procedia Computer Science 169 (2020), 726-743
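The first step of this grading pipeline scores each student answer by the words it shares with the model answer. The following is a minimal sketch. It assumes simple lower-case word tokens and normalises by the number of distinct words in the model answer. The abstract does not state the paper's exact normalisation, stop-word handling or stemming, and `common_word_similarity` is a hypothetical name.

```python
import re


def tokenize(text):
    """Split text into lower-case word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def common_word_similarity(student_answer, model_answer):
    """Fraction of distinct model-answer words that also occur in the student answer."""
    model_words = set(tokenize(model_answer))
    if not model_words:
        return 0.0
    shared = model_words & set(tokenize(student_answer))
    return len(shared) / len(model_words)


model = "A stack is a LIFO data structure"
print(common_word_similarity("a stack is LIFO", model))
```

Scores produced this way can then be compared with the marks awarded by human scorers, or used as features when grouping answers into clusters.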

arXiv:1805.01516 [pdf, ps, other]

How deep should be the depth of convolutional neural networks: a backyard dog case study

Authors: A. N. Gorban, E. M. Mirkes, I. Y. Tyukin

Abstract: The work concerns the problem of reducing a pre-trained deep neural network to a smaller network, with just a few layers, whilst retaining the network's functionality on a given task. The proposed approach is motivated by the observation that the aim to deliver the highest accuracy possible in the broadest range of operational conditions, which many deep neural network models strive to achieve, may not always be needed, desired, or even achievable due to the lack of data or technical constraints. In relation to the face recognition problem, we formulated an example of such a use case, the 'backyard dog' problem. The 'backyard dog', implemented by a lean network, should correctly identify members of a limited group of individuals, a 'family', and should distinguish between them. At the same time, the network must produce an alarm in response to an image of an individual who is not a member of the family. To produce such a network, we propose a shallowing algorithm. The algorithm takes an existing deep learning model as its input and outputs a shallowed version of it. The algorithm is non-iterative and is based on Advanced Supervised Principal Component Analysis. Performance of the algorithm is assessed in exhaustive numerical experiments. In the above use case, the 'backyard dog' problem, the method is capable of drastically reducing the depth of deep learning neural networks, albeit at the cost of mild performance deterioration. We developed a simple non-iterative method for shallowing down pre-trained deep networks.
The method is generic in the sense that it applies to a broad class of feed-forward networks, and is based on Advanced Supervised Principal Component Analysis. The method enables generation of families of smaller, shallower specialised networks tuned for specific operational conditions and tasks from a single larger and more universal legacy network.

Submitted 8 December, 2019; v1 submitted 3 May, 2018; originally announced May 2018.

Comments: Edited and extended version with more detailed description of numerical experiments

arXiv:1804.08588 [pdf, other]

Large Scale Scene Text Verification with Guided Attention

Authors: Dafang He, Yeqing Li, Alexander Gorban, Derrall Heath, Julian Ibarz, Qian Yu, Daniel Kifer, C. Lee Giles

Abstract: Many tasks are related to determining if a particular text string exists in an image. In this work, we propose a new framework that learns this task in an end-to-end way. The framework takes an image and a text string as input and then outputs the probability of the text string being present in the image. This is the first end-to-end framework that learns such relationships between text and images in the scene text area. The framework does not require explicit scene text detection or recognition and thus no bounding box annotations are needed for it. It is also the first work in the scene text area that tackles such a weakly labeled problem. Based on this framework, we developed a model called Guided Attention. Our designed model achieves much better results than several state-of-the-art scene text reading based solutions for a challenging Street View Business Matching task. The task tries to find correct business names for storefront images and the dataset we collected for it is substantially larger and more challenging than existing scene text datasets. This new real-world task provides a new perspective for studying scene text related problems. We also demonstrate the uniqueness of our task via a comparison between our problem and a typical Visual Question Answering problem.

Submitted 18 November, 2018; v1 submitted 23 April, 2018; originally announced April 2018.

Comments: 18 pages, ACCV 2019

arXiv:1804.07580 [pdf]

doi 10.3390/e22030296

Robust And Scalable Learning Of Complex Dataset Topologies Via Elpigraph

Authors: Luca Albergante, Evgeny M. Mirkes, Huidong Chen, Alexis Martin, Louis Faure, Emmanuel Barillot, Luca Pinello, Alexander N. Gorban, Andrei Zinovyev

Abstract: Large datasets represented by multidimensional data point clouds often possess non-trivial distributions with branching trajectories and excluded regions, with the recent single-cell transcriptomic studies of developing embryos being notable examples. Reducing the complexity and producing compact and interpretable representations of such data remains a challenging task. Most of the existing computational methods are based on exploring the local data point neighbourhood relations, a step that can perform poorly in the case of multidimensional and noisy data. Here we present ElPiGraph, a scalable and robust method for approximation of datasets with complex structures which does not require computing the complete data distance matrix or the data point neighbourhood graph. This method is able to withstand high levels of noise and is capable of approximating complex topologies via principal graph ensembles that can be combined into a consensus principal graph. ElPiGraph deals efficiently with large and complex datasets in various fields, from biology, where it can be used to infer gene dynamics from single-cell RNA-Seq, to astronomy, where it can be used to explore complex structures in the distribution of galaxies.

Submitted 20 June, 2018; v1 submitted 20 April, 2018; originally announced April 2018.

Comments: 32 pages, 14 figures

Journal ref: Entropy 22, no. 3: 296, 2020

arXiv:1804.01342 [pdf, other]

doi 10.1016/j.ecocom.2018.06.007

Mobility cost and degenerated diffusion in kinesis models

Authors: Alexander N. Gorban, Nurdan Çabukoǧlu

Abstract: A new critical effect is predicted in population dispersal. It is based on the fact that a trade-off between the advantages of mobility and the cost of mobility breaks with a significant deterioration in living conditions. The recently developed model of purposeful kinesis (Gorban & Çabukoǧlu, Ecological Complexity 33, 2018) is based on the "let well enough alone" idea: mobility decreases for a high reproduction coefficient and, therefore, animals stay longer in good conditions and leave bad conditions more quickly. Mobility has a cost, which should be measured in the changes of the reproduction coefficient. Introduction of the cost of mobility into the reproduction coefficient leads to an equation for mobility. It can be solved in closed form using the Lambert $W$-function. Surprisingly, the "let well enough alone" models with a simple linear cost of mobility have an intrinsic phase transition: as conditions worsen, mobility increases up to some critical value of the reproduction coefficient. For worse conditions, there is no solution for mobility. We interpret this critical effect as the complete loss of mobility, that is, degeneration of diffusion. Qualitatively, this means that mobility increases with worsening of conditions up to some limit, and after that, mobility is nullified.

Submitted 28 February, 2019; v1 submitted 4 April, 2018; originally announced April 2018.

Comments: The final version submitted to the journal

Journal ref: Ecological Complexity 36 (2018), 16-21
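The closed-form solution via the Lambert $W$-function can be illustrated under an assumed model. Suppose, as a hypothetical reading of the abstract, that mobility follows a purposeful-kinesis law $D = D_0 e^{-\alpha r}$ and that mobility has a linear cost, $r = r_0 - \beta D$, where $r_0$ is the reproduction coefficient without the cost of movement. Then $D = c\,e^{kD}$ with $c = D_0 e^{-\alpha r_0}$ and $k = \alpha\beta$, so $D = -W_0(-kc)/k$, which is real only when $kc \le 1/e$. The sketch computes $W_0$ with Newton's method, a standard numerical choice not taken from the paper. The parameter names are assumptions.

```python
import math


def lambert_w0(z, tol=1e-12, max_iter=200):
    """Principal branch W0 of the Lambert W function (w * exp(w) = z) for z >= -1/e."""
    gap = 1.0 + math.e * z
    if gap < -1e-15:
        raise ValueError("W0 is real only for z >= -1/e")
    if gap <= 1e-15:
        return -1.0                     # branch point z = -1/e
    # Start to the right of the root: w*exp(w) is convex and increasing for w > -1,
    # so Newton's method then decreases monotonically to W0(z).
    if z < 0.0:
        w = -1.0 + math.sqrt(2.0 * gap)
    else:
        w = math.log1p(z)
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def mobility(d0, alpha, beta, r0):
    """Solve D = d0 * exp(-alpha * (r0 - beta * D)) for the mobility D (assumed model).

    Returns None when no real solution exists, i.e. when alpha * beta * c > 1/e.
    """
    k = alpha * beta                    # how strongly mobility feeds back on itself
    c = d0 * math.exp(-alpha * r0)      # mobility the model would give at zero cost
    if k * c > 1.0 / math.e:
        return None                     # beyond the critical point: no real solution
    return -lambert_w0(-k * c) / k


for r0 in (3.0, 2.0, 1.5, 0.5):
    print(r0, mobility(1.0, 1.0, 1.0, r0))
```

With $D_0 = \alpha = \beta = 1$, a real solution exists only for $r_0 \ge 1$. Within that range, a lower $r_0$ (worse conditions) gives a larger $D$. Below it, `mobility` returns `None`, which corresponds to the degeneration of diffusion described in the abstract.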

arXiv:1803.03599 [pdf, other]

doi 10.1098/rsta.2017.0238

Hilbert's Sixth Problem: the endless road to rigour

Authors: Alexander N. Gorban

Abstract: Introduction to the special issue of Phil. Trans. R. Soc. A 376, 2018, 'Hilbert's Sixth Problem'. The essence of the Sixth Problem is discussed and the content of this issue is introduced. In 1900, David Hilbert presented 23 problems for the advancement of mathematical science. Hilbert's Sixth Problem proposed the expansion of the axiomatic method outside of mathematics, in physics and beyond. Its title was shocking: "Mathematical Treatment of the Axioms of Physics." Axioms of physics did not exist and were not expected. In his further explanation, Hilbert specified this problem with special focus on probability and "the limiting processes, ... which lead from the atomistic view to the laws of motion of continua". The programmatic call was formulated "to treat, by means of axioms, those physical sciences in which already today mathematics plays an important part." This issue presents a modern slice of the work on the Sixth Problem, from quantum probability to fluid dynamics and machine learning, and from reviews of solid mathematical and physical results to opinion pieces with new ambitious ideas. Some expectations were broken: the continuum limit of atomistic kinetics may differ from classical fluid dynamics. The "curse of dimensionality" in machine learning turns into the "blessing of dimensionality" that is closely related to statistical physics. Quantum probability facilitates the modelling of geological uncertainty and hydrocarbon reservoirs. And many other findings are presented.

Submitted 9 March, 2018; originally announced March 2018.

Comments: With portrait of David Hilbert, courtesy of the artist Anna Gorban

Journal ref: Phil. Trans. R. Soc. A volume 376, issue 2118, 20170238, 2018

arXiv:1802.05745 [pdf, ps, other]

doi 10.1016/j.coche.2018.02.009

Model reduction in chemical dynamics: slow invariant manifolds, singular perturbations, thermodynamic estimates, and analysis of reaction graph

Authors: A. N. Gorban

Abstract: The paper has two goals. It presents basic ideas, notions, and methods for reduction of reaction kinetics models: quasi-steady-state, quasi-equilibrium, slow invariant manifolds, and limiting steps. It also briefly describes the current state of the art and some of the latest achievements in the broad area of model reduction in chemical and biochemical kinetics, including new results in methods of invariant manifolds, computational singular perturbation, bottleneck methods, asymptotology, tropical equilibration, and reaction mechanism skeletonisation.

Submitted 15 February, 2018; originally announced February 2018.

Comments: Review submitted to Current Opinion in Chemical Engineering

Journal ref: Current Opinion in Chemical Engineering Volume 21, September 2018, Pages 48-59
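The quasi-steady-state approximation listed in this abstract is commonly introduced with the Michaelis–Menten mechanism $E + S \rightleftharpoons C \to E + P$. The following is a textbook illustration, not an example from the paper, and its rate constants are hypothetical. It integrates the full mass-action model and the reduced equation $\dot s = -k_2 E_0 s/(K_M + s)$, with $K_M = (k_{-1} + k_2)/k_1$, then compares the resulting substrate levels. The two should stay close when the total enzyme $E_0$ is small compared with $s_0 + K_M$.

```python
# Textbook Michaelis-Menten example; rate constants are hypothetical.
K1, KM1, K2 = 10.0, 1.0, 1.0   # binding, unbinding and catalytic rate constants
E0 = 0.01                       # total enzyme, small compared with s0 + KM
KM = (KM1 + K2) / K1            # Michaelis constant


def rk4(f, y, t_end, dt):
    """Integrate dy/dt = f(y) from t = 0 to t_end with classical fourth-order Runge-Kutta."""
    for _ in range(int(round(t_end / dt))):
        k1 = f(y)
        k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
        k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
        k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
        y = [yi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
    return y


def full_model(y):
    """Mass-action kinetics for substrate s and complex c (free enzyme is E0 - c)."""
    s, c = y
    binding = K1 * (E0 - c) * s
    return [-binding + KM1 * c, binding - (KM1 + K2) * c]


def qssa_model(y):
    """Reduced model obtained by setting dc/dt = 0 (quasi-steady state of the complex)."""
    (s,) = y
    return [-K2 * E0 * s / (KM + s)]


s_full, c_full = rk4(full_model, [1.0, 0.0], 20.0, 0.01)
(s_qss,) = rk4(qssa_model, [1.0], 20.0, 0.01)
print(f"full model s = {s_full:.4f}, QSSA s = {s_qss:.4f}")
```

The small remaining gap between the two substrate values is mostly the substrate temporarily bound in the complex, of order $E_0 s/(K_M + s)$.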

arXiv:1802.02172 [pdf, other]

Augmented Artificial Intelligence: a Conceptual Framework

Authors: Alexander N. Gorban, Bogdan Grechuk, Ivan Y. Tyukin

Abstract: All Artificial Intelligence (AI) systems make errors. These errors are unexpected, and often differ from typical human mistakes ("non-human" errors). AI errors should be corrected without damaging existing skills and, ideally, without requiring direct human expertise. This paper presents an initial summary report of a project taking a new and systematic approach to improving the intellectual effectiveness of individual AIs by communities of AIs. We combine some ideas of learning in heterogeneous multiagent systems with new and original mathematical approaches for non-iterative correction of errors of legacy AI systems. The mathematical foundations of non-destructive AI correction are presented and a series of new stochastic separation theorems is proven. These theorems provide a new instrument for the development, analysis, and assessment of machine learning methods and algorithms in high dimension. They demonstrate that in high dimensions, and even for exponentially large samples, linear classifiers in their classical Fisher's form are powerful enough to separate errors from correct responses with high probability and to provide an efficient solution to the non-destructive corrector problem. In particular, we prove some hypotheses formulated in our paper 'Stochastic Separation Theorems' (Neural Networks, 94, 255-259, 2017), and answer one general problem published by Donoho and Tanner in 2009.

Submitted 24 March, 2018; v1 submitted 6 February, 2018; originally announced February 2018.

Comments: The mathematical part is significantly extended. New stochastic separation theorems are proven for log-concave distributions. Some previously formulated hypotheses are confirmed

arXiv:1801.06853 [pdf, other]

doi 10.1016/j.ecocom.2018.01.002

Basic Model of Purposeful Kinesis

Authors: A. N. Gorban, N. Çabukoǧlu

Abstract: The notions of taxis and kinesis are introduced and used to describe two types of behavior of an organism in non-uniform conditions: (i) taxis means guided movement to more favorable conditions; (ii) kinesis is a non-directional change in motion in response to a change of conditions. Migration and dispersal of animals have evolved under the control of natural selection. In a simple formalisation, the strategy of dispersal should increase Darwinian fitness. We introduce new models of purposeful kinesis with a diffusion coefficient dependent on fitness. The local and instantaneous evaluation of Darwinian fitness, the reproduction coefficient, is used. The new models include one additional parameter, the intensity of kinesis, and may be considered as the minimal models of purposeful kinesis. The properties of the models are explored by a series of numerical experiments. It is demonstrated how kinesis could be beneficial for assimilation of patches of food or of periodic fluctuations. Kinesis based on local and instantaneous estimations of fitness is not always beneficial: for species with the Allee effect it can delay invasion and spreading. It is proven that kinesis cannot modify the stability of positive homogeneous steady states.

Submitted 1 February, 2018; v1 submitted 21 January, 2018; originally announced January 2018.

Comments: Minor amendments: synchronization with the final Journal version

Journal ref: Ecological Complexity, 33, 2018, 75-83

arXiv:1801.03421 [pdf, other]

doi 10.1098/rsta.2017.0237

Blessing of dimensionality: mathematical foundations of the statistical physics of data

Authors: A. N. Gorban, I. Y. Tyukin

Abstract: The concentration of measure phenomena were discovered as the mathematical background of statistical mechanics at the end of the XIX - beginning of the XX century and were then explored in mathematics of the XX-XXI centuries. At the beginning of the XXI century, it became clear that the proper utilisation of these phenomena in machine learning might transform the curse of dimensionality into the blessing of dimensionality. This paper summarises recently discovered phenomena of measure concentration which drastically simplify some machine learning problems in high dimension, and allow us to correct legacy artificial intelligence systems. The classical concentration of measure theorems state that i.i.d. random points are concentrated in a thin layer near a surface (a sphere or equators of a sphere, an average or median level set of energy or another Lipschitz function, etc.). The new stochastic separation theorems describe the fine structure of these thin layers: the random points are not only concentrated in a thin layer but are all linearly separable from the rest of the set, even for exponentially large random sets. The linear functionals for separation of points can be selected in the form of the linear Fisher's discriminant. All artificial intelligence systems make errors. Non-destructive correction requires separation of the situations (samples) with errors from the samples corresponding to correct behaviour by a simple and robust classifier.
The stochastic separation theorems provide us with such classifiers and a non-iterative (one-shot) procedure for learning.

Submitted 10 January, 2018; originally announced January 2018.

Comments: Accepted for publication in Philosophical Transactions of the Royal Society A, 2018. Comprises 17 pages and 4 figures

Journal ref: Phil. Trans. R. Soc. A volume 376, issue 2118, 20170237, 2018
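The separability claim in this abstract can be checked numerically. The sketch assumes i.i.d. standard Gaussian points, which are already centred and whitened. It uses a Fisher-separability condition of the form $(x, y) \le \alpha (x, x)$ for all $y \ne x$, with a hypothetical threshold $\alpha = 0.8$. It estimates the fraction of sample points that are separable from all the others. That fraction should be near zero in low dimension and near one in high dimension. This is an empirical illustration, not a proof of the theorems.

```python
import numpy as np


def fraction_fisher_separable(points, alpha=0.8):
    """Fraction of rows x that satisfy (x, y) <= alpha * (x, x) for every other row y.

    Assumes the data have already been centred and whitened.
    """
    gram = points @ points.T              # all inner products (x_i, x_j)
    norms_sq = np.diag(gram).copy()       # (x_i, x_i); copy before editing gram
    np.fill_diagonal(gram, -np.inf)       # exclude each point's product with itself
    separable = gram.max(axis=1) <= alpha * norms_sq
    return separable.mean()


rng = np.random.default_rng(0)
for dim in (2, 10, 100):
    sample = rng.standard_normal((1000, dim))   # i.i.d. Gaussian, already whitened
    print(dim, fraction_fisher_separable(sample))
```

The check uses only inner products with each point's own squared norm, which is why a single linear functional per point suffices.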

arXiv:1710.11227 [pdf, other]

doi 10.1007/s11538-018-0415-5

High-dimensional brain. A tool for encoding and rapid learning of memories by single neurons

Authors: Ivan Y. Tyukin, Alexander N. Gorban, Carlos Calvo, Julia Makarova, Valeri A. Makarov

Abstract: Codifying memories is one of the fundamental problems of modern Neuroscience. The functional mechanisms behind this phenomenon remain largely unknown. Experimental evidence suggests that some of the memory functions are performed by stratified brain structures such as the hippocampus. In this particular case, single neurons in the CA1 region receive a highly multidimensional input from the CA3 area, which is a hub for information processing. We thus assess the implications of the abundance of neuronal signalling routes converging onto single cells for information processing. We show that single neurons can selectively detect and learn arbitrary information items, given that they operate in high dimensions. The argument is based on Stochastic Separation Theorems and the concentration of measure phenomena. We demonstrate that a sufficiently simple functional neuronal model is capable of explaining: i) the extreme selectivity of single neurons to the information content, ii) simultaneous separation of several uncorrelated stimuli or informational items from a large set, and iii) dynamic learning of new items by associating them with already "known" ones. These results constitute a basis for organization of complex memories in ensembles of single neurons. Moreover, they show that no a priori assumptions on the structural organization of neuronal ensembles are necessary for explaining basic concepts of static and dynamic memories.

Submitted 27 January, 2018; v1 submitted 30 October, 2017; originally announced October 2017.

MSC Class: 92B20; 91E40; 68T05

Journal ref: Bulletin of Mathematical Biology, 81(11), 4856-4888, 2019

arXiv:1709.01547 [pdf, other]

doi 10.3389/fnbot.2018.00049

Knowledge Transfer Between Artificial Intelligence Systems

Authors: Ivan Y. Tyukin, Alexander N. Gorban, Konstantin Sofeikov, Ilya Romanenko

Abstract: We consider the fundamental question: how could a legacy "student" Artificial Intelligence (AI) system learn from a legacy "teacher" AI system or a human expert without complete re-training and, most importantly, without requiring significant computational resources? Here "learning" is understood as an ability of one system to mimic responses of the other and vice-versa. We call such learning an Artificial Intelligence knowledge transfer. We show that if internal variables of the "student" Artificial Intelligence system have the structure of an $n$-dimensional topological vector space and $n$ is sufficiently high then, with probability close to one, the required knowledge transfer can be implemented by simple cascades of linear functionals. In particular, for $n$ sufficiently large, with probability close to one, the "student" system can successfully and non-iteratively learn $k\ll n$ new examples from the "teacher" (or correct the same number of mistakes) at the cost of two additional inner products. The concept is illustrated with an example of knowledge transfer from a pre-trained convolutional neural network to a simple linear classifier with HOG features.

Submitted 14 November, 2017; v1 submitted 5 September, 2017; originally announced September 2017.

MSC Class: 68T05; 68T30

Journal ref: Front Neurorobot. 2018; 12: 49

arXiv:1704.03549 [pdf, other]

Attention-based Extraction of Structured Information from Street View Imagery

Authors: Zbigniew Wojna, Alex Gorban, Dar-Shyang Lee, Kevin Murphy, Qian Yu, Yeqing Li, Julian Ibarz

Abstract: We present a neural network model - based on CNNs, RNNs and a novel attention mechanism - which achieves 84.2% accuracy on the challenging French Street Name Signs (FSNS) dataset, significantly outperforming the previous state of the art (Smith'16), which achieved 72.46%. Furthermore, our new method is much simpler and more general than the previous approach. To demonstrate the generality of our model, we show that it also performs well on an even more challenging dataset derived from Google Street View, in which the goal is to extract business names from store fronts. Finally, we study the speed/accuracy tradeoff that results from using CNN feature extractors of different depths. Surprisingly, we find that deeper is not always better (in terms of accuracy, as well as speed). Our resulting model is simple, accurate and fast, allowing it to be used at scale on a variety of challenging real-world text extraction problems.

Submitted 20 August, 2017; v1 submitted 11 April, 2017; originally announced April 2017.

Comments: Updated references, added link to the source code

arXiv:1703.04455 [pdf, ps, other]

doi 10.1007/s00521-019-04687-8

Multivariate Gaussian and Student-$t$ Process Regression for Multi-output Prediction

Authors: Zexun Chen, Bo Wang, Alexander N. Gorban

Abstract: Gaussian process models for vector-valued functions have been shown to be useful for multi-output prediction. The existing method for this model is to reformulate the matrix-variate Gaussian distribution as a multivariate normal distribution. Although it is effective in many cases, reformulation is not always workable and is difficult to apply to other distributions, because not all matrix-variate distributions can be transformed to respective multivariate distributions; the matrix-variate Student-$t$ distribution is one such case. In this paper, we propose a unified framework which is used not only to introduce a novel multivariate Student-$t$ process regression model (MV-TPR) for multi-output prediction, but also to reformulate the multivariate Gaussian process regression (MV-GPR) that overcomes some limitations of the existing methods. Both MV-GPR and MV-TPR have closed-form expressions for the marginal likelihoods and predictive distributions under this unified framework and thus can adopt the same optimization approaches as used in the conventional GPR. The usefulness of the proposed methods is illustrated through several simulated and real data examples. In particular, we verify empirically that MV-TPR performs better on the datasets considered, including air quality prediction and bike rental prediction. Finally, the proposed methods are shown to produce profitable investment strategies in the stock markets.

Submitted 6 January, 2019; v1 submitted 13 March, 2017; originally announced March 2017.

Journal ref: NEURAL COMPUT APPL 32 (2020): 3005-3028

arXiv:1703.01203 [pdf, ps, other]

doi 10.1016/j.neunet.2017.07.014

Stochastic Separation Theorems

Authors: A. N. Gorban, I. Y. Tyukin

Abstract: The problem of non-iterative one-shot and non-destructive correction of unavoidable mistakes arises in all Artificial Intelligence applications in the real world. Its solution requires robust separation of samples with errors from samples where the system works properly. We demonstrate that in (moderately) high dimension this separation can be achieved with probability close to one by linear discriminants. Surprisingly, separation of a new image from a very large set of known images is almost always possible even in moderately high dimensions by linear functionals, and coefficients of these functionals can be found explicitly. Based on fundamental properties of measure concentration, we show that, for $M<a\exp(bn)$, random $M$-element sets in $\mathbb{R}^n$ are linearly separable with probability $p$, $p>1-\vartheta$, where $1>\vartheta>0$ is a given small constant. Exact values of $a,b>0$ depend on the probability distribution that determines how the random $M$-element sets are drawn, and on the constant $\vartheta$. These stochastic separation theorems provide a new instrument for the development, analysis, and assessment of machine learning methods and algorithms in high dimension. Theoretical statements are illustrated with numerical examples.
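The separation effect described in this abstract can be observed in a short numerical experiment. The sketch below is a minimal illustration, not the construction or the constants $a$, $b$, $\vartheta$ from the paper: it assumes points drawn uniformly from the unit ball in $\mathbb{R}^n$ and uses one simple explicit linear test (an assumption here), under which a point $x$ is cut off from the rest if $\langle x, y\rangle < \alpha\langle x, x\rangle$ for every other point $y$, with a hypothetical threshold $\alpha = 0.8$. The names `sample_unit_ball` and `separable_fraction` are illustrative.

```python
import numpy as np


def sample_unit_ball(m, n, rng):
    """Draw m points uniformly from the n-dimensional unit ball."""
    g = rng.standard_normal((m, n))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    # radius r = U**(1/n) makes the density uniform in volume
    radii = rng.random(m) ** (1.0 / n)
    return directions * radii[:, None]


def separable_fraction(x, alpha=0.8):
    """Fraction of points x[i] with <x[i], x[j]> < alpha * <x[i], x[i]> for every j != i.

    Each such x[i] is cut off from all other points by the hyperplane
    {z : <x[i], z> = alpha * <x[i], x[i]>}, whose coefficients are explicit.
    """
    gram = x @ x.T
    sq_norms = np.diag(gram).copy()  # copy, because the diagonal is overwritten below
    np.fill_diagonal(gram, -np.inf)  # ignore the comparison of a point with itself
    separable = np.all(gram < alpha * sq_norms[:, None], axis=1)
    return separable.mean()


rng = np.random.default_rng(0)
for n in (2, 10, 50, 100):
    x = sample_unit_ball(1000, n, rng)
    print(f"n = {n:3d}: fraction separable = {separable_fraction(x):.3f}")
```

With the sample size held at 1000, the fraction of separable points should rise from near zero in the plane towards one as $n$ grows, which is the qualitative behaviour the theorems describe; the exact values depend on the random seed.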

Submitted 3 August, 2017; v1 submitted 3 March, 2017; originally announced March 2017.

Comments: 6 pages, accepted for publication in Neural Networks (Letter section)

MSC Class: 68T10 ACM Class: I.2.6

Journal ref: Neural Networks 94 (2017), 255-259

arXiv:1702.02633 [pdf, other]

doi 10.1007/s11004-017-9701-2

Pseudo-Outcrop Visualization of Borehole Images and Core Scans

Authors: Evgeny M. Mirkes, Alexander N. Gorban, Jeremy Levesley, Peter A. S. Elkington, James A. Whetton

Abstract: A pseudo-outcrop visualization is demonstrated for borehole and full-diameter rock core images to augment the ubiquitous unwrapped cylinder view and thereby to assist non-specialist interpreters. The pseudo-outcrop visualization is equivalent to a nonlinear projection of the image from borehole to earth frame of reference that creates a solid volume sliced longitudinally to reveal two or more faces in which the orientations of geological features indicate what is observed in the subsurface. A proxy for grain size is used to modulate the external dimensions of the plot to mimic profiles seen in real outcrops. The volume is created from a mixture of geological boundary elements and texture, the latter being the residue after the sum of boundary elements is subtracted from the original data. In the case of measurements from wireline microresistivity tools, whose circumferential coverage is substantially less than 100%, the missing circumferential data is first inpainted using multiscale directional transforms, which decompose the image into its elemental building structures, before reconstructing the full image. The pseudo-outcrop view enables direct observation of the angular relationships between features and aids visual comparison between borehole and core images, especially for the interested non-specialist.

Submitted 3 September, 2017; v1 submitted 8 February, 2017; originally announced February 2017.

Comments: Updated and corrected version with extended set of figures

Journal ref: Mathematical Geosciences, 2017

arXiv:1702.00831 [pdf, other]

doi 10.1080/00107514.2016.1256123

Beyond Navier–Stokes equations: Capillarity of ideal gas

Authors: A. N. Gorban, I. V. Karlin

Abstract: The system of Navier–Stokes–Fourier equations is one of the most celebrated systems of equations in modern science. It describes dynamics of fluids in the limit when gradients of density, velocity and temperature are sufficiently small, and loses its applicability when the flux becomes so non-equilibrium that the changes of velocity, density or temperature on length scales comparable to the mean free path are non-negligible. The question is: how to model such fluxes? This problem is still open. (This is despite the fact that the first 'final equations of motion', modified for the analysis of thermal creep in rarefied gas, were proposed by Maxwell in 1879.) There are at least three possible answers: (i) use molecular dynamics with individual particles, (ii) use kinetic equations, like Boltzmann's equation, or (iii) find a new system of equations for description of fluid dynamics with better accounting of non-equilibrium effects. These three approaches work at different scales. We explore the third possibility using the recent findings of capillarity of internal layers in ideal gases and of the saturation effect in dissipation (there is a limiting attenuation rate for very short waves in ideal gas and it cannot increase infinitely). One candidate equation is discussed in more detail, the Korteweg system proposed in 1901. The main ideas and approaches are illustrated by a kinetic system for which the problem of reduction of kinetics to fluid dynamics is analytically solvable.

Submitted 2 November, 2017; v1 submitted 2 February, 2017; originally announced February 2017.

Comments: Corrected misprints in Eqs (7) and (22), thanks to E. Azadi

Journal ref: Contemporary Physics, 58(1) (2017), 70-90

arXiv:1610.00494 [pdf, ps, other]

doi 10.1016/j.ins.2019.02.001

One-Trial Correction of Legacy AI Systems and Stochastic Separation Theorems

Authors: Alexander N. Gorban, Ilya Romanenko, Richard Burton, Ivan Y. Tyukin

Abstract: We consider the problem of efficient "on the fly" tuning of existing, or legacy, Artificial Intelligence (AI) systems. The legacy AI systems are allowed to be of arbitrary class, albeit the data they use for computing interim or final decision responses should possess an underlying structure of a high-dimensional topological real vector space. The tuning method that we propose enables dealing with errors without the need to re-train the system. Instead of re-training, a simple cascade of perceptron nodes is added to the legacy system. The added cascade modulates the legacy AI system's decisions. If applied repeatedly, the process results in a network of modulating rules "dressing up" and improving performance of existing AI systems. The mathematical rationale behind the method is based on the fundamental property of measure concentration in high-dimensional spaces. The method is illustrated with an example of fine-tuning a deep convolutional network that has been pre-trained to detect pedestrians in images.
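A single added node of such a cascade can be sketched as one linear functional that fires on inputs resembling a detected mistake. This is a hypothetical minimal version, assuming the legacy system's internal features are available as vectors; the centring on the mean of correctly handled samples, the normalisation and the threshold `theta = 0.5` are illustrative choices rather than the paper's exact procedure, and random Gaussian features stand in for real ones. The name `make_corrector` is illustrative.

```python
import numpy as np


def make_corrector(correct, error, theta=0.5):
    """Hypothetical one-node corrector for a single detected mistake.

    correct: (m, n) array of feature vectors on which the legacy system works.
    error:   (n,) feature vector of the sample on which it made a mistake.
    Returns a function that flags inputs resembling `error`.
    """
    centre = correct.mean(axis=0)
    w = error - centre
    w = w / np.dot(w, w)  # scale so that the error itself scores exactly 1

    def flag(z):
        # linear functional l(z) = <w, z - centre>; fire when it exceeds theta
        return (np.asarray(z) - centre) @ w > theta

    return flag


rng = np.random.default_rng(0)
n = 200
correct = rng.standard_normal((5000, n))  # stand-in for legacy-system features
error = rng.standard_normal(n)
flag = make_corrector(correct, error)
print(flag(error))  # the mistake is flagged
print(flag(correct).mean())  # fraction of correctly handled samples flagged
print(flag(rng.standard_normal((5000, n))).mean())  # fraction of unseen samples flagged
```

In this synthetic high-dimensional setting the score of an unrelated sample concentrates near zero while the mistake scores 1, so the node can be added without retraining and leaves almost all other decisions unchanged.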

Submitted 13 February, 2019; v1 submitted 3 October, 2016; originally announced October 2016.

Journal ref: Information Sciences, 484, 237-254, 2019

arXiv:1607.02180 [pdf, other]

Minimal cover of high-dimensional chaotic attractors by embedded coherent structures

Authors: Daniel L. Crane, Ruslan L. Davidchack, Alexander N. Gorban

Abstract: We propose a general method for constructing a minimal cover of high-dimensional chaotic attractors by embedded coherent structures, such as (but not limited to) periodic orbits. By minimal cover we mean a subset of available coherent structures such that the approximation of chaotic dynamics by a minimal cover with a predefined proximity threshold is as good as the approximation by the full available set. The proximity measure can be chosen with considerable freedom and adapted to the properties of any given chaotic system. In the context of a Kuramoto-Sivashinsky chaotic attractor, we demonstrate that the minimal cover can be faithfully constructed even when the proximity measure is defined within a subspace of dimension much smaller than the dimension of space containing the attractor. We discuss how the minimal cover can be used to provide a reduced description of the attractor structure and the dynamics on it.

Submitted 24 December, 2017; v1 submitted 7 July, 2016; originally announced July 2016.

Comments: 5 pages, 3 figures

arXiv:1605.06276 [pdf, ps, other]

doi 10.1016/j.neunet.2016.08.007

Piece-wise quadratic approximations of arbitrary error functions for fast and robust machine learning

Authors: A. N. Gorban, E. M. Mirkes, A. Zinovyev

Abstract: Most machine learning approaches stem from the principle of minimizing the mean squared distance, which relies on computationally efficient quadratic optimization methods. However, when faced with high-dimensional and noisy data, quadratic error functionals have demonstrated many weaknesses, including high sensitivity to contaminating factors and the curse of dimensionality. Therefore, many recent machine learning applications exploit properties of non-quadratic error functionals based on the $L_1$ norm or even sub-linear potentials corresponding to quasinorms $L_p$ ($0<p<1$). The downside of these approaches is an increased computational cost of optimization. So far, no approach has been suggested for dealing with arbitrary error functionals in a flexible and computationally efficient framework. In this paper, we develop a theory and basic universal data approximation algorithms ($k$-means, principal components, principal manifolds and graphs, regularized and sparse regression), based on piece-wise quadratic error potentials of subquadratic growth (PQSQ potentials). We develop a new and universal framework to minimize arbitrary sub-quadratic error potentials using an algorithm with guaranteed fast convergence to the local or global error minimum. The theory of PQSQ potentials is based on the notion of the cone of minorant functions, and represents a natural approximation formalism based on the application of min-plus algebra.
The approach can be applied in most existing machine learning methods, including methods of data approximation and regularized and sparse regression, leading to an improvement in the computational cost/accuracy trade-off. We demonstrate that on synthetic and real-life datasets PQSQ-based machine learning methods achieve orders of magnitude faster computational performance than the corresponding state-of-the-art methods.
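The piece-wise quadratic construction can be sketched as follows for a potential that depends only on $|x|$. This sketch assumes that each quadratic piece $a_k x^2 + b_k$ interpolates the target error function $f$ at both ends of its interval $r_k \le |x| < r_{k+1}$, with $r_0 = 0$, and that the potential is constant beyond the last threshold $r_p$; the thresholds chosen here for $f(x) = |x|$ are arbitrary, the function names are illustrative, and the paper's minimization algorithms are not reproduced.

```python
import numpy as np


def pqsq_coefficients(f, r):
    """Coefficients of u(x) = a_k x^2 + b_k on r_k <= |x| < r_{k+1}.

    f: vectorised error function of |x|; r: increasing thresholds with r[0] = 0.
    Each piece interpolates f at both ends of its interval; beyond the last
    threshold the potential is kept constant at f(r[-1]).
    """
    r = np.asarray(r, dtype=float)
    fr = f(r)
    a = (fr[1:] - fr[:-1]) / (r[1:] ** 2 - r[:-1] ** 2)
    b = fr[:-1] - a * r[:-1] ** 2
    return np.append(a, 0.0), np.append(b, fr[-1])


def pqsq(x, r, a, b):
    """Evaluate the piece-wise quadratic potential at x (scalar or array)."""
    x = np.abs(np.asarray(x, dtype=float))
    k = np.searchsorted(r, x, side="right") - 1  # index of the interval containing |x|
    return a[k] * x ** 2 + b[k]


r = np.array([0.0, 0.5, 1.0, 2.0])
a, b = pqsq_coefficients(np.abs, r)  # approximate the L1 error f(x) = |x|
xs = np.linspace(-3, 3, 7)
print(np.round(pqsq(xs, r, a, b), 3))
```

For a concave target such as $|x|$, each piece is a chord of $f(\sqrt{t})$ in the variable $t = x^2$, so the approximation never exceeds $f$ on $[0, r_p]$; beyond $r_p = 2$ it stays at $f(2) = 2$, which caps the influence of large deviations.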

Submitted 21 August, 2016; v1 submitted 20 May, 2016; originally announced May 2016.

Comments: Edited and extended version with algorithms of regularized regression

Journal ref: Neural Networks, Volume 84, December 2016, 28-38

arXiv:1604.06182 [pdf, other]

doi 10.1016/j.cviu.2016.10.018

The THUMOS Challenge on Action Recognition for Videos "in the Wild"

Authors: Haroon Idrees, Amir R. Zamir, Yu-Gang Jiang, Alex Gorban, Ivan Laptev, Rahul Sukthankar, Mubarak Shah

Abstract: Automatically recognizing and localizing wide ranges of human actions is of crucial importance for video understanding. Towards this goal, the THUMOS challenge was introduced in 2013 to serve as a benchmark for action recognition. Until then, video action recognition, including the THUMOS challenge, had focused primarily on the classification of pre-segmented (i.e., trimmed) videos, which is an artificial task. In THUMOS 2014, we elevated action recognition to a more practical level by introducing temporally untrimmed videos. These also include 'background videos' which share similar scenes and backgrounds with action videos, but are devoid of the specific actions. The three editions of the challenge organized in 2013–2015 have made THUMOS a common benchmark for action classification and detection and the annual challenge is widely attended by teams from around the world. In this paper we describe the THUMOS benchmark in detail and give an overview of data collection and annotation procedures. We present the evaluation protocols used to quantify results in the two THUMOS tasks of action classification and temporal detection. We also present results of submissions to the THUMOS 2015 challenge and review the participating approaches. Additionally, we include a comprehensive empirical study evaluating the differences in action recognition between trimmed and untrimmed videos, and how well methods trained on trimmed videos generalize to untrimmed videos. We conclude by proposing several directions and improvements for future THUMOS challenges.

Submitted 21 April, 2016; originally announced April 2016.

Comments: Preprint submitted to Computer Vision and Image Understanding

arXiv:1604.00627 [pdf, ps, other]

doi 10.1016/j.compbiomed.2016.06.004

Handling missing data in large healthcare dataset: a case study of unknown trauma outcomes

Authors: E. M. Mirkes, T. J. Coats, J. Levesley, A. N. Gorban

Abstract: Handling missing data is one of the main tasks in data preprocessing, especially in large public service datasets. We have analysed data from the Trauma Audit and Research Network (TARN) database, the largest trauma database in Europe. For the analysis we used 165,559 trauma cases. Among them, 19,289 cases have an unknown outcome (11.65% of all cases, or 13.19% of the cases with a known outcome). We have demonstrated that these outcomes are not missing 'completely at random' and, hence, it is impossible simply to exclude these cases from analysis despite the large amount of available data. We have developed a system of non-stationary Markov models for handling missing outcomes and validated these models on the data of 15,437 patients who arrived at TARN hospitals later than 24 hours but within 30 days of injury. We used these Markov models for the analysis of mortality. In particular, we corrected the observed fraction of deaths. Two naïve approaches give 7.20% (available case study) or 6.36% (if we assume that all unknown outcomes are 'alive'). The corrected value is 6.78%. Following the seminal paper of Trunkey (1983), the multimodality of mortality curves has become a much discussed idea. For the whole analysed TARN dataset the coefficient of mortality monotonically decreases in time, but the stratified analysis of mortality gives a different result: for lower severities the coefficient of mortality is a non-monotonic function of the time after injury and may have maxima in the second and third weeks.
The approach developed here can be applied to various healthcare datasets which experience the problem of lost patients and missing outcomes.
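The share of unknown outcomes and the two naïve death fractions can be checked from the case counts quoted in this abstract. This is plain arithmetic on the published figures, taking the 7.20% available-case death fraction as input; the corrected value of 6.78% comes from the paper's Markov models and is not reproduced here.

```python
total_cases = 165_559
unknown = 19_289
known = total_cases - unknown  # cases with a recorded outcome

print(f"unknown / all cases:   {unknown / total_cases:.2%}")
print(f"unknown / known cases: {unknown / known:.2%}")

# Available-case estimate of the death fraction (taken from the abstract).
available_case_rate = 0.0720
# If every unknown outcome is assumed 'alive', the same number of deaths
# is spread over all cases instead of only the known ones.
all_alive_rate = available_case_rate * known / total_cases
print(f"all unknown alive:     {all_alive_rate:.2%}")
```

The computation shows that 13.19% is the number of unknown outcomes relative to the 146,270 cases with a known outcome (about 11.65% of all 165,559 cases), and that assuming every unknown outcome is 'alive' lowers the death fraction from 7.20% to about 6.36%.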

Submitted 18 May, 2020; v1 submitted 3 April, 2016; originally announced April 2016.

Comments: Minor editing and additions

Journal ref: Computers in Biology and Medicine, 75 (2016) 203-216

arXiv:1603.06828 [pdf, other]

doi 10.5445/KSP/1000058749/11

Robust principal graphs for data approximation

Authors: A. N. Gorban, E. M. Mirkes, A. Zinovyev

Abstract: Revealing hidden geometry and topology in noisy data sets is a challenging task. Elastic principal graph is a computationally efficient and flexible data approximator based on embedding a graph into the data space and minimizing the energy functional penalizing the deviation of graph nodes both from data points and from pluri-harmonic configuration (generalization of linearity). The structure of principal graph is learned from data by application of a topological grammar which in the simplest case leads to the construction of principal curves or trees. In order to more efficiently cope with noise and outliers, here we suggest using a trimmed data approximation term to increase the robustness of the method. The modification of the method that we suggest does not affect either computational efficiency or general convergence properties of the original elastic graph method. The trimmed elastic energy functional remains a Lyapunov function for the optimization algorithm. On several examples of complex data distributions we demonstrate how the robust principal graphs learn the global data structure and show the advantage of using the trimmed data approximation term for the construction of principal graphs and other popular data approximators.

Submitted 24 November, 2016; v1 submitted 22 March, 2016; originally announced March 2016.

Comments: A talk given at ECDA2015 (European Conference on Data Analysis, September 2nd to 4th 2015, University of Essex, Colchester, UK), to be published in Archives of Data Science

Journal ref: Archives of Data Science, Series A, Vol. 2, No. 1, 2017

arXiv:1512.03949 [pdf, ps, other]

doi 10.1016/j.jtbi.2015.12.017

Evolution of adaptation mechanisms: adaptation energy, stress, and oscillating death

Authors: A. N. Gorban, T. A. Tyukina, E. V. Smirnova, L. I. Pokidysheva

Abstract: In 1938, H. Selye proposed the notion of adaptation energy and published "Experimental evidence supporting the conception of adaptation energy". Adaptation of an animal to different factors appears as the spending of one resource. Adaptation energy is a hypothetical extensive quantity spent for adaptation. This term causes much debate when one takes it literally, as a physical quantity, i.e. a sort of energy. The controversial points of view impede the systematic use of the notion of adaptation energy despite experimental evidence. Nevertheless, the response to many harmful factors often has a general non-specific form, and we suggest that the mechanisms of physiological adaptation admit a very general and nonspecific description. We aim to demonstrate that Selye's adaptation energy is the cornerstone of the top-down approach to modelling of non-specific adaptation processes. We analyse Selye's axioms of adaptation energy together with Goldstone's modifications and propose a series of models for interpretation of these axioms. Adaptation energy is considered as an internal coordinate on the 'dominant path' in the model of adaptation. The phenomena of 'oscillating death' and 'oscillating remission' are predicted on the basis of the dynamical models of adaptation. Natural selection plays a key role in the evolution of mechanisms of physiological adaptation.
We use the fitness optimization approach to study the distribution of resources for neutralization of harmful factors during adaptation to a multifactor environment, and analyse the optimal strategies for different systems of factors.

Submitted 17 March, 2016; v1 submitted 12 December, 2015; originally announced December 2015.

Journal ref: Journal of Theoretical Biology Volume 405, 21 September 2016, Pages 127-139

arXiv:1511.03054 [pdf, ps, other]

Fast Sampling of Evolving Systems with Periodic Trajectories

Authors: I. Yu. Tyukin, A. N. Gorban, T. A. Tyukina, J. Al Ameri, Yu. A. Korablev

Abstract: We propose a novel method for fast and scalable evaluation of periodic solutions of systems of ordinary differential equations for a given set of parameter values and initial conditions. The equations governing the system dynamics are assumed to be of a special class, albeit admitting nonlinear parametrization and state nonlinearities. The method makes it possible to represent a given periodic solution as sums of computable integrals and functions that depend explicitly on the parameters of interest and initial conditions. This allows parallel computational streams to be invoked in order to increase the speed of calculations. Performance and practical implications of the method are illustrated with examples including the classical predator-prey system and models of neuronal cells.

Submitted 27 May, 2016; v1 submitted 10 November, 2015; originally announced November 2015.

Comments: arXiv admin note: substantial text overlap with arXiv:1304.1648

MSC Class: 93B30; 34A05; 92B99; 93B15

arXiv:1511.02917 [pdf, other]

Detecting events and key actors in multi-person videos

Authors: Vignesh Ramanathan, Jonathan Huang, Sami Abu-El-Haija, Alexander Gorban, Kevin Murphy, Li Fei-Fei

Abstract: Multi-person event recognition is a challenging task, often with many people active in the scene but only a small subset contributing to an actual event. In this paper, we propose a model which learns to detect events in such videos while automatically "attending" to the people responsible for the event. Our model does not use explicit annotations regarding who or where those people are during training and testing. In particular, we track people in videos and use a recurrent neural network (RNN) to represent the track features. We learn time-varying attention weights to combine these features at each time-instant. The attended features are then processed using another RNN for event detection/classification. Since most video datasets with multiple people are restricted to a small number of videos, we also collected a new basketball dataset comprising 257 basketball games with 14K event annotations corresponding to 11 event classes. Our model outperforms state-of-the-art methods for both event classification and detection on this new dataset. Additionally, we show that the attention mechanism is able to consistently localize the relevant players.

Submitted 16 March, 2016; v1 submitted 9 November, 2015; originally announced November 2015.

Comments: Accepted for publication in CVPR'16

arXiv:1506.06297 [pdf, ps, other]

The Five Factor Model of personality and evaluation of drug consumption risk

Authors: E. Fehrman, A. K. Muhammad, E. M. Mirkes, V. Egan, A. N. Gorban

Abstract: The problem of evaluating an individual's risk of drug consumption and misuse is highly important. An online survey methodology was employed to collect data including Big Five personality traits (NEO-FFI-R), impulsivity (BIS-11), sensation seeking (ImpSS), and demographic information. The data set contained information on the consumption of 18 central nervous system psychoactive drugs. Correlation analysis demonstrated the existence of groups of drugs with strongly correlated consumption patterns. Three correlation pleiades were identified, named by the central drug in each pleiade: ecstasy, heroin, and benzodiazepines pleiades. An exhaustive search was performed to select the most effective subset of input features and data mining methods to classify users and non-users for each drug and pleiade. A number of classification methods were employed (decision tree, random forest, $k$-nearest neighbors, linear discriminant analysis, Gaussian mixture, probability density function estimation, logistic regression and naïve Bayes) and the most effective classifier was selected for each drug. The quality of classification was surprisingly high, with sensitivity and specificity (evaluated by leave-one-out cross-validation) greater than 70% for almost all classification tasks. The best results, with sensitivity and specificity greater than 75%, were achieved for cannabis, crack, ecstasy, legal highs, LSD, and volatile substance abuse (VSA).

Submitted 15 January, 2017; v1 submitted 20 June, 2015; originally announced June 2015.

Comments: Significantly extended report with 67 pages, 27 tables, 21 figures

arXiv:1506.04631 [pdf, ps, other]

doi 10.1016/j.ins.2015.09.021

Approximation with Random Bases: Pro et Contra

Authors: Alexander N. Gorban, Ivan Yu. Tyukin, Danil V. Prokhorov, Konstantin I. Sofeikov

Abstract: In this work we discuss the problem of selecting suitable approximators from families of parameterized elementary functions that are known to be dense in a Hilbert space of functions. We consider and analyze published procedures, both randomized and deterministic, for selecting elements from these families that have been shown to ensure the rate of convergence in $L_2$ norm of order $O(1/N)$, where $N$ is the number of elements. We show that both randomized and deterministic procedures are successful if additional information about the families of functions to be approximated is provided. In the absence of such additional information one may observe exponential growth of the number of terms needed to approximate the function and/or extreme sensitivity of the outcome of the approximation to parameters. Implications of our analysis for applications of neural networks in modeling and control are illustrated with examples.

Submitted 24 October, 2015; v1 submitted 15 June, 2015; originally announced June 2015.

Comments: arXiv admin note: text overlap with arXiv:0905.0677

MSC Class: 41A45; 41A45; 90C59; 92B20; 68W20

Journal ref: Information Sciences 364-365, 10 October 2016, Pages 129-145

arXiv:1505.01440 [pdf, ps, other]

doi 10.1051/mmnp/201510316

Leaders do not look back, or do they?

Authors: Alexander N. Gorban, Nick Jarman, Erik Steur, Cees van Leeuwen, Ivan Tyukin

Abstract: We study the effect of adding to a directed chain of interconnected systems a directed feedback from the last element in the chain to the first. The problem is closely related to the fundamental question of how a change in network topology may influence the behavior of coupled systems. We begin the analysis by investigating a simple linear system. The matrix that specifies the system dynamics is the transpose of the network Laplacian matrix, which codes the connectivity of the network. Our analysis shows that for any nonzero complex eigenvalue $\lambda$ of this matrix, the following inequality holds: $\frac{|\Im \lambda|}{|\Re \lambda|} \leq \cot\frac{\pi}{n}$. This bound is sharp, as it becomes an equality for an eigenvalue of a simple directed cycle with uniform interaction weights. The latter has the slowest decay of oscillations among all network configurations with the same number of states. The result is generalized to directed rings and chains of identical nonlinear oscillators. For directed rings, a lower bound $\sigma_c$ for the connection strengths that guarantees asymptotic synchronization is found to follow a similar pattern: $\sigma_c=\frac{1}{1-\cos(2\pi/n)}$. Numerical analysis revealed that, depending on the network size $n$, multiple dynamic regimes co-exist in the state space of the system. In addition to the fully synchronous state, a rotating wave solution occurs. The effect is observed in networks exceeding a certain critical size.
The emergence of a rotating wave highlights the importance of long chains and loops in networks of oscillators: the larger the size of chains and loops, the more sensitive the network dynamics becomes to removal or addition of a single connection.
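As a quick numerical reading of the synchronization bound $\sigma_c=\frac{1}{1-\cos(2\pi/n)}$ quoted above, the sketch below evaluates it for a few ring sizes (the function name `sigma_c` is our own). It gives $2/3$, $1$ and $2$ for $n = 3, 4, 6$, and since $1-\cos x \approx x^2/2$ for small $x$, the threshold grows roughly like $n^2/(2\pi^2)$ for large rings.

```python
import math

def sigma_c(n: int) -> float:
    """Lower bound on the coupling strength that guarantees synchronization
    of a directed ring of n identical oscillators (formula from the abstract)."""
    return 1.0 / (1.0 - math.cos(2.0 * math.pi / n))

for n in (3, 4, 6, 12, 100):
    # Second column: exact bound; third column: large-n approximation n^2 / (2 pi^2).
    print(n, round(sigma_c(n), 4), round(n * n / (2 * math.pi ** 2), 4))
```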

Submitted 21 May, 2015; v1 submitted 6 May, 2015; originally announced May 2015.

MSC Class: 34A30; 34D06; 34D45; 92B20; 92B25

Journal ref: Mathematical Modelling of Natural Phenomena, Vol. 10, No. 3, 2015, 212-231

arXiv:1504.08317 [pdf, ps, other]

Generalized Mass Action Law and Thermodynamics of Nonlinear Markov Processes

Authors: A. N. Gorban, V. N. Kolokoltsov

Abstract: Nonlinear Markov processes are measure-valued dynamical systems which preserve positivity. They can be represented as the law of large numbers limits of general Markov models of interacting particles. In physics, the kinetic equations allow Lyapunov functionals (entropy, free energy, etc.). This may be considered as a sort of inheritance of the Lyapunov functionals from the microscopic master equations. We study nonlinear Markov processes that inherit thermodynamic properties from the microscopic linear Markov processes. We develop the thermodynamics of nonlinear Markov processes and analyze the asymptotic assumptions that are sufficient for this inheritance.

Submitted 30 April, 2015; originally announced April 2015.

Journal ref: Mathematical Modelling of Natural Phenomena 10 (5), 16-46, 2015

arXiv:1503.05869 [pdf]

Long and short range multi-locus QTL interactions in a complex trait of yeast

Authors: Evgeny M. Mirkes, Thomas Walsh, Edward J. Louis, Alexander N. Gorban

Abstract: We analyse interactions of Quantitative Trait Loci (QTL) in heat-selected yeast by comparing them to an unselected pool of random individuals. Here we re-examine data on individual F12 progeny selected for heat tolerance, which have been genotyped at 25 locations identified by sequencing a selected pool [Parts, L., Cubillos, F. A., Warringer, J., Jain, K., Salinas, F., Bumpstead, S. J., Molin, M., Zia, A., Simpson, J. T., Quail, M. A., Moses, A., Louis, E. J., Durbin, R., and Liti, G. (2011). Genome Research, 21(7), 1131-1138]. 960 individuals were genotyped at these locations and multi-locus genotype frequencies were compared to 172 sequenced individuals from the original unselected pool (a control group). Various non-random associations were found across the genome, both within chromosomes and between chromosomes. Some of the non-random associations are likely due to retention of linkage disequilibrium in the F12 population; however, many, including the inter-chromosomal interactions, must be due to genetic interactions in heat tolerance. One region of particular interest involves 3 linked loci on chromosome IV, where the central variant responsible for heat tolerance is antagonistic, coming from the heat-sensitive parent, while the flanking ones are from the more heat-tolerant parent. The 3-locus haplotypes in the selected individuals represent a highly biased sample of the population haplotypes, with rare double recombinants at high frequency. These were missed in the original analysis and would never be seen without the multigenerational approach.
We show that a statistical analysis of entropy and information gain in genotypes of a selected population can reveal more interactions than previously seen. Importantly, this must be done in comparison to the unselected population's genotypes to account for inherent biases in the original population.

Submitted 19 March, 2015; originally announced March 2015.

arXiv:1412.0524 [pdf, ps, other]

doi 10.1109/CDC.2014.7039621

Further Results on Lyapunov-Like Conditions of Forward Invariance and Boundedness for a Class of Unstable Systems

Authors: A. N. Gorban, I. Yu. Tyukin, H. Nijmeijer

Abstract: We provide several characterizations of convergence to unstable equilibria in nonlinear systems. Our current contribution is three-fold. First, we present simple algebraic conditions for establishing local convergence of non-trivial solutions of nonlinear systems to unstable equilibria. The conditions are based on earlier work (A.N. Gorban, I.Yu. Tyukin, E. Steur, and H. Nijmeijer, SIAM Journal on Control and Optimization, Vol. 51, No. 3, 2013) and can be viewed as an extension of Lyapunov's first method in that they apply to systems in which the corresponding Jacobian has one zero eigenvalue. Second, we show that for a relevant subclass of systems, persistency of excitation of a function of time in the right-hand side of the equations governing the dynamics of the system ensures the existence of an attractor basin such that solutions passing through this basin in forward time converge to the origin exponentially. Finally, we demonstrate that the conditions developed in (A.N. Gorban, I.Yu. Tyukin, E. Steur, and H. Nijmeijer, SIAM Journal on Control and Optimization, Vol. 51, No. 3, 2013) may be remarkably tight.

Submitted 1 December, 2014; originally announced December 2014.

Comments: 53rd IEEE Conference on Decision and Control, Los Angeles, USA, 2014

MSC Class: 37B25

Journal ref: Proceedings, 53rd IEEE Conference on Decision and Control, December 15-17, 2014. Los Angeles, California, USA, 1557-1562

arXiv:1407.6927 [pdf, ps, other]

doi 10.1016/j.rinp.2014.09.002

Detailed balance in micro- and macrokinetics and micro-distinguishability of macro-processes

Authors: A. N. Gorban

Abstract: We develop a general framework for the discussion of detailed balance and analyse its microscopic background. We find that there should be two additions to the well-known $T$- or $PT$-invariance of the microscopic laws of motion: 1. Equilibrium should not spontaneously break the relevant $T$- or $PT$-symmetry. 2. The macroscopic processes should be microscopically distinguishable to guarantee persistence of detailed balance in the model reduction from micro- to macrokinetics. We briefly discuss examples of the violation of these rules and the corresponding violation of detailed balance.

Submitted 8 September, 2014; v1 submitted 25 July, 2014; originally announced July 2014.

Comments: 7 pages, extended version with new sections: "Reciprocal relation and detailed balance" and "Relations between elementary processes beyond microreversibility and detailed balance."

Journal ref: Results in Physics 4 (2014), 142-147

arXiv:1406.5550 [pdf, other]

ViDaExpert: user-friendly tool for nonlinear visualization and analysis of multidimensional vectorial data

Authors: Alexander N. Gorban, Alexander Pitenko, Andrei Zinovyev

Abstract: ViDaExpert is a tool for visualization and analysis of multidimensional vectorial data. ViDaExpert is able to work with data tables of "object-feature" type that might contain numerical feature values as well as textual labels for rows (objects) and columns (features). ViDaExpert implements several statistical methods such as standard and weighted Principal Component Analysis (PCA), the method of elastic maps (a non-linear version of PCA), Linear Discriminant Analysis (LDA), multilinear regression, K-Means clustering, and a variant of the decision tree construction algorithm. Equipped with several user-friendly dialogs for configuring data point representations (size, shape, color) and a fast 3D viewer, ViDaExpert is a handy tool that allows one to construct an interactive 3D scene representing a table of data in multidimensional space and to perform its quick and insightful statistical analysis, from basic to advanced methods.

Submitted 27 June, 2014; v1 submitted 20 June, 2014; originally announced June 2014.

MSC Class: 68N01; 68W25 ACM Class: G.3; G.4

arXiv:1310.0406 [pdf, ps, other]

doi 10.1090/S0273-0979-2013-01439-3

Hilbert's 6th Problem: Exact and Approximate Hydrodynamic Manifolds for Kinetic Equations

Authors: A. N. Gorban, I. Karlin

Abstract: The problem of the derivation of hydrodynamics from the Boltzmann equation and related dissipative systems is formulated as the problem of a slow invariant manifold in the space of distributions. We review a few instances where such hydrodynamic manifolds were found analytically, both as the result of summation of the Chapman-Enskog asymptotic expansion and by the direct solution of the invariance equation. These model cases, comprising Grad's moment systems, both linear and nonlinear, are studied in depth in order to gain understanding of what can be expected for the Boltzmann equation. In particular, the dispersive dominance and saturation of dissipation rate of the exact hydrodynamics in the short-wave limit and the viscosity modification at high divergence of the flow velocity are indicated as severe obstacles to the resolution of Hilbert's 6th Problem. Furthermore, we review the derivation of the approximate hydrodynamic manifold for the Boltzmann equation using Newton's iteration and avoiding smallness parameters, and compare this to the exact solutions. Additionally, we discuss the problem of projection of the Boltzmann equation onto the approximate hydrodynamic invariant manifold using entropy concepts. Finally, a set of hypotheses is put forward where we describe open questions and set a horizon for what can be derived exactly or proven about the hydrodynamic manifolds for the Boltzmann equation in the future.

Submitted 3 October, 2013; v1 submitted 1 October, 2013; originally announced October 2013.

Comments: 58 pages, 8 Figures (v2: Technical improvement of eps files for better compatibility)

Journal ref: Bull. Amer. Math. Soc., 51(2), 2014, 186-246, Posted online: November 20, 2013

arXiv:1307.8339 [pdf]

doi 10.1088/1742-6596/490/1/012081

Multiscale principal component analysis

Authors: A. A. Akinduko, A. N. Gorban

Abstract: Principal component analysis (PCA) is an important tool in exploring data. The conventional approach to PCA leads to a solution which favours the structures with large variances. This is sensitive to outliers and could obfuscate interesting underlying structures. One of the equivalent definitions of PCA is that it seeks the subspaces that maximize the sum of squared pairwise distances between data projections. This definition opens up more flexibility in the analysis of principal components, which is useful in enhancing PCA. In this paper we introduce scales into PCA by maximizing only the sum of pairwise distances between projections for pairs of data points with distances within a chosen interval of values $[l,u]$. The resulting principal component decompositions in Multiscale PCA depend on the point $(l,u)$ on the plane, and for each point we define projectors onto principal components. Cluster analysis of these projectors reveals the structures in the data at various scales. Each structure is described by the eigenvectors at the medoid point of the cluster which represents the structure. We also use the distortion of projections as a criterion for choosing an appropriate scale, especially for data with outliers. This method was tested on both artificial and real data. For data with multiscale structures, the method was able to reveal the different structures of the data and also to reduce the effect of outliers in the principal component analysis.
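The scale restriction described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes squared distances (matching the pairwise-distance definition of PCA quoted in the abstract), builds the scatter matrix of the differences for pairs whose distance lies in $[l,u]$, and extracts only the first principal direction by power iteration. The helper name `multiscale_pc1` and the toy points are hypothetical.

```python
import math

def multiscale_pc1(points, l, u, iters=200):
    """First principal direction at scale [l, u] (hypothetical helper).

    Maximizes the sum of squared projected differences over pairs of points
    whose distance lies in [l, u]; returns None if no pair qualifies.
    """
    d = len(points[0])
    S = [[0.0] * d for _ in range(d)]  # scatter matrix of the selected differences
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if l <= math.dist(points[i], points[j]) <= u:
                diff = [a - b for a, b in zip(points[i], points[j])]
                for r in range(d):
                    for c in range(d):
                        S[r][c] += diff[r] * diff[c]
    # Power iteration: converges to the top eigenvector of S unless the
    # all-ones start vector happens to be orthogonal to it.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(S[r][c] * v[c] for c in range(d)) for r in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return None
        v = [x / norm for x in w]
    return v

# Two groups far apart along x, each spread along y.
pts = [(0, -1), (0, 0), (0, 1), (10, -1), (10, 0), (10, 1)]
print(multiscale_pc1(pts, 0, 3))    # small scale: within-group pairs only
print(multiscale_pc1(pts, 0, 100))  # all pairs: the ordinary PCA direction
```

With the small interval only within-group pairs count, so the leading direction is the $y$ axis; with all pairs included the large separation along $x$ dominates, as in conventional PCA.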

Submitted 31 July, 2013; originally announced July 2013.

Comments: 24 pages, 22 figures

arXiv:1307.8308 [pdf]

doi 10.1088/1742-6596/490/1/012082

Is it possible to predict long-term success with k-NN? Case Study of four market indices (FTSE100, DAX, HANGSENG, NASDAQ)

Authors: Y. Shi, A. N. Gorban, T. Y. Yang

Abstract: This case study tests the possibility of predicting "success" (or "winner") components of four stock and share market indices over a three-year period from 02-Jul-2009 to 29-Jun-2012. We compare their performance in two time frames: an initial three-month frame at the beginning (02/06/2009-30/09/2009) and a final three-month frame (02/04/2012-29/06/2012). To label the components, the average price ratio between the two time frames is computed and the components are sorted by it in descending order. The average price ratio is defined as the ratio between the mean prices of the beginning and final time periods. The "winner" components are the top third of all components in this order, meaning that the mean price in the final time period is relatively higher than in the beginning time period. The "loser" components are the bottom third of all components in the same order, as they have higher mean prices in the beginning time period. We analyse whether there is any information about the winner-loser separation in the initial fragments of the time series of daily closing-price log-returns. Leave-one-out cross-validation with the k-NN algorithm is applied to the daily log-returns of the components, using distance and proximity measures. The error analysis shows that for the HANGSENG and DAX indices there are clear signs that the probability of long-term success can be evaluated.
The correlation distance matrix histograms and 2-D/3-D elastic maps generated with ViDaExpert show that the winner components are closer to each other and that winner and loser components are separable on elastic maps for the HANGSENG and DAX indices, while for the indices with negative results there is no sign of separation.
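The labelling step and the log-return inputs described above can be sketched as follows. Because the abstract's wording leaves the direction of the ratio open, this sketch assumes ratio = mean(final) / mean(initial), so that winners come first in descending order. The function names and the toy prices are hypothetical, not the authors' data or code.

```python
import math
from statistics import mean

def log_returns(closes):
    """Daily log-returns r_t = ln(P_t / P_{t-1}) of a closing-price series."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]

def label_components(initial, final):
    """Label index components 'winner', 'loser' or None.

    initial, final: dicts mapping a component name to its closing prices in the
    initial and final frames. Assumption: ratio = mean(final) / mean(initial),
    so the top third of the descending ranking are the winners.
    """
    ratio = {name: mean(final[name]) / mean(initial[name]) for name in initial}
    ranked = sorted(ratio, key=ratio.get, reverse=True)
    third = len(ranked) // 3
    labels = dict.fromkeys(ranked)  # the middle third stays unlabelled (None)
    for name in ranked[:third]:
        labels[name] = "winner"
    for name in ranked[len(ranked) - third:]:
        labels[name] = "loser"
    return labels

# Hypothetical prices for three components.
initial = {"A": [10, 10], "B": [10, 10], "C": [10, 10]}
final = {"A": [20, 20], "B": [10, 10], "C": [5, 5]}
print(label_components(initial, final))  # {'A': 'winner', 'B': None, 'C': 'loser'}
print(log_returns([100, 110, 99]))
```

Only the initial-frame log-returns would then be fed to the k-NN classifier, so that the prediction does not see the prices used to define the labels.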

Submitted 31 July, 2013; originally announced July 2013.

Comments: 21 pages, 14 figures

arXiv:1305.4942 [pdf, ps, other]

doi 10.1016/j.compbiomed.2014.08.006

Computational diagnosis and risk evaluation for canine lymphoma

Authors: E. M. Mirkes, I. Alexandrakis, K. Slater, R. Tuli, A. N. Gorban

Abstract: The canine lymphoma blood test detects the levels of two biomarkers, the acute phase proteins (C-Reactive Protein and Haptoglobin). This test can be used for diagnostics, for screening, and for remission monitoring as well. We analyze clinical data, test various machine learning methods and select the best approach to these problems. Three families of methods, decision trees, kNN (including advanced and adaptive kNN) and probability density evaluation with radial basis functions, are used for classification and risk estimation. Several pre-processing approaches were implemented and compared. The best of them are used to create the diagnostic system. For the differential diagnosis the best solution gives a sensitivity and specificity of 83.5% and 77%, respectively (using three input features: CRP, Haptoglobin and a standard clinical symptom). For the screening task, the decision tree method provides the best result, with sensitivity and specificity of 81.4% and >99%, respectively (using the same input features). If the clinical symptoms (Lymphadenopathy) are considered as unknown, then a decision tree with CRP and Hapt only provides a sensitivity of 69% and a specificity of 83.5%. The lymphoma risk evaluation problem is formulated and solved. The best models are selected as the system for computational lymphoma diagnosis and for evaluating the risk of lymphoma. These methods are implemented in special web-accessible software and are applied to the problem of monitoring dogs with lymphoma after treatment.
It detects recurrence of lymphoma up to two months prior to the appearance of clinical signs. The risk map visualisation provides a friendly tool for exploratory data analysis.

Submitted 3 July, 2014; v1 submitted 21 May, 2013; originally announced May 2013.

Comments: 24 pages, 86 references in the bibliography. Significantly extended version with review of lymphoma biomarkers and data mining methods (three new sections are added: 1.1. Biomarkers for canine lymphoma, 1.2. Acute phase proteins as lymphoma biomarkers, and 3.1. Data mining methods for biomarker cancer diagnosis). Flowcharts of data analysis are included as supplementary material (20 pages).

Journal ref: Computers in Biology and Medicine, Volume 53, 1 October 2014, 279-290

arXiv:1304.1648 [pdf, ps, other]

Explicit Reduced-Order Integral Formulations of State and Parameter Estimation Problems for a Class of Nonlinear Systems

Authors: I. Yu. Tyukin, A. N. Gorban

Abstract: We propose a technique for reformulating state and parameter estimation problems as problems of matching explicitly computable definite integrals with known kernels to data. The technique applies to a class of systems of nonlinear ordinary differential equations and is aimed at exploiting parallel computational streams in order to increase the speed of calculations. The idea is based on the classical adaptive observers design. It has been shown that, when the data are periodic, it may be possible to reduce the dimensionality of the inference problem to the dimension of the vector of parameters entering the right-hand side of the model nonlinearly. Performance and practical implications of the method are illustrated on a benchmark model governing the dynamics of voltage generated in the barnacle giant muscle.

Submitted 10 September, 2013; v1 submitted 5 April, 2013; originally announced April 2013.

MSC Class: 93B40; 93B30

arXiv:1303.3855 [pdf, ps, other]

doi 10.1016/j.camwa.2013.04.023

Grasping Complexity

Authors: A. N. Gorban, G. S. Yablonsky

Abstract: The century of complexity has come. The face of science has changed. Surprisingly, when we start asking about the essence of these changes and then critically analyse the answers, the results are mostly discouraging. Most of the answers are related to the properties that have been in the focus of scientific research already for more than a century (like non-linearity). This paper is the Preface to the special issue "Grasping Complexity" of the journal "Computers and Mathematics with Applications". We analyse the change of era in science, its reasons and main changes in scientific activity, and give a brief review of the papers in the issue.

Submitted 15 March, 2013; originally announced March 2013.

Comments: 8 pages, 3 figures, bibliography 52 items

Journal ref: Computers and Mathematics with Applications 65 (2013) 1421-1426

arXiv:1302.2645 [pdf]

doi 10.1007/978-3-642-38679-4_50

Geometrical complexity of data approximators

Authors: E. M. Mirkes, A. Zinovyev, A. N. Gorban

Abstract: There are many methods developed to approximate a cloud of vectors embedded in high-dimensional space by simpler objects, ranging from principal points and linear manifolds to self-organizing maps, neural gas, elastic maps, various types of principal curves and principal trees, and so on. For each type of approximator, a measure of approximator complexity has also been developed. These measures are necessary to find the balance between accuracy and complexity and to define the optimal approximations of a given type. We propose a measure of complexity (geometrical complexity) which is applicable to approximators of several types and which allows comparing data approximations of different types.

Submitted 3 May, 2013; v1 submitted 11 February, 2013; originally announced February 2013.

Comments: 10 pages, 3 figures, minor correction and extension

Journal ref: IWANN 2013, Advances in Computational Intelligence, Springer LNCS 7902, pp. 500-509, 2013

arXiv:1301.2379 [pdf, ps, other]

Extremal property of a simple cycle

Authors: A. N. Gorban

Abstract: We study systems with a finite number of states $A_i$ ($i=1,\ldots,n$) which obey first-order kinetics (master equation) without detailed balance. For any nonzero complex eigenvalue $\lambda$ we prove the inequality $\frac{|\Im \lambda|}{|\Re \lambda|} \leq \cot\frac{\pi}{n}$. This bound is sharp and it becomes an equality for an eigenvalue of a simple irreversible cycle $A_1 \to A_2 \to \cdots \to A_n \to A_1$ with equal rate constants of all transitions. Therefore, the simple cycle with equal rate constants has the slowest decay of oscillations among all first-order kinetic systems with the same number of states.
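The equality case can be checked numerically. For the simple cycle with rate constant $k$, the Fourier vectors $v_m = e^{2\pi i q m/n}$ are eigenvectors of the kinetic matrix with eigenvalues $\lambda_q = k(e^{-2\pi i q/n} - 1)$, so $|\Im \lambda_1|/|\Re \lambda_1| = \sin(2\pi/n)/(1-\cos(2\pi/n)) = \cot(\pi/n)$. The sketch below (function names are our own) verifies each eigenpair against the matrix and confirms that the largest ratio equals $\cot(\pi/n)$; it illustrates the sharpness of the bound for the cycle, not the general inequality.

```python
import cmath
import math

def cycle_matrix(n, k=1.0):
    """Kinetic matrix K (dp/dt = K p) of the cycle A_1 -> A_2 -> ... -> A_n -> A_1
    with equal rate constants k."""
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        K[i][i] -= k            # outflow from A_i
        K[(i + 1) % n][i] += k  # the same flux enters A_{i+1}
    return K

def max_ratio_on_cycle(n, k=1.0, tol=1e-9):
    """max |Im lambda| / |Re lambda| over the nonzero eigenvalues of the cycle."""
    K = cycle_matrix(n, k)
    best = 0.0
    for q in range(1, n):  # q = 0 is the zero eigenvalue (uniform equilibrium)
        v = [cmath.exp(2j * math.pi * q * m / n) for m in range(n)]
        lam = k * (cmath.exp(-2j * math.pi * q / n) - 1)
        Kv = [sum(K[r][c] * v[c] for c in range(n)) for r in range(n)]
        # Confirm (lam, v) really is an eigenpair of K before using it.
        assert all(abs(Kv[r] - lam * v[r]) < tol for r in range(n))
        best = max(best, abs(lam.imag) / abs(lam.real))
    return best

for n in (3, 5, 8):
    print(n, max_ratio_on_cycle(n), 1 / math.tan(math.pi / n))
```

For $n = 3$ both printed values should agree with $\cot(\pi/3) \approx 0.577$ up to rounding; the ratio does not depend on $k$.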

Submitted 10 January, 2013; originally announced January 2013.

Comments: 3 pages, 1 Fig

arXiv:1212.6767 [pdf, other]

doi 10.3390/e16052408

General H-theorem and entropies that violate the second law

Authors: Alexander N. Gorban

Abstract: The $H$-theorem states that the entropy production is nonnegative and, therefore, the entropy of a closed system should monotonically change in time. In information processing, the entropy production is positive for random transformation of signals (the information processing lemma). Originally, the $H$-theorem and the information processing lemma were proved for the classical Boltzmann-Gibbs-Shannon entropy and for the corresponding divergence (the relative entropy). Many new entropies and divergences have been proposed during the last decades, and for all of them the $H$-theorem is needed. This note proposes a simple and general criterion to check whether the $H$-theorem is valid for a convex divergence $H$ and demonstrates that some of the popular divergences obey no $H$-theorem. We consider systems with $n$ states $A_i$ that obey first-order kinetics (master equation). A convex function $H$ is a Lyapunov function for all master equations with a given equilibrium if and only if its conditional minima properly describe the equilibria of pair transitions $A_i \rightleftharpoons A_j$. This theorem does not depend on the principle of detailed balance and is valid for general Markov kinetics. Elementary analysis of pair equilibria demonstrates that popular Bregman divergences like the Euclidean distance or the Itakura-Saito distance in the space of distributions cannot be universal Lyapunov functions for first-order kinetics and can increase in Markov processes. Therefore, they violate the second law and the information processing lemma. In particular, for these measures of information (divergences), random manipulation of data may add information to data.
The main results are extended to nonlinear generalized mass action law kinetic equations. In the Appendix, a new family of universal Lyapunov functions for the generalized mass action law kinetics is described.
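The pair-equilibrium argument can be illustrated with a toy three-state chain; the rates below are hypothetical and chosen only so that $p^* = (0.5, 0.4, 0.1)$ is stationary. A fast pair transition $A_2 \rightleftharpoons A_3$ relaxes towards the ratio $p_2/p_3 = p_2^*/p_3^* = 4$, whereas the conditional minimum of the squared Euclidean distance on the line $p_2+p_3=\text{const}$ lies where $p_2 - p_2^* = p_3 - p_3^*$. Starting between these two points, the Euclidean distance to $p^*$ grows while the relative entropy decreases, even though this particular chain satisfies detailed balance.

```python
import math

# Toy three-state chain (hypothetical rates) with equilibrium P_STAR:
# fast A2 <-> A3 (rates 1 and 4) and slow A1 <-> A2 (rates 0.4*EPS and 0.5*EPS).
P_STAR = (0.5, 0.4, 0.1)
EPS = 0.01

def rhs(p):
    """Master equation dp/dt; both pair transitions are balanced at P_STAR."""
    p1, p2, p3 = p
    f23 = 1.0 * p2 - 4.0 * p3              # net flux A2 -> A3
    f12 = 0.4 * EPS * p1 - 0.5 * EPS * p2  # net flux A1 -> A2
    return (-f12, f12 - f23, f23)

def euclid(p):
    """Squared Euclidean distance to equilibrium (a Bregman divergence)."""
    return sum((a - b) ** 2 for a, b in zip(p, P_STAR))

def kl(p):
    """Relative entropy KL(p || P_STAR)."""
    return sum(a * math.log(a / b) for a, b in zip(p, P_STAR))

def simulate(p, dt=0.01, steps=100):
    """Explicit Euler integration; returns final p and both divergences over time."""
    hist_e, hist_kl = [euclid(p)], [kl(p)]
    for _ in range(steps):
        p = tuple(a + dt * b for a, b in zip(p, rhs(p)))
        hist_e.append(euclid(p))
        hist_kl.append(kl(p))
    return p, hist_e, hist_kl

p, e, d = simulate((0.1, 0.65, 0.25))
print(e[0], e[-1])  # Euclidean distance grows
print(d[0], d[-1])  # relative entropy decreases
```

With this step size each Euler step is itself a stochastic matrix that keeps $p^*$ stationary, so the relative entropy cannot increase along the discrete trajectory either.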

Submitted 9 October, 2014; v1 submitted 30 December, 2012; originally announced December 2012.

Comments: 45 pages, a significantly extended postprint version with description of a new family of the universal Lyapunov functions for the generalized mass action law kinetics. A presentation "New universal Lyapunov functions for nonlinear kinetics" is added as an ancillary file

Journal ref: Entropy 2014, 16(5), 2408-2432

arXiv:1212.5142 [pdf, ps, other]

doi 10.1016/j.camwa.2013.01.004

Maxallent: Maximizers of all Entropies and Uncertainty of Uncertainty

Authors: A. N. Gorban

Abstract: The entropy maximum approach (Maxent) was developed as a minimization of the subjective uncertainty measured by the Boltzmann-Gibbs-Shannon entropy. Many new entropies have been invented in the second half of the 20th century. Now there exists a rich choice of entropies for fitting needs. This diversity of entropies gave rise to a Maxent "anarchism". The Maxent approach is now the conditional maximization of an appropriate entropy for the evaluation of the probability distribution when our information is partial and incomplete. The rich choice of non-classical entropies causes a new problem: which entropy is better for a given class of applications? We understand entropy as a measure of uncertainty which increases in Markov processes. In this work, we describe the most general ordering of the distribution space, with respect to which all continuous-time Markov processes are monotonic (the Markov order). For inference, this approach results in a set of conditionally "most random" distributions. Each distribution from this set is a maximizer of its own entropy. This "uncertainty of uncertainty" is unavoidable in the analysis of non-equilibrium systems. Surprisingly, a constructive description of this set of maximizers is possible. Two decomposition theorems for Markov processes provide a tool for this description.

Submitted 5 November, 2013; v1 submitted 20 December, 2012; originally announced December 2012.

Comments: 23 pages, 4 figures, Correction in Conclusion (postprint)

Journal ref: Computers and Mathematics with Applications, Volume 65, Issue 10, May 2013, 1438-1456
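The statement that entropy-like functions change monotonically along Markov processes can be checked numerically. The following is a minimal sketch, assuming NumPy and a hypothetical three-state cyclic chain that is not taken from the paper. It tracks several members of the standard Csiszár–Morimoto family H_h(P) = Σ_i P*_i h(P_i / P*_i), each with a different convex h. Every one of them is non-increasing along the trajectory, even though this chain violates detailed balance. The sketch illustrates the monotonicity the abstract relies on; it does not construct the Markov order or its set of maximizers.

```python
import numpy as np

# Convex functions h; each defines a Csiszar-Morimoto functional
# H_h(P) = sum_i P*_i h(P_i / P*_i), expected to be non-increasing along the chain.
CONVEX_H = {
    "relative entropy": lambda x: x * np.log(x),
    "chi-squared": lambda x: (x - 1.0) ** 2,
    "Burg": lambda x: -np.log(x),
    "total variation": lambda x: np.abs(x - 1.0),
}

def csiszar_morimoto(P, P_star, h):
    return float(np.sum(P_star * h(P / P_star)))

def simulate(K, P0, dt=0.01, steps=500):
    """Explicit Euler for the master equation dP/dt = K P; returns all states.

    K[i, j] (i != j) is the rate of the jump j -> i, and every column of K sums
    to zero. If dt * (largest exit rate) <= 1, then I + dt*K is column-stochastic,
    so each Euler step is itself a Markov step with the same equilibrium.
    """
    step = np.eye(len(P0)) + dt * K
    states = [np.asarray(P0, dtype=float)]
    for _ in range(steps):
        states.append(step @ states[-1])
    return states

# Hypothetical cycle 0 -> 1 -> 2 -> 0 with unit rates: the uniform distribution
# is its equilibrium, but detailed balance does not hold.
K = np.array([[-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
P_star = np.full(3, 1.0 / 3.0)
states = simulate(K, [0.7, 0.2, 0.1])

histories = {name: [csiszar_morimoto(P, P_star, h) for P in states]
             for name, h in CONVEX_H.items()}
for name, values in histories.items():
    print(f"{name}: {values[0]:.4f} -> {values[-1]:.4f}")
```

The step size is deliberately kept small enough that every Euler step is a stochastic matrix fixing P*. As a result, the monotonicity holds exactly for the discrete trajectory as well, and not only in the continuous-time limit.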

arXiv:1207.2507 [pdf, ps, other]

doi 10.1016/j.physa.2012.10.009

Thermodynamics in the Limit of Irreversible Reactions

Authors: A. N. Gorban, E. M. Mirkes, G. S. Yablonsky

Abstract: For many real physico-chemical complex systems, the detailed mechanism includes both reversible and irreversible reactions. Such systems are typical in homogeneous combustion and heterogeneous catalytic oxidation. Most complex enzyme reactions include irreversible steps. Classical thermodynamics has no limit for irreversible reactions, whereas the kinetic equations may have such a limit. We represent the systems with irreversible reactions as the limits of fully reversible systems when some of the equilibrium concentrations tend to zero. The structure of the limit reaction system crucially depends on the relative rates of this tendency to zero. We study the dynamics of the limit system and describe its limit behavior as $t \to \infty$. If the reversible systems obey the principle of detailed balance, then the limit system with some irreversible reactions must satisfy the extended principle of detailed balance. It is formulated and proven in the form of two conditions: (i) the reversible part satisfies the principle of detailed balance and (ii) the convex hull of the stoichiometric vectors of the irreversible reactions does not intersect the linear span of the stoichiometric vectors of the reversible reactions. These conditions imply the existence of global Lyapunov functionals and allow an algebraic description of the limit behavior. The thermodynamic theory of the irreversible limit of reversible reactions is illustrated by the analysis of hydrogen combustion.

Submitted 11 October, 2012; v1 submitted 10 July, 2012; originally announced July 2012.

Comments: 23 pages, extended version with figs

Journal ref: Physica A, Volume 392, Issue 6, 2013, Pages 1318-1335
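When the reversible reactions follow the mass action law, condition (i) above, the principle of detailed balance for the reversible part, reduces to a linear-algebra test. The following is a minimal sketch, assuming NumPy, mass-action rates k+_r c^α_r and k-_r c^β_r, and a hypothetical cycle A ⇌ B ⇌ C ⇌ A that is not taken from the paper. A positive point c* balancing every reaction exists if and only if the vector ln(k+/k-) lies in the column space of the matrix whose rows are the stoichiometric vectors γ_r = β_r − α_r. Condition (ii) is not tested here. It can be posed as a linear-programming feasibility problem.

```python
import numpy as np

def satisfies_detailed_balance(gamma, k_plus, k_minus, tol=1e-9):
    """Return True if some c* > 0 satisfies k+_r c*^alpha_r = k-_r c*^beta_r for all r.

    Taking logarithms gives gamma @ log(c*) = log(k+ / k-), where row r of gamma
    is the stoichiometric vector beta_r - alpha_r. Any real solution x gives the
    positive point c* = exp(x), so we only test whether the system is consistent.
    """
    gamma = np.asarray(gamma, dtype=float)
    rhs = np.log(np.asarray(k_plus, dtype=float) / np.asarray(k_minus, dtype=float))
    x, *_ = np.linalg.lstsq(gamma, rhs, rcond=None)
    residual = np.linalg.norm(gamma @ x - rhs)
    return bool(residual <= tol * max(1.0, np.linalg.norm(rhs)))

# Hypothetical cycle A <=> B, B <=> C, C <=> A; species order (A, B, C).
gamma = np.array([[-1.0, 1.0, 0.0],
                  [0.0, -1.0, 1.0],
                  [1.0, 0.0, -1.0]])

# Product of equilibrium constants around the cycle is 2 * 3 * (1/6) = 1: balanced.
balanced = satisfies_detailed_balance(gamma, [2.0, 3.0, 1.0], [1.0, 1.0, 6.0])
# Product is 6 != 1: no positive point balances all three reactions at once.
unbalanced = satisfies_detailed_balance(gamma, [2.0, 3.0, 1.0], [1.0, 1.0, 1.0])
print(balanced, unbalanced)
```

For this cycle, the test reproduces the classical Wegscheider condition: the product of the equilibrium constants around the cycle must equal one.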

arXiv:1205.2851 [pdf, other]

doi 10.3389/fgene.2012.00131

Reduction of dynamical biochemical reaction networks in computational biology

Authors: Ovidiu Radulescu, Alexander N. Gorban, Andrei Zinovyev, Vincent Noel

Abstract: Biochemical networks are used in computational biology to model the static and dynamical details of systems involved in cell signaling, metabolism, and regulation of gene expression. Parametric and structural uncertainty, as well as combinatorial explosion, are strong obstacles to analyzing the dynamics of large models of this type. Multi-scaleness is another property of these networks that can be used to get past some of these obstacles. Networks with many well-separated time scales can be reduced to simpler networks in a way that depends only on the orders of magnitude and not on the exact values of the kinetic parameters. The main idea used for such robust simplifications of networks is the concept of dominance among model elements, allowing hierarchical organization of these elements according to their effects on the network dynamics. This concept finds a natural formulation in tropical geometry. We revisit, in the light of these new ideas, the main approaches to model reduction of reaction networks, such as quasi-steady state and quasi-equilibrium approximations, and provide practical recipes for model reduction of linear and nonlinear networks. We also discuss the application of model reduction to backward pruning machine learning techniques.

Submitted 13 May, 2012; originally announced May 2012.

Journal ref: Frontiers in Genetics, 2012, Volume 3, Article 131
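The quasi-steady-state and limiting-step ideas mentioned in the abstract can be seen on the smallest possible example. The following sketch assumes NumPy, mass-action kinetics, and a hypothetical two-step chain A → B → C with k2 ≫ k1. This is a textbook case, not a network from the paper. The reduced model keeps only the slow step and puts B in quasi-steady state. Its error in C equals B itself and is therefore bounded by k1/(k2 − k1). Apart from this small correction, the reduced model depends only on k1, the slowest constant, and on which constant is the larger one.

```python
import numpy as np

def full_solution(t, k1, k2):
    """Exact mass-action solution of A -> B -> C with A(0) = 1, B(0) = C(0) = 0 (k1 != k2)."""
    a = np.exp(-k1 * t)
    b = k1 / (k2 - k1) * (np.exp(-k1 * t) - np.exp(-k2 * t))
    return a, b, 1.0 - a - b

def reduced_solution(t, k1, k2):
    """Reduced model for k2 >> k1: B is in quasi-steady state, B ~ (k1 / k2) A,
    and the conversion A -> C proceeds with the limiting rate constant k1."""
    a = np.exp(-k1 * t)
    return a, (k1 / k2) * a, 1.0 - a

t = np.linspace(0.0, 5.0, 501)
k1, k2 = 1.0, 1000.0  # rate constants separated by three orders of magnitude
_, b_full, c_full = full_solution(t, k1, k2)
_, b_red, c_red = reduced_solution(t, k1, k2)
print("max |C_full - C_reduced| =", np.max(np.abs(c_full - c_red)))
```

After a short initial transient of duration about 1/k2, the quasi-steady-state value of B also agrees with the exact one to within a relative error of roughly k1/k2.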

arXiv:1205.2052 [pdf, ps, other]

doi 10.1016/j.physa.2012.11.028

Local Equivalence of Reversible and General Markov Kinetics

Authors: A. N. Gorban

Abstract: We consider continuous-time Markov kinetics with a finite number of states and a given positive equilibrium distribution P*. For an arbitrary probability distribution P, we study the possible right-hand sides, dP/dt, of the Kolmogorov (master) equations. We describe the cone of possible values of the velocity, dP/dt, as a function of P and P*. We prove that, surprisingly, these cones coincide for the class of all Markov processes with equilibrium P* and for the reversible Markov processes with detailed balance at this equilibrium. Therefore, for an arbitrary probability distribution P and a general system, there exists a system with detailed balance and the same equilibrium that has the same velocity dP/dt at point P. The set of Lyapunov functions for the reversible Markov processes coincides with the set of Lyapunov functions for general Markov kinetics. The results are extended to nonlinear systems with the generalized mass action law.

Submitted 22 November, 2012; v1 submitted 9 May, 2012; originally announced May 2012.

Comments: Significantly extended version, 21 pages

Journal ref: Physica A, Volume 392, Issue 5, 1 March 2013, 1111-1121
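The local equivalence can be illustrated at a single point. The following sketch assumes NumPy, and the chain, the point P, and the weights are illustrative choices, not from the paper. The cyclic chain 0 → 1 → 2 → 0 with unit rates has the uniform equilibrium but no detailed balance. Even so, at P = (0.5, 0.3, 0.2) a detailed-balance system with the same equilibrium reproduces its velocity exactly. The reversible system is parametrized by symmetric equilibrium fluxes w_ij = k_ij P*_j, which give dP_i/dt = Σ_j w_ij (P_j/P*_j − P_i/P*_i). The weights below were obtained by solving these three balance equations by hand with w_12 set to zero. The sketch only checks them and is not a general construction.

```python
import numpy as np

def velocity(K, P):
    """Right-hand side dP/dt = K P; K[i, j] (i != j) is the rate of the jump j -> i."""
    return K @ P

def has_detailed_balance(K, P_star, tol=1e-12):
    """Check K[i, j] P*_j == K[j, i] P*_i for every pair i != j."""
    flux = K * P_star[np.newaxis, :]  # flux[i, j] = K[i, j] * P*_j
    off = ~np.eye(len(P_star), dtype=bool)
    return bool(np.allclose(flux[off], flux.T[off], atol=tol))

def reversible_generator(W, P_star):
    """Build a detailed-balance generator from symmetric equilibrium fluxes W[i, j] >= 0."""
    K = W / P_star[np.newaxis, :]  # K[i, j] = W[i, j] / P*_j
    np.fill_diagonal(K, 0.0)
    K -= np.diag(K.sum(axis=0))    # make every column sum to zero
    return K

K_cycle = np.array([[-1.0, 0.0, 1.0],
                    [1.0, -1.0, 0.0],
                    [0.0, 1.0, -1.0]])
P_star = np.full(3, 1.0 / 3.0)
P = np.array([0.5, 0.3, 0.2])

# Equilibrium fluxes chosen by hand for this particular P (hypothetical values).
W = np.zeros((3, 3))
W[0, 1] = W[1, 0] = 1.0 / 3.0
W[0, 2] = W[2, 0] = 1.0 / 9.0
K_rev = reversible_generator(W, P_star)

print(velocity(K_cycle, P), velocity(K_rev, P))
```

Up to rounding, both velocities equal (−0.3, 0.2, 0.1). The reversible system fixes the same P*, yet it is a different process: the two trajectories agree only in their tangent at this P.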

arXiv:1204.5941 [pdf]

doi 10.3934/mbe.2019329

Basic, simple and extendable kinetic model of protein synthesis

Authors: Alexander N. Gorban, Annick Harel-Bellan, Nadya Morozova, Andrei Zinovyev

Abstract: Protein synthesis is one of the most fundamental biological processes, and it consumes a significant amount of cellular resources. Despite the existence of multiple mathematical models of translation, varying in the level of mechanistic detail, there is, surprisingly, no basic and simple chemical kinetic model of this process derived directly from the detailed kinetic model. One of the reasons for this is that the translation process is characterized by an indefinite number of states, owing to the existence of polysomes. We bypass this difficulty by applying a trick that consists of lumping multiple states of translated mRNA into a few dynamical variables and by introducing a variable describing the pool of translating ribosomes. The simplest model can be solved analytically under some assumptions. The basic and simple model can be extended, if necessary, to take into account various phenomena such as the interaction between translating ribosomes, a limited amount of ribosomal units, or regulation of translation by microRNA. The model can be used as a building block (translation module) for more complex models of cellular processes. We demonstrate the utility of the model with two examples. First, we determine the critical parameters of single protein synthesis for the case when ribosomal units are abundant. Second, we demonstrate intrinsic bistability in the dynamics of ribosomal protein turnover and predict that a minimal number of ribosomes should pre-exist in a living cell to sustain its protein synthesis machinery, even in the absence of proliferation.

Submitted 29 April, 2019; v1 submitted 26 April, 2012; originally announced April 2012.

Comments: 22 pages, 9 figures

Journal ref: Mathematical Biosciences and Engineering 2019, Volume 16, Issue 6: 6602-6622

Showing 51–100 of 165 results for author: Gorban, A