Search | arXiv e-print repository

arXiv:2406.19807 [pdf, other]

Deceptive Diffusion: Generating Synthetic Adversarial Examples

Authors: Lucas Beerens, Catherine F. Higham, Desmond J. Higham

Abstract: We introduce the concept of deceptive diffusion -- training a generative AI model to produce adversarial images. Whereas a traditional adversarial attack algorithm aims to perturb an existing image to induce a misclassification, the deceptive diffusion model can create an arbitrary number of new, misclassified images that are not directly associated with training or test images. Deceptive diffusion offers the possibility of strengthening defence algorithms by providing adversarial training data at scale, including types of misclassification that are otherwise difficult to find. In our experiments, we also investigate the effect of training on a partially attacked data set. This highlights a new type of vulnerability for generative diffusion models: if an attacker is able to stealthily poison a portion of the training data, then the resulting diffusion model will generate a similar proportion of misleading outputs.

Submitted 28 June, 2024; originally announced June 2024.

MSC Class: 68T07 ACM Class: I.2.0; I.5.1

arXiv:2406.12670 [pdf, other]

Stealth edits for provably fixing or attacking large language models

Authors: Oliver J. Sutton, Qinghua Zhou, Wei Wang, Desmond J. Higham, Alexander N. Gorban, Alexander Bastounis, Ivan Y. Tyukin

Abstract: We reveal new methods and the theoretical foundations of techniques for editing large language models. We also show how the new theory can be used to assess the editability of models and to expose their susceptibility to previously unknown malicious attacks. Our theoretical approach shows that a single metric (a specific measure of the intrinsic dimensionality of the model's features) is fundamental to predicting the success of popular editing approaches, and reveals new bridges between disparate families of editing methods. We collectively refer to these approaches as stealth editing methods, because they aim to directly and inexpensively update a model's weights to correct the model's responses to known hallucinating prompts without otherwise affecting the model's behaviour, without requiring retraining. By carefully applying the insight gleaned from our theoretical investigation, we are able to introduce a new network block -- named a jet-pack block -- which is optimised for highly selective model editing, uses only standard network operations, and can be inserted into existing networks. The intrinsic dimensionality metric also determines the vulnerability of a language model to a stealth attack: a small change to a model's weights which changes its response to a single attacker-chosen prompt. Stealth attacks do not require access to or knowledge of the model's training data, therefore representing a potent yet previously unrecognised threat to redistributed foundation models. They are computationally simple enough to be implemented in malware in many cases. Extensive experimental results illustrate and support the method and its theoretical underpinnings. Demos and source code for editing language models are available at https://github.com/qinghua-zhou/stealth-edits.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 24 pages, 9 figures. Open source implementation: https://github.com/qinghua-zhou/stealth-edits

MSC Class: 68T07; 68T50; 68W40 ACM Class: I.2.7; F.2.0

arXiv:2403.11993 [pdf, other]

Adaptive stepsize algorithms for Langevin dynamics

Authors: Alix Leroy, Benedict Leimkuhler, Jonas Latz, Desmond J. Higham

Abstract: We discuss the design of an invariant measure-preserving transformed dynamics for the numerical treatment of Langevin dynamics based on rescaling of time, with the goal of sampling from an invariant measure. Given an appropriate monitor function which characterizes the numerical difficulty of the problem as a function of the state of the system, this method allows the stepsizes to be reduced only when necessary, facilitating efficient recovery of long-time behavior. We study both the overdamped and underdamped Langevin dynamics. We investigate how an appropriate correction term that ensures preservation of the invariant measure should be incorporated into a numerical splitting scheme. Finally, we demonstrate the use of the technique in several model systems, including a Bayesian sampling problem with a steep prior.

Submitted 2 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

MSC Class: 65C30; 65C05; 65P10

arXiv:2402.07631 [pdf, other]

Higher-order Connection Laplacians for Directed Simplicial Complexes

Authors: Xue Gong, Desmond J. Higham, Konstantinos Zygalakis, Ginestra Bianconi

Abstract: Higher-order networks encode the many-body interactions existing in complex systems, such as the brain, protein complexes, and social interactions. Simplicial complexes are higher-order networks that allow a comprehensive investigation of the interplay between topology and dynamics. However, simplicial complexes have the limitation that they only capture undirected higher-order interactions while in real-world scenarios, often there is a need to introduce the direction of simplices, extending the popular notion of direction of edges. On graphs and networks the Magnetic Laplacian, a special case of Connection Laplacian, is becoming a popular operator to treat edge directionality. Here we tackle the challenge of treating directional simplicial complexes by formulating Higher-order Connection Laplacians taking into account the configurations induced by the simplices' directions. Specifically, we define all the Connection Laplacians of directed simplicial complexes of dimension two and we discuss the induced higher-order diffusion dynamics by considering instructive synthetic examples of simplicial complexes. The proposed higher-order diffusion processes can be adopted in real scenarios when we want to consider higher-order diffusion displaying non-trivial frustration effects due to conflicting directionalities of the incident simplices.

Submitted 12 February, 2024; originally announced February 2024.

Comments: 34 pages, 13 figures

arXiv:2312.14977 [pdf, other]

Diffusion Models for Generative Artificial Intelligence: An Introduction for Applied Mathematicians

Authors: Catherine F. Higham, Desmond J. Higham, Peter Grindrod

Abstract: Generative artificial intelligence (AI) refers to algorithms that create synthetic but realistic output. Diffusion models currently offer state of the art performance in generative AI for images. They also form a key component in more general tools, including text-to-image generators and large language models. Diffusion models work by adding noise to the available training data and then learning how to reverse the process. The reverse operation may then be applied to new random data in order to produce new outputs. We provide a brief introduction to diffusion models for applied mathematicians and statisticians. Our key aims are (a) to present illustrative computational examples, (b) to give a careful derivation of the underlying mathematical formulas involved, and (c) to draw a connection with partial differential equation (PDE) diffusion models. We provide code for the computational experiments. We hope that this topic will be of interest to advanced undergraduate students and postgraduate students. Portions of the material may also provide useful motivational examples for those who teach courses in stochastic processes, inference, machine learning, PDEs or scientific computing.

Submitted 21 December, 2023; originally announced December 2023.

MSC Class: 68T07; 60J60 ACM Class: I.2; I.2.6
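
The "adding noise" step described in the abstract above can be illustrated with a short sketch. It is not taken from the paper's code: it assumes the widely used closed-form noising rule x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps with a linear variance schedule, and the function names, schedule endpoints and toy data are all hypothetical choices made for illustration.

```python
import math
import random


def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Return abar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule.

    The schedule endpoints are assumed values, not taken from the paper.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar, rng):
    """Sample x_t = sqrt(abar) * x0 + sqrt(1 - abar) * eps, with eps ~ N(0, I)."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


rng = random.Random(0)
alpha_bars = alpha_bar_schedule()
x0 = [1.0, -1.0, 0.5, 0.0]                    # toy "training datum"
x_early = add_noise(x0, alpha_bars[0], rng)   # almost unchanged
x_late = add_noise(x0, alpha_bars[-1], rng)   # almost pure Gaussian noise
```

Because abar_t decays from nearly 1 towards 0, early samples stay close to the data and late samples are close to standard Gaussian noise; a trained model would then be asked to reverse this corruption.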

arXiv:2311.17128 [pdf, other]

Vulnerability Analysis of Transformer-based Optical Character Recognition to Adversarial Attacks

Authors: Lucas Beerens, Desmond J. Higham

Abstract: Recent advancements in Optical Character Recognition (OCR) have been driven by transformer-based models. OCR systems are critical in numerous high-stakes domains, yet their vulnerability to adversarial attack remains largely uncharted territory, raising concerns about security and compliance with emerging AI regulations. In this work we present a novel framework to assess the resilience of Transformer-based OCR (TrOCR) models. We develop and assess algorithms for both targeted and untargeted attacks. For the untargeted case, we measure the Character Error Rate (CER), while for the targeted case we use the success ratio. We find that TrOCR is highly vulnerable to untargeted attacks and somewhat less vulnerable to targeted attacks. On a benchmark handwriting data set, untargeted attacks can cause a CER of more than 1 without being noticeable to the eye. With a similar perturbation size, targeted attacks can lead to success rates of around $25\%$ -- here we attacked single tokens, requiring TrOCR to output the tenth most likely token from a large vocabulary.

Submitted 28 November, 2023; originally announced November 2023.

MSC Class: 65F35 ACM Class: I.2.10; G.1.3
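
A CER above 1, as reported in the abstract above, is possible under the usual definition of Character Error Rate: edit distance between prediction and reference, divided by the reference length. The sketch below assumes that standard definition (the abstract does not spell it out); the function names and example strings are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: edit distance normalised by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


print(cer("kitten", "sitting"))  # 0.5: three edits over six reference characters
print(cer("ab", "xyzxyz"))       # 3.0: the output is longer than the reference and entirely wrong
```

Since the normalisation uses the reference length only, an output that is both wrong and much longer than the reference pushes the ratio past 1.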

arXiv:2309.09305 [pdf, other]

Connectivity of Random Geometric Hypergraphs

Authors: Henry-Louis de Kergorlay, Desmond J. Higham

Abstract: We consider a random geometric hypergraph model based on an underlying bipartite graph. Nodes and hyperedges are sampled uniformly in a domain, and a node is assigned to those hyperedges that lie within a certain radius. From a modelling perspective, we explain how the model captures higher order connections that arise in real data sets. Our main contribution is to study the connectivity properties of the model. In an asymptotic limit where the number of nodes and hyperedges grow in tandem we give a condition on the radius that guarantees connectivity.

Submitted 17 September, 2023; originally announced September 2023.

MSC Class: 05C80 ACM Class: G.2.2

arXiv:2309.07072 [pdf, ps, other]

The Boundaries of Verifiable Accuracy, Robustness, and Generalisation in Deep Learning

Authors: Alexander Bastounis, Alexander N. Gorban, Anders C. Hansen, Desmond J. Higham, Danil Prokhorov, Oliver Sutton, Ivan Y. Tyukin, Qinghua Zhou

Abstract: In this work, we assess the theoretical limitations of determining guaranteed stability and accuracy of neural networks in classification tasks. We consider a classical distribution-agnostic framework and algorithms minimising empirical risks and potentially subjected to some weights regularisation. We show that there is a large family of tasks for which computing and verifying ideal stable and accurate neural networks in the above settings is extremely challenging, if at all possible, even when such ideal solutions exist within the given class of neural architectures.

Submitted 13 September, 2023; originally announced September 2023.

MSC Class: 68T07; 68T05

arXiv:2309.03665 [pdf, other]

How adversarial attacks can disrupt seemingly stable accurate classifiers

Authors: Oliver J. Sutton, Qinghua Zhou, Ivan Y. Tyukin, Alexander N. Gorban, Alexander Bastounis, Desmond J. Higham

Abstract: Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data. Paradoxically, empirical evidence indicates that even systems which are robust to large random perturbations of the input data remain susceptible to small, easily constructed, adversarial perturbations of their inputs. Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data. We introduce a simple generic and generalisable framework for which key behaviours observed in practical systems arise with high probability -- notably the simultaneous susceptibility of the (otherwise accurate) model to easily constructed adversarial attacks, and robustness to random perturbations of the input data. We confirm that the same phenomena are directly observed in practical neural networks trained on standard image classification problems, where even large additive random noise fails to trigger the adversarial instability of the network. A surprising takeaway is that even small margins separating a classifier's decision surface from training and testing data can hide adversarial susceptibility from being detected using randomly sampled perturbations. Counterintuitively, using additive noise during training or testing is therefore inefficient for eradicating or detecting adversarial examples, and more demanding adversarial training is required.

Submitted 7 September, 2023; originally announced September 2023.

Comments: 11 pages, 8 figures, additional supplementary materials

arXiv:2309.02082 [pdf, ps, other]

Backward error analysis and the qualitative behaviour of stochastic optimization algorithms: Application to stochastic coordinate descent

Authors: Stefano Di Giovacchino, Desmond J. Higham, Konstantinos Zygalakis

Abstract: Stochastic optimization methods have been hugely successful in making large-scale optimization problems feasible when computing the full gradient is computationally prohibitive. Using the theory of modified equations for numerical integrators, we propose a class of stochastic differential equations that approximate the dynamics of general stochastic optimization methods more closely than the original gradient flow. Analyzing a modified stochastic differential equation can reveal qualitative insights about the associated optimization method. Here, we study mean-square stability of the modified equation in the case of stochastic coordinate descent.

Submitted 5 September, 2023; originally announced September 2023.

Comments: 15 pages; 3 figures

arXiv:2308.15092 [pdf, other]

Can We Rely on AI?

Authors: Desmond J. Higham

Abstract: Over the last decade, adversarial attack algorithms have revealed instabilities in deep learning tools. These algorithms raise issues regarding safety, reliability and interpretability in artificial intelligence; especially in high risk settings. From a practical perspective, there has been a war of escalation between those developing attack and defence strategies. At a more theoretical level, researchers have also studied bigger picture questions concerning the existence and computability of attacks. Here we give a brief overview of the topic, focusing on aspects that are likely to be of interest to researchers in applied and computational mathematics.

Submitted 29 August, 2023; originally announced August 2023.

MSC Class: 68T01; 68T05; 90C31 ACM Class: I.2.0; I.5.0

arXiv:2306.14266 [pdf, other]

Estimating Network Dimension When the Spectrum Struggles

Authors: Peter Grindrod, Desmond John Higham, Henry-Louis de Kergorlay

Abstract: What is the dimension of a network? Here, we view it as the smallest dimension of Euclidean space into which nodes can be embedded so that pairwise distances accurately reflect the connectivity structure. We show that a recently proposed and extremely efficient algorithm for data clouds, based on computing first and second nearest neighbour distances, can be used as the basis of an approach for estimating the dimension of a network with weighted edges. We also show how the algorithm can be extended to unweighted networks when combined with spectral embedding. We illustrate the advantages of this technique over the widely-used approach of characterising dimension by visually searching for a suitable gap in the spectrum of the Laplacian.

Submitted 25 June, 2023; originally announced June 2023.

MSC Class: 05C20; 05C80; 05C85; 05C90; 05C82 ACM Class: G.2.2
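
The abstract above refers to a data-cloud algorithm built on first and second nearest neighbour distances but does not name it. The sketch below assumes it is a two-nearest-neighbour ratio estimator of the kind where, for locally uniform data of intrinsic dimension d, the ratio mu = r2/r1 has density d * mu^(-d-1) on mu >= 1, giving the maximum-likelihood estimate d = N / sum(log mu_i). It works on a point cloud only; the extension to networks is not attempted, and all names and the toy data are hypothetical.

```python
import math
import random


def two_nn_dimension(points):
    """Maximum-likelihood dimension estimate from ratios of 2nd to 1st nearest-neighbour distances."""
    log_ratios = []
    for i, p in enumerate(points):
        r1 = r2 = math.inf
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if d < r1:
                r1, r2 = d, r1   # new nearest neighbour; old nearest becomes second
            elif d < r2:
                r2 = d
        if r1 > 0.0:             # skip exact duplicates, which give an undefined ratio
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


# Toy data: a 2-dimensional sheet embedded linearly in 5-dimensional space.
rng = random.Random(1)
cloud = []
for _ in range(400):
    u, v = rng.random(), rng.random()
    cloud.append((u, v, u + v, u - v, 2.0 * u))

estimate = two_nn_dimension(cloud)
print(round(estimate, 2))  # expected to be close to 2, the intrinsic dimension
```

The brute-force neighbour search is quadratic in the number of points and is used only to keep the sketch dependency-free; a spatial index would be the natural replacement for larger clouds.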

arXiv:2306.02918 [pdf, other]

Adversarial Ink: Componentwise Backward Error Attacks on Deep Learning

Authors: Lucas Beerens, Desmond J. Higham

Abstract: Deep neural networks are capable of state-of-the-art performance in many classification tasks. However, they are known to be vulnerable to adversarial attacks -- small perturbations to the input that lead to a change in classification. We address this issue from the perspective of backward error and condition number, concepts that have proved useful in numerical analysis. To do this, we build on the work of Beuzeville et al. (2021). In particular, we develop a new class of attack algorithms that use componentwise relative perturbations. Such attacks are highly relevant in the case of handwritten documents or printed texts where, for example, the classification of signatures, postcodes, dates or numerical quantities may be altered by changing only the ink consistency and not the background. This makes the perturbed images look natural to the naked eye. Such ``adversarial ink'' attacks therefore reveal a weakness that can have a serious impact on safety and security. We illustrate the new attacks on real data and contrast them with existing algorithms. We also study the use of a componentwise condition number to quantify vulnerability.

Submitted 5 June, 2023; originally announced June 2023.

MSC Class: 65F35 ACM Class: I.2.10; G.1.3

arXiv:2207.13895 [pdf, ps, other]

Generative Hypergraph Models and Spectral Embedding

Authors: Xue Gong, Desmond J. Higham, Konstantinos Zygalakis

Abstract: Many complex systems involve interactions between more than two agents. Hypergraphs capture these higher-order interactions through hyperedges that may link more than two nodes. We consider the problem of embedding a hypergraph into low-dimensional Euclidean space so that most interactions are short-range. This embedding is relevant to many follow-on tasks, such as node reordering, clustering, and visualization. We focus on two spectral embedding algorithms customized to hypergraphs which recover linear and periodic structures respectively. In the periodic case, nodes are positioned on the unit circle. We show that the two spectral hypergraph embedding algorithms are associated with a new class of generative hypergraph models. These models generate hyperedges according to node positions in the embedded space and encourage short-range connections. They allow us to quantify the relative presence of periodic and linear structures in the data through maximum likelihood. They also improve the interpretability of node embedding and provide a metric for hyperedge prediction. We demonstrate the hypergraph embedding and follow-on tasks -- including structure quantification, clustering and hyperedge prediction -- on synthetic and real-world hypergraphs. We find that the hypergraph approach can outperform clustering algorithms that use only dyadic edges. We also compare several triadic edge prediction methods on high school contact data where our algorithm improves upon benchmark methods when the amount of training data is limited.

Submitted 5 January, 2023; v1 submitted 28 July, 2022; originally announced July 2022.

arXiv:2202.12769 [pdf, other]

Core-periphery detection in hypergraphs

Authors: Francesco Tudisco, Desmond J. Higham

Abstract: Core-periphery detection is a key task in exploratory network analysis where one aims to find a core, a set of nodes well-connected internally and with the periphery, and a periphery, a set of nodes connected only (or mostly) with the core. In this work we propose a model of core-periphery for higher-order networks modeled as hypergraphs and we propose a method for computing a core-score vector that quantifies how close each node is to the core. In particular, we show that this method solves the corresponding non-convex core-periphery optimization problem globally to an arbitrary precision. This method turns out to coincide with the computation of the Perron eigenvector of a nonlinear hypergraph operator, suitably defined in terms of the incidence matrix of the hypergraph, generalizing recently proposed centrality models for hypergraphs. We perform several experiments on synthetic and real-world hypergraphs showing that the proposed method outperforms alternative core-periphery detection algorithms, in particular those obtained by transferring established graph methods to the hypergraph setting via clique expansion.

Submitted 25 February, 2022; originally announced February 2022.

arXiv:2202.02888 [pdf, other]

Weighted enumeration of nonbacktracking walks on weighted graphs

Authors: Francesca Arrigo, Desmond J. Higham, Vanni Noferini, Ryan Wood

Abstract: We extend the notion of nonbacktracking walks from unweighted graphs to graphs whose edges have a nonnegative weight. Here the weight associated with a walk is taken to be the product over the weights along the individual edges. We give two ways to compute the associated generating function, and corresponding node centrality measures. One method works directly on the original graph and one uses a line graph construction followed by a projection. The first method is more efficient, but the second has the advantage of extending naturally to time-evolving graphs. Computational results are also provided.

Submitted 18 January, 2024; v1 submitted 6 February, 2022; originally announced February 2022.

arXiv:2201.01543 [pdf, other]

Testing a QUBO Formulation of Core-periphery Partitioning on a Quantum Annealer

Authors: Catherine F. Higham, Desmond J. Higham, Francesco Tudisco

Abstract: We propose a new kernel that quantifies success for the task of computing a core-periphery partition for an undirected network. Finding the associated optimal partitioning may be expressed in the form of a quadratic unconstrained binary optimization (QUBO) problem, to which a state-of-the-art quantum annealer may be applied. We therefore make use of the new objective function to (a) judge the performance of a quantum annealer, and (b) compare this approach with existing heuristic core-periphery partitioning methods. The quantum annealing is performed on the commercially available D-Wave machine. The QUBO problem involves a full matrix even when the underlying network is sparse. Hence, we develop and test a sparsified version of the original QUBO which increases the available problem dimension for the quantum annealer. Results are provided on both synthetic and real data sets, and we conclude that the QUBO/quantum annealing approach offers benefits in terms of optimizing this new quantity of interest.

Submitted 5 January, 2022; originally announced January 2022.

arXiv:2111.05715 [pdf, ps, other]

A Hierarchy of Network Models Giving Bistability Under Triadic Closure

Authors: Stefano Di Giovacchino, Desmond J. Higham, Konstantinos C. Zygalakis

Abstract: Triadic closure describes the tendency for new friendships to form between individuals who already have friends in common. It has been argued heuristically that the triadic closure effect can lead to bistability in the formation of large-scale social interaction networks. Here, depending on the initial state and the transient dynamics, the system may evolve towards either of two long-time states. In this work, we propose and study a hierarchy of network evolution models that incorporate triadic closure, building on the work of Grindrod, Higham, and Parsons [Internet Mathematics, 8, 2012, 402--423]. We use a chemical kinetics framework, paying careful attention to the reaction rate scaling with respect to the system size. In a macroscale regime, we show rigorously that a bimodal steady-state distribution is admitted. This behavior corresponds to the existence of two distinct stable fixed points in a deterministic mean-field ODE. The macroscale model is also seen to capture an apparent metastability property of the microscale system. Computational simulations are used to support the analysis.

Submitted 10 November, 2021; originally announced November 2021.

Comments: 20 pages, 9 figures

MSC Class: 60J20; 60J74; 68R10

arXiv:2110.10526 [pdf, ps, other]

Dynamic Katz and Related Network Measures

Authors: Francesca Arrigo, Desmond J. Higham, Vanni Noferini, Ryan Wood

Abstract: We study walk-based centrality measures for time-ordered network sequences. For the case of standard dynamic walk-counting, we show how to derive and compute centrality measures induced by analytic functions. We also prove that dynamic Katz centrality, based on the resolvent function, has the unique advantage of allowing computations to be performed entirely at the node level. We then consider two distinct types of backtracking and develop a framework for capturing dynamic walk combinatorics when either or both is disallowed.

Submitted 20 October, 2021; originally announced October 2021.

MSC Class: 05A15; 05C50; 68R05; 91D30

arXiv:2108.05451 [pdf, other]

Mean Field Analysis of Hypergraph Contagion Model

Authors: Desmond J. Higham, Henry-Louis de Kergorlay

Abstract: We typically interact in groups, not just in pairs. For this reason, it has recently been proposed that the spread of information, opinion or disease should be modelled over a hypergraph rather than a standard graph. The use of hyperedges naturally allows for a nonlinear rate of transmission, in terms of both the group size and the number of infected group members, as is the case, for example, when social distancing is encouraged. We consider a general class of individual-level, stochastic, susceptible-infected-susceptible models on a hypergraph, and focus on a mean field approximation proposed in [Arruda et al., Phys. Rev. Res., 2020]. We derive spectral conditions under which the mean field model predicts local or global stability of the infection-free state. We also compare these results with (a) a new condition that we derive for decay to zero in mean for the exact process, (b) conditions for a different mean field approximation in [Higham and de Kergorlay, Proc. Roy. Soc. A, 2021], and (c) numerical simulations of the microscale model.

Submitted 11 August, 2021; originally announced August 2021.

MSC Class: 92C60; 37N25; 05C65

arXiv:2107.03026 [pdf, ps, other]

doi 10.1098/rsos.211144

Directed Network Laplacians and Random Graph Models

Authors: Xue Gong, Desmond John Higham, Konstantinos Zygalakis

Abstract: We consider spectral methods that uncover hidden structures in directed networks. We establish and exploit connections between node reordering via (a) minimizing an objective function and (b) maximizing the likelihood of a random graph model. We focus on two existing spectral approaches that build and analyse Laplacian-style matrices via the minimization of frustration and trophic incoherence. These algorithms aim to reveal directed periodic and linear hierarchies, respectively. We show that reordering nodes using the two algorithms, or mapping them onto a specified lattice, is associated with new classes of directed random graph models. Using this random graph setting, we are able to compare the two algorithms on a given network and quantify which structure is more likely to be present. We illustrate the approach on synthetic and real networks, and discuss practical implementation issues.

Submitted 11 October, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

MSC Class: 05C20; 05C80; 05C85; 05C90; 05C82

arXiv:2106.13997 [pdf, other]

doi 10.1093/imamat/hxad027

The Feasibility and Inevitability of Stealth Attacks

Authors: Ivan Y. Tyukin, Desmond J. Higham, Alexander Bastounis, Eliyas Woldegeorgis, Alexander N. Gorban

Abstract: We develop and study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence (AI) systems including deep learning neural networks. In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself. Such a stealth attack could be conducted by a mischievous, corrupt or disgruntled member of a software development team. It could also be made by those wishing to exploit a ``democratization of AI'' agenda, where network architectures and trained parameter sets are shared publicly. We develop a range of new implementable attack strategies with accompanying analysis, showing that with high probability a stealth attack can be made transparent, in the sense that system performance is unchanged on a fixed validation set which is unknown to the attacker, while evoking any desired output on a trigger input of interest. The attacker only needs to have estimates of the size of the validation set and the spread of the AI's relevant latent space. In the case of deep learning neural networks, we show that a one neuron attack is possible - a modification to the weights and bias associated with a single neuron - revealing a vulnerability arising from over-parameterization. We illustrate these concepts using state of the art architectures on two standard image data sets. Guided by the theory and computational results, we also propose strategies to guard against stealth attacks.

Submitted 4 January, 2023; v1 submitted 26 June, 2021; originally announced June 2021.

MSC Class: 68T01; 68T05; 90C31

Journal ref: IMA Journal of Applied Mathematics, October 2023, hxad027

arXiv:2103.07319 [pdf, other]

doi 10.1098/rspa.2021.0232

Epidemics on Hypergraphs: Spectral Thresholds for Extinction

Authors: Desmond John Higham, Henry-Louis de Kergorlay

Abstract: Epidemic spreading is well understood when a disease propagates around a contact graph. In a stochastic susceptible-infected-susceptible setting, spectral conditions characterise whether the disease vanishes. However, modelling human interactions using a graph is a simplification which only considers pairwise relationships. This does not fully represent the more realistic case where people meet in groups. Hyperedges can be used to record such group interactions, yielding more faithful and flexible models, allowing for the rate of infection of a node to vary as a nonlinear function of the number of infectious neighbors. We discuss different types of contagion models in this hypergraph setting, and derive spectral conditions that characterize whether the disease vanishes. We study both the exact individual-level stochastic model and a deterministic mean field ODE approximation. Numerical simulations are provided to illustrate the analysis. We also interpret our results and show how the hypergraph model allows us to distinguish between contributions to infectiousness that (a) are inherent in the nature of the pathogen and (b) arise from behavioural choices (such as social distancing, increased hygiene and use of masks). This raises the possibility of more accurately quantifying the effect of interventions that are designed to contain the spread of a virus.

Submitted 12 March, 2021; originally announced March 2021.

MSC Class: 92C60; 37N25; 05C65 ACM Class: J.4; J.3

arXiv:2103.05031 [pdf, other]

Higher-order Network Analysis Takes Off, Fueled by Classical Ideas and New Data

Authors: Austin R. Benson, David F. Gleich, Desmond J. Higham

Abstract: Higher-order network analysis uses the ideas of hypergraphs, simplicial complexes, multilinear and tensor algebra, and more, to study complex systems. These are by now well established mathematical abstractions. What's new is that the ideas can be tested and refined on the type of large-scale data arising in today's digital world. This research area therefore is making an impact across many applications. Here, we provide a brief history, guide, and survey.

Submitted 8 March, 2021; originally announced March 2021.

Comments: Based on the SIAM News online article https://sinews.siam.org/Details-Page/higher-order-network-analysis-takes-off-fueled-by-old-ideas-and-new-data

arXiv:2101.06215 [pdf, other]

Node and Edge Nonlinear Eigenvector Centrality for Hypergraphs

Authors: Francesco Tudisco, Desmond J. Higham

Abstract: Network scientists have shown that there is great value in studying pairwise interactions between components in a system. From a linear algebra point of view, this involves defining and evaluating functions of the associated adjacency matrix. Recent work indicates that there are further benefits from accounting directly for higher order interactions, notably through a hypergraph representation where an edge may involve multiple nodes. Building on these ideas, we motivate, define and analyze a class of spectral centrality measures for identifying important nodes and hyperedges in hypergraphs, generalizing existing network science concepts. By exploiting the latest developments in nonlinear Perron-Frobenius theory, we show how the resulting constrained nonlinear eigenvalue problems have unique solutions that can be computed efficiently via a nonlinear power method iteration. We illustrate the measures on realistic data sets.

Submitted 24 August, 2021; v1 submitted 15 January, 2021; originally announced January 2021.

arXiv:2012.02999 [pdf, other]

A Theory for Backtrack-Downweighted Walks

Authors: Francesca Arrigo, Desmond J. Higham, Vanni Noferini

Abstract: We develop a complete theory for the combinatorics of walk-counting on a directed graph in the case where each backtracking step is downweighted by a given factor. By deriving expressions for the associated generating functions, we also obtain linear systems for computing centrality measures in this setting. In particular, we show that backtrack-downweighted Katz-style network centrality can be computed at the same cost as standard Katz. Studying the limit of this centrality measure at its radius of convergence also leads to a new expression for backtrack-downweighted eigenvector centrality that generalizes previous work to the case where directed edges are present. The new theory allows us to combine advantages of standard and nonbacktracking cases, avoiding localization while accounting for tree-like structures. We illustrate the behaviour of the backtrack-downweighted centrality measure on both synthetic and real networks.

Submitted 5 December, 2020; originally announced December 2020.

MSC Class: 05C50; 05C82; 68R10
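
For orientation, the "standard Katz" baseline mentioned in the abstract above can be sketched as the solution of the linear system (I - alpha*A) x = 1. The sketch below is not taken from the paper: the function name `katz_centrality`, the fixed-point iteration, and the row-sum convention are illustrative assumptions, and the backtrack-downweighted variant is not implemented here. The iteration converges only when alpha is below the reciprocal of the spectral radius of A.

```python
def katz_centrality(adj, alpha, iterations=200):
    """Standard Katz centrality via the fixed-point iteration x <- 1 + alpha * A x.

    Hypothetical helper for illustration only. `adj` is a dense adjacency
    matrix given as a list of lists; alpha must be smaller than
    1 / (spectral radius of adj) for the iteration to converge to
    (I - alpha * A)^{-1} 1.
    """
    n = len(adj)
    x = [1.0] * n
    for _ in range(iterations):
        x = [1.0 + alpha * sum(adj[i][j] * x[j] for j in range(n))
             for i in range(n)]
    return x


# Undirected path on three nodes: 0 - 1 - 2 (spectral radius sqrt(2)).
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
scores = katz_centrality(path, alpha=0.5)
```

On this path graph the middle node receives the largest score, which matches the intuition that it takes part in more walks than either end node.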

arXiv:2009.11369 [pdf, ps, other]

A Personal Perspective on Numerical Analysis and Optimization

Authors: Desmond J. Higham

Abstract: I give a brief, non-technical, historical perspective on numerical analysis and optimization. I also touch on emerging trends and future challenges. This content is based on the short presentation that I made at the opening ceremony of \emph{The International Conference on Numerical Analysis and Optimization}, which was held at Sultan Qaboos University, Muscat, Oman, on January 6--9, 2020. Of course, the material covered here is necessarily incomplete and biased towards my own interests and comfort zones. My aim is to give a feel for how the area has developed over the past few decades and how it may continue.

Submitted 23 September, 2020; originally announced September 2020.

MSC Class: 65 ACM Class: G.1.0

arXiv:2006.13984 [pdf, other]

Consistency of Anchor-based Spectral Clustering

Authors: Henry-Louis de Kergorlay, Desmond John Higham

Abstract: Anchor-based techniques reduce the computational complexity of spectral clustering algorithms. Although empirical tests have shown promising results, there is currently a lack of theoretical support for the anchoring approach. We define a specific anchor-based algorithm and show that it is amenable to rigorous analysis, as well as being effective in practice. We establish the theoretical consistency of the method in an asymptotic setting where data is sampled from an underlying continuous probability distribution. In particular, we provide sharp asymptotic conditions for the algorithm parameters which ensure that the anchor-based method can recover with high probability disjoint clusters that are mutually separated by a positive distance. We illustrate the performance of the algorithm on synthetic data and explain how the theoretical convergence analysis can be used to inform the practical choice of parameter scalings. We also test the accuracy and efficiency of the algorithm on two large scale real data sets. We find that the algorithm offers clear advantages over standard spectral clustering. We also find that it is competitive with the state-of-the-art LSC method of Chen and Cai (Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011), while having the added benefit of a consistency guarantee.

Submitted 27 June, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

arXiv:2004.04479 [pdf, ps, other]

doi 10.1109/IJCNN48605.2020.9207472

On Adversarial Examples and Stealth Attacks in Artificial Intelligence Systems

Authors: Ivan Y. Tyukin, Desmond J. Higham, Alexander N. Gorban

Abstract: In this work we present a formal theoretical framework for assessing and analyzing two classes of malevolent action towards generic Artificial Intelligence (AI) systems. Our results apply to general multi-class classifiers that map from an input space into a decision space, including artificial neural networks used in deep learning applications. Two classes of attacks are considered. The first class involves adversarial examples and concerns the introduction of small perturbations of the input data that cause misclassification. The second class, introduced here for the first time and named stealth attacks, involves small perturbations to the AI system itself. Here the perturbed system produces whatever output is desired by the attacker on a specific small data set, perhaps even a single input, but performs as normal on a validation set (which is unknown to the attacker). We show that in both cases, i.e., in the case of an attack based on adversarial examples and in the case of a stealth attack, the dimensionality of the AI's decision-making space is a major contributor to the AI's susceptibility. For attacks based on adversarial examples, a second crucial parameter is the absence of local concentrations in the data probability distribution, a property known as Smeared Absolute Continuity. According to our findings, robustness to adversarial examples requires either (a) the data distributions in the AI's feature space to have concentrated probability density functions or (b) the dimensionality of the AI's decision variables to be sufficiently small. We also show how to construct stealth attacks on high-dimensional AI systems that are hard to spot unless the validation set is made exponentially large.

Submitted 9 April, 2020; originally announced April 2020.

MSC Class: 68T05; 68T10; 90C31

Journal ref: 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, United Kingdom, 2020

arXiv:1910.12711 [pdf, other]

doi 10.1098/rspa.2019.0724

A framework for second order eigenvector centralities and clustering coefficients

Authors: Francesca Arrigo, Desmond J. Higham, Francesco Tudisco

Abstract: We propose and analyse a general tensor-based framework for incorporating second order features into network measures. This approach allows us to combine traditional pairwise links with information that records whether triples of nodes are involved in wedges or triangles. Our treatment covers classical spectral methods and recently proposed cases from the literature, but we also identify many interesting extensions. In particular, we define a mutually-reinforcing (spectral) version of the classical clustering coefficient. The underlying object of study is a constrained nonlinear eigenvalue problem associated with a cubic tensor. Using recent results from nonlinear Perron--Frobenius theory, we establish existence and uniqueness under appropriate conditions, and show that the new spectral measures can be computed efficiently with a nonlinear power method. To illustrate the added value of the new formulation, we analyse the measures on a class of synthetic networks. We also give computational results on centrality and link prediction for real-world networks.

Submitted 28 October, 2019; originally announced October 2019.

arXiv:1909.03469 [pdf, other]

Accurate Computation of the Log-Sum-Exp and Softmax Functions

Authors: Pierre Blanchard, Desmond J. Higham, Nicholas J. Higham

Abstract: Evaluating the log-sum-exp function or the softmax function is a key step in many modern data science algorithms, notably in inference and classification. Because of the exponentials that these functions contain, the evaluation is prone to overflow and underflow, especially in low precision arithmetic. Software implementations commonly use alternative formulas that avoid overflow and reduce the chance of harmful underflow, employing a shift or another rewriting. Although mathematically equivalent, these variants behave differently in floating-point arithmetic. We give rounding error analyses of different evaluation algorithms and interpret the error bounds using condition numbers for the functions. We conclude, based on the analysis and numerical experiments, that the shifted formulas are of similar accuracy to the unshifted ones and that the shifted softmax formula is typically more accurate than a division-free variant.

Submitted 8 September, 2019; originally announced September 2019.

Report number: MIMS EPrint 2019.19 MSC Class: 97N20 ACM Class: G.1.3; I.2.8; G.3; G.4
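
As a rough illustration of the "shift" rewriting mentioned in the abstract above, the sketch below subtracts the largest input before exponentiating, so that every exponent is at most zero and overflow is avoided. Shifting by the maximum is the commonly used choice and is assumed here; the function names `logsumexp_shifted` and `softmax_shifted` are illustrative and not taken from the paper.

```python
import math


def logsumexp_shifted(xs):
    """Evaluate log(sum(exp(x_i))) as a + log(sum(exp(x_i - a))) with a = max(xs).

    Hypothetical helper for illustration; assumes a non-empty list of
    finite floats.
    """
    a = max(xs)
    return a + math.log(sum(math.exp(x - a) for x in xs))


def softmax_shifted(xs):
    """Evaluate softmax with the same max-shift; the largest term becomes exp(0) = 1."""
    a = max(xs)
    exps = [math.exp(x - a) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# math.exp(1000.0) alone raises OverflowError in double precision,
# whereas the shifted forms evaluate without trouble.
value = logsumexp_shifted([1000.0, 1000.0])
probs = softmax_shifted([1000.0, 1001.0, 1002.0])
```

The shift leaves the mathematical value unchanged; how the rounding errors of the shifted and unshifted forms compare is the subject of the paper itself and is not demonstrated by this sketch.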

arXiv:1807.01496 [pdf, other]

Centrality-Friendship Paradoxes: When Our Friends Are More Important Than Us

Authors: Desmond J. Higham

Abstract: The friendship paradox states that, on average, our friends have more friends than we do. In network terms, the average degree over the nodes can never exceed the average degree over the neighbours of nodes. This effect, which is a classic example of sampling bias, has attracted much attention in the social science and network science literature, with variations and extensions of the paradox being defined, tested and interpreted. Here, we show that a version of the paradox holds rigorously for eigenvector centrality: on average, our friends are more important than us. We then consider general matrix-function centrality, including Katz centrality, and give sufficient conditions for the paradox to hold. We also discuss which results can be generalized to the cases of directed and weighted edges. In this way, we add theoretical support for a field that has largely been evolving through empirical testing.

Submitted 4 July, 2018; originally announced July 2018.

MSC Class: 68R10; 94C15 ACM Class: G.2.2; F.2.1
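
The degree version of the paradox stated in the abstract above can be checked numerically on a small graph. The sketch below assumes the usual formulation, in which the average degree over nodes is sum(d_i)/n and the average degree over neighbours is sum(d_i^2)/sum(d_i). The function name `friendship_paradox_averages` is illustrative, and nodes that appear in no edge are ignored.

```python
def friendship_paradox_averages(edges):
    """Return (mean degree over nodes, mean degree over neighbours of nodes).

    Hypothetical helper for illustration. `edges` is a list of undirected
    (u, v) pairs with no self-loops or repeated edges; only nodes that
    appear in at least one edge are counted.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    total = sum(degree.values())
    mean_degree = total / len(degree)
    # Each node of degree d appears d times across all neighbour lists.
    mean_friend_degree = sum(d * d for d in degree.values()) / total
    return mean_degree, mean_friend_degree


# Star graph: node 0 is linked to nodes 1, 2 and 3.
mean_degree, mean_friend_degree = friendship_paradox_averages(
    [(0, 1), (0, 2), (0, 3)]
)
# mean_degree is 1.5 and mean_friend_degree is 2.0
```

The gap between the two averages comes from the degree variance: high-degree nodes are counted once per neighbour, which is the sampling bias the abstract refers to.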

arXiv:1804.09820 [pdf, other]

A Nonlinear Spectral Method for Core--Periphery Detection in Networks

Authors: Francesco Tudisco, Desmond J. Higham

Abstract: We derive and analyse a new iterative algorithm for detecting network core--periphery structure. Using techniques in nonlinear Perron-Frobenius theory, we prove global convergence to the unique solution of a relaxed version of a natural discrete optimization problem. On sparse networks, the cost of each iteration scales linearly with the number of nodes, making the algorithm feasible for large-scale problems. We give an alternative interpretation of the algorithm from the perspective of maximum likelihood reordering of a new logistic core--periphery random graph model. This viewpoint also gives a new basis for quantitatively judging a core--periphery detection algorithm. We illustrate the algorithm on a range of synthetic and real networks, and show that it offers advantages over the current state-of-the-art.

Submitted 11 February, 2019; v1 submitted 25 April, 2018; originally announced April 2018.

arXiv:1801.05894 [pdf, other]

Deep Learning: An Introduction for Applied Mathematicians

Authors: Catherine F. Higham, Desmond J. Higham

Abstract: Multilayered artificial neural networks are becoming a pervasive tool in a host of application fields. At the heart of this deep learning revolution are familiar concepts from applied and computational mathematics; notably, in calculus, approximation theory, optimization and linear algebra. This article provides a very brief introduction to the basic ideas that underlie deep learning from an applied mathematics perspective. Our target audience includes postgraduate and final year undergraduate students in mathematics who are keen to learn about the area. The article may also be useful for instructors in mathematics who wish to enliven their classes with references to the application of deep learning techniques. We focus on three fundamental questions: what is a deep neural network? how is a network trained? what is the stochastic gradient method? We illustrate the ideas with a short MATLAB code that sets up and trains a network. We also show the use of state-of-the-art software on a large scale image classification problem. We finish with references to the current literature.

Submitted 17 January, 2018; originally announced January 2018.

MSC Class: 97R40; 68T01; 65K10; 62M45 ACM Class: G.1.6; I.2.10; I.2.0; I.2.6

arXiv:1512.01588 [pdf, other]

Computational complexity analysis for Monte Carlo approximations of classically scaled population processes

Authors: David F. Anderson, Desmond J. Higham, Yu Sun

Abstract: We analyze and compare the computational complexity of different simulation strategies for Monte Carlo in the setting of classically scaled population processes. This allows a range of widely used competing strategies to be judged systematically. Our setting includes stochastically modeled biochemical systems. We consider the task of approximating the expected value of some path functional of the state of the system at a fixed time point. We study the use of standard Monte Carlo when samples are produced by exact simulation and by approximation with tau-leaping or an Euler-Maruyama discretization of a diffusion approximation. Appropriate modifications of recently proposed multilevel Monte Carlo algorithms are also studied for the tau-leaping and Euler-Maruyama approaches. In order to quantify computational complexity in a tractable yet meaningful manner, we consider a parameterization that, in the mass action chemical kinetics setting, corresponds to the classical system size scaling. We base the analysis on a novel asymptotic regime where the required accuracy is a function of the model scaling parameter. Our new analysis shows that, under the specific assumptions made in the manuscript, if the bias inherent in the diffusion approximation is smaller than the required accuracy, then multilevel Monte Carlo for the diffusion approximation is most efficient, besting multilevel Monte Carlo with tau-leaping by a factor of a logarithm of the scaling parameter. However, if the bias of the diffusion model is greater than the error tolerance, or if the bias cannot be bounded analytically, multilevel versions of tau-leaping are often the optimal choice.

Submitted 4 June, 2018; v1 submitted 4 December, 2015; originally announced December 2015.

Comments: Final accepted version

MSC Class: 60H35; 65C05

arXiv:1511.07305 [pdf, ps, other]

Block Matrix Formulations for Evolving Networks

Authors: Caterina Fenu, Desmond J. Higham

Abstract: Many types of pairwise interaction take the form of a fixed set of nodes with edges that appear and disappear over time. In the case of discrete-time evolution, the resulting evolving network may be represented by a time-ordered sequence of adjacency matrices. We consider here the issue of representing the system as a single, higher dimensional block matrix, built from the individual time-slices. We focus on the task of computing network centrality measures. From a modeling perspective, we show that there is a suitable block formulation that allows us to recover dynamic centrality measures respecting time's arrow. From a computational perspective, we show that the new block formulation leads to the design of more effective numerical algorithms.

Submitted 1 June, 2016; v1 submitted 23 November, 2015; originally announced November 2015.

Comments: 18 pages, 2 figures

MSC Class: 05C50; 15A69

arXiv:1505.00965 [pdf, other]

An Introduction to Multilevel Monte Carlo for Option Valuation

Authors: Desmond J. Higham

Abstract: Monte Carlo is a simple and flexible tool that is widely used in computational finance. In this context, it is common for the quantity of interest to be the expected value of a random variable defined via a stochastic differential equation. In 2008, Giles proposed a remarkable improvement to the approach of discretizing with a numerical method and applying standard Monte Carlo. His multilevel Monte Carlo method offers an order of speed up given by the inverse of epsilon, where epsilon is the required accuracy. So computations can run 100 times more quickly when two digits of accuracy are required. The multilevel philosophy has since been adopted by a range of researchers and a wealth of practically significant results has arisen, most of which have yet to make their way into the expository literature. In this work, we give a brief, accessible, introduction to multilevel Monte Carlo and summarize recent results applicable to the task of option evaluation.

Submitted 5 May, 2015; originally announced May 2015.

Comments: Submitted to International Journal of Computer Mathematics, special issue on Computational Methods in Finance

MSC Class: 65C30

arXiv:1412.3039 [pdf, ps, other]

Multilevel Monte Carlo for stochastic differential equations with small noise

Authors: David F. Anderson, Desmond J. Higham, Yu Sun

Abstract: We consider the problem of numerically estimating expectations of solutions to stochastic differential equations driven by Brownian motions in the commonly occurring small noise regime. We consider (i) standard Monte Carlo methods combined with numerical discretization algorithms tailored to the small noise setting, and (ii) a multilevel Monte Carlo method combined with a standard Euler-Maruyama implementation. Under the assumptions we make on the underlying model, the multilevel method combined with Euler-Maruyama is often found to be the most efficient option. Moreover, under a wide range of scalings the multilevel method is found to give the same asymptotic complexity that would arise in the idealized case where we have access to exact samples of the required distribution at a cost of $O(1)$ per sample. A key step in our analysis is to analyze the variance between two coupled paths directly, as opposed to their $L^2$ distance. Careful simulations are provided to illustrate the asymptotic results.

Submitted 4 June, 2015; v1 submitted 9 December, 2014; originally announced December 2014.

Comments: A section making a comparison with results in the jump process setting has been added. We have also taken several opportunities to clarify the presentation and interpret the results more fully

MSC Class: 60H35; 65C05; 65C30

arXiv:1409.1831 [pdf, other]

Opportunities at the Mathematics/Future Cities Interface

Authors: Peter Grindrod, Desmond J. Higham, Robert S. MacKay

Abstract: We make the case for mathematicians and statisticians to stake their claim in the fast-moving and high-impact research field that is becoming known as Future Cities. After assessing the Future Cities arena, we provide some illustrative challenges where mathematical scientists can make an impact.

Submitted 5 September, 2014; originally announced September 2014.

Comments: A revised version of this document will appear in SIAM News (Society for Industrial and Applied Mathematics)

Report number: University of Strathclyde Mathematics and Statistics Research Report 10 (2014) MSC Class: 05C82; 94C15; 65F50

arXiv:1406.2017 [pdf, other]

Anticipating Activity in Social Media Spikes

Authors: Desmond J. Higham, Peter Grindrod, Alexander V. Mantzaris, Amanda Otley, Peter Laflin

Abstract: We propose a novel mathematical model for the activity of microbloggers during an external, event-driven spike. The model leads to a testable prediction of who would become most active if a spike were to take place. This type of information is of great interest to commercial organisations, governments and charities, as it identifies key players who can be targeted with information in real time when the network is most receptive. The model takes account of the fact that dynamic interactions evolve over an underlying, static network that records who listens to whom. The model is based on the assumption that, in the case where the entire community has become aware of an external news event, a key driver of activity is the motivation to participate by responding to incoming messages. We test the model on a large scale Twitter conversation concerning the appointment of a UK Premier League football club manager. We also present further results for a Bundesliga football match, a marketing event and a television programme. In each case we find that exploiting the underlying connectivity structure improves the prediction of who will be active during a spike. We also show how the half-life of a spike in activity can be quantified in terms of the network size and the typical response rate.

Submitted 8 June, 2014; originally announced June 2014.

arXiv:1310.2676 [pdf, other]

Complexity of Multilevel Monte Carlo Tau-Leaping

Authors: David F. Anderson, Desmond J. Higham, Yu Sun

Abstract: Tau-leaping is a popular discretization method for generating approximate paths of continuous time, discrete space, Markov chains, notably for biochemical reaction systems. To compute expected values in this context, an appropriate multilevel Monte Carlo form of tau-leaping has been shown to improve efficiency dramatically. In this work we derive new analytic results concerning the computational complexity of multilevel Monte Carlo tau-leaping that are significantly sharper than previous ones. We avoid taking asymptotic limits, and focus on a practical setting where the system size is large enough for many events to take place along a path, so that exact simulation of paths is expensive, making tau-leaping an attractive option. We use a general scaling of the system components that allows for the reaction rate constants and the abundances of species to vary over several orders of magnitude, and we exploit the random time change representation developed by Kurtz. The key feature of the analysis that allows for the sharper bounds is that when comparing relevant pairs of processes we analyze the variance of their difference directly rather than bounding via the second moment. Use of the second moment is natural in the setting of a diffusion equation, where multilevel was first developed and where strong convergence results for numerical methods are readily available, but is not optimal for the Poisson-driven jump systems that we consider here. We also present computational results that illustrate the new analysis.

Submitted 1 August, 2014; v1 submitted 9 October, 2013; originally announced October 2013.

Comments: 24 pages and 2 figures. Minor edits since last version

MSC Class: 60H35; 92C40

arXiv:1207.5047 [pdf, other]

doi 10.1093/comnet/cnt001

Dynamic Network Centrality Summarizes Learning in the Human Brain

Authors: Alexander V. Mantzaris, Danielle S. Bassett, Nicholas F. Wymbs, Ernesto Estrada, Mason A. Porter, Peter J. Mucha, Scott T. Grafton, Desmond J. Higham

Abstract: We study functional activity in the human brain using functional Magnetic Resonance Imaging and recently developed tools from network science. The data arise from the performance of a simple behavioural motor learning task. Unsupervised clustering of subjects with respect to similarity of network activity measured over three days of practice produces significant evidence of `learning', in the sense that subjects typically move between clusters (of subjects whose dynamics are similar) as time progresses. However, the high dimensionality and time-dependent nature of the data make it difficult to explain which brain regions are driving this distinction. Using network centrality measures that respect the arrow of time, we express the data in an extremely compact form that characterizes the aggregate activity of each brain region in each experiment using a single coefficient, while reproducing information about learning that was discovered using the full data set. This compact summary allows key brain regions contributing to centrality to be visualized and interpreted. We thereby provide a proof of principle for the use of recently proposed dynamic centrality measures on temporal network data in neuroscience.

Submitted 20 July, 2012; originally announced July 2012.

Journal ref: jcomplexnetw (2013) 1(1): 83-92

arXiv:1204.1647 [pdf, ps, other]

Convergence, Non-negativity and Stability of a New Milstein Scheme with Applications to Finance

Authors: Desmond J. Higham, Xuerong Mao, Lukasz Szpruch

Abstract: We propose and analyse a new Milstein type scheme for simulating stochastic differential equations (SDEs) with highly nonlinear coefficients. Our work is motivated by the need to justify multi-level Monte Carlo simulations for mean-reverting financial models with polynomial growth in the diffusion term. We introduce a double implicit Milstein scheme and show that it possesses desirable properties. It converges strongly and preserves non-negativity for a rich family of financial models and can reproduce linear and nonlinear stability behaviour of the underlying SDE without severe restriction on the time step. Although the scheme is implicit, we point out examples of financial models where an explicit formula for the solution to the scheme can be found.

Submitted 7 April, 2012; originally announced April 2012.

arXiv:1107.2181 [pdf, other]

Multi-level Monte Carlo for continuous time Markov chains, with applications in biochemical kinetics

Authors: David F. Anderson, Desmond J. Higham

Abstract: We show how to extend a recently proposed multi-level Monte Carlo approach to the continuous time Markov chain setting, thereby greatly lowering the computational complexity needed to compute expected values of functions of the state of the system to a specified accuracy. The extension is non-trivial, exploiting a coupling of the requisite processes that is easy to simulate while providing a small variance for the estimator. Further, and in a stark departure from other implementations of multi-level Monte Carlo, we show how to produce an unbiased estimator that is significantly less computationally expensive than the usual unbiased estimator arising from exact algorithms in conjunction with crude Monte Carlo. We thereby dramatically improve, in a quantifiable manner, the basic computational complexity of current approaches that have many names and variants across the scientific literature, including the Bortz-Kalos-Lebowitz algorithm, discrete event simulation, dynamic Monte Carlo, kinetic Monte Carlo, the n-fold way, the next reaction method, the residence-time algorithm, the stochastic simulation algorithm, Gillespie's algorithm, and tau-leaping. The new algorithm applies generically, but we also give an example where the coupling idea alone, even without a multi-level discretization, can be used to improve efficiency by exploiting system structure. Stochastically modeled chemical reaction networks provide a very important application for this work. Hence, we use this context for our notation, terminology, natural scalings, and computational examples.

Submitted 21 November, 2011; v1 submitted 11 July, 2011; originally announced July 2011.

Comments: Improved description of the constants in statement of Theorems

MSC Class: 60H35; 65C99; 92C40

arXiv:0905.4102 [pdf]

doi 10.1016/j.physa.2008.11.011

Communicability Betweenness in Complex Networks

Authors: Ernesto Estrada, Desmond J. Higham, Naomichi Hatano

Abstract: Betweenness measures provide quantitative tools to pick out fine details from the massive amount of interaction data that is available from large complex networks. They allow us to study the extent to which a node takes part when information is passed around the network. Nodes with high betweenness may be regarded as key players that have a highly active role. At one extreme, betweenness has been defined by considering information passing only through the shortest paths between pairs of nodes. At the other extreme, an alternative type of betweenness has been defined by considering all possible walks of any length. In this work, we propose a betweenness measure that lies between these two opposing viewpoints. We allow information to pass through all possible routes, but introduce a scaling so that longer walks carry less importance. This new definition shares a similar philosophy to that of communicability for pairs of nodes in a network, which was introduced by Estrada and Hatano (Phys. Rev. E 77 (2008) 036111). Having defined this new communicability betweenness measure, we show that it can be characterized neatly in terms of the exponential of the adjacency matrix. We also show that this measure is closely related to a Frechet derivative of the matrix exponential. This allows us to conclude that it also describes network sensitivity when the edges of a given node are subject to infinitesimally small perturbations. Using illustrative synthetic and real life networks, we show that the new betweenness measure behaves differently to existing versions, and in particular we show that it recovers meaningful biological information from a protein-protein interaction network.

Submitted 25 May, 2009; originally announced May 2009.

Comments: 32 pages, 7 figures, 3 tables

Journal ref: Physica A 388 (2009) 764-774

arXiv:0905.4101 [pdf]

doi 10.1103/PhysRevE.78.026102

Communicability and multipartite structures in complex networks at negative absolute temperatures

Authors: Ernesto Estrada, Desmond J. Higham, Naomichi Hatano

Abstract: We here present a method of clearly identifying multi-partite subgraphs in a network. The method is based on a recently introduced concept of the communicability, which very clearly identifies communities in a complex network. We here show that, while the communicability at a positive temperature is useful in identifying communities, the communicability at a negative temperature is useful in identifying multi-partite subgraphs; the latter quantity between two nodes is positive when the two nodes belong to the same subgraph and is negative when not. The method is able to discover `almost' multi-partite structures, where inter-community connections vastly outweigh intracommunity connections. We illustrate the relevance of this work to real-life food web and protein-protein interaction networks.

Submitted 25 May, 2009; originally announced May 2009.

Comments: 24 pages, 5 figures

Journal ref: Phys. Rev. E 78, 026102 (2008)

arXiv:0811.0769 [pdf, ps, other]

Communicability in complex brain networks

Authors: Jonathan J. Crofts, Desmond J. Higham

Abstract: Recent advances in experimental neuroscience allow, for the first time, non-invasive studies of the white matter tracts in the human central nervous system, thus making available cutting-edge brain anatomical data describing these global connectivity patterns. This new, non-invasive, technique uses magnetic resonance imaging to construct a snap-shot of the cortical network within the living human brain. Here, we report on the initial success of a new weighted network communicability measure in distinguishing local and global differences between diseased patients and controls. This approach builds on recent advances in network science, where an underlying connectivity structure is used as a means to measure the ease with which information can flow between nodes. One advantage of our method is that it deals directly with the real-valued connectivity data, thereby avoiding the need to discretise the corresponding adjacency matrix, that is, to round weights up to 1 or down to 0, depending upon some threshold value. Experimental results indicate that the new approach is able to highlight biologically relevant features that are not immediately apparent from the raw connectivity data.

Submitted 5 November, 2008; originally announced November 2008.

Report number: University of Strathclyde, technical report 2008 #18

Showing 1–47 of 47 results for author: Higham, D J