Search | arXiv e-print repository

arXiv:2406.19807 [pdf, other]

Deceptive Diffusion: Generating Synthetic Adversarial Examples

Authors: Lucas Beerens, Catherine F. Higham, Desmond J. Higham

Abstract: We introduce the concept of deceptive diffusion -- training a generative AI model to produce adversarial images. Whereas a traditional adversarial attack algorithm aims to perturb an existing image to induce a misclassification, the deceptive diffusion model can create an arbitrary number of new, misclassified images that are not directly associated with training or test images. Deceptive diffusion offers the possibility of strengthening defence algorithms by providing adversarial training data at scale, including types of misclassification that are otherwise difficult to find. In our experiments, we also investigate the effect of training on a partially attacked data set. This highlights a new type of vulnerability for generative diffusion models: if an attacker is able to stealthily poison a portion of the training data, then the resulting diffusion model will generate a similar proportion of misleading outputs.

Submitted 28 June, 2024; originally announced June 2024.

MSC Class: 68T07 ACM Class: I.2.0; I.5.1

arXiv:2406.12670 [pdf, other]

Stealth edits for provably fixing or attacking large language models

Authors: Oliver J. Sutton, Qinghua Zhou, Wei Wang, Desmond J. Higham, Alexander N. Gorban, Alexander Bastounis, Ivan Y. Tyukin

Abstract: We reveal new methods and the theoretical foundations of techniques for editing large language models. We also show how the new theory can be used to assess the editability of models and to expose their susceptibility to previously unknown malicious attacks. Our theoretical approach shows that a single metric (a specific measure of the intrinsic dimensionality of the model's features) is fundamental to predicting the success of popular editing approaches, and reveals new bridges between disparate families of editing methods. We collectively refer to these approaches as stealth editing methods, because they aim to directly and inexpensively update a model's weights to correct the model's responses to known hallucinating prompts without otherwise affecting the model's behaviour, without requiring retraining. By carefully applying the insight gleaned from our theoretical investigation, we are able to introduce a new network block -- named a jet-pack block -- which is optimised for highly selective model editing, uses only standard network operations, and can be inserted into existing networks. The intrinsic dimensionality metric also determines the vulnerability of a language model to a stealth attack: a small change to a model's weights which changes its response to a single attacker-chosen prompt. Stealth attacks do not require access to or knowledge of the model's training data, therefore representing a potent yet previously unrecognised threat to redistributed foundation models. They are computationally simple enough to be implemented in malware in many cases. Extensive experimental results illustrate and support the method and its theoretical underpinnings. Demos and source code for editing language models are available at https://github.com/qinghua-zhou/stealth-edits.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 24 pages, 9 figures. Open source implementation: https://github.com/qinghua-zhou/stealth-edits

MSC Class: 68T07; 68T50; 68W40 ACM Class: I.2.7; F.2.0

arXiv:2402.07631 [pdf, other]

Higher-order Connection Laplacians for Directed Simplicial Complexes

Authors: Xue Gong, Desmond J. Higham, Konstantinos Zygalakis, Ginestra Bianconi

Abstract: Higher-order networks encode the many-body interactions existing in complex systems, such as the brain, protein complexes, and social interactions. Simplicial complexes are higher-order networks that allow a comprehensive investigation of the interplay between topology and dynamics. However, simplicial complexes have the limitation that they only capture undirected higher-order interactions while in real-world scenarios, often there is a need to introduce the direction of simplices, extending the popular notion of direction of edges. On graphs and networks the Magnetic Laplacian, a special case of Connection Laplacian, is becoming a popular operator to treat edge directionality. Here we tackle the challenge of treating directional simplicial complexes by formulating Higher-order Connection Laplacians taking into account the configurations induced by the simplices' directions. Specifically, we define all the Connection Laplacians of directed simplicial complexes of dimension two and we discuss the induced higher-order diffusion dynamics by considering instructive synthetic examples of simplicial complexes. The proposed higher-order diffusion processes can be adopted in real scenarios when we want to consider higher-order diffusion displaying non-trivial frustration effects due to conflicting directionalities of the incident simplices.

Submitted 12 February, 2024; originally announced February 2024.

Comments: 34 pages, 13 figures

arXiv:2312.14977 [pdf, other]

Diffusion Models for Generative Artificial Intelligence: An Introduction for Applied Mathematicians

Authors: Catherine F. Higham, Desmond J. Higham, Peter Grindrod

Abstract: Generative artificial intelligence (AI) refers to algorithms that create synthetic but realistic output. Diffusion models currently offer state of the art performance in generative AI for images. They also form a key component in more general tools, including text-to-image generators and large language models. Diffusion models work by adding noise to the available training data and then learning how to reverse the process. The reverse operation may then be applied to new random data in order to produce new outputs. We provide a brief introduction to diffusion models for applied mathematicians and statisticians. Our key aims are (a) to present illustrative computational examples, (b) to give a careful derivation of the underlying mathematical formulas involved, and (c) to draw a connection with partial differential equation (PDE) diffusion models. We provide code for the computational experiments. We hope that this topic will be of interest to advanced undergraduate students and postgraduate students. Portions of the material may also provide useful motivational examples for those who teach courses in stochastic processes, inference, machine learning, PDEs or scientific computing.

Submitted 21 December, 2023; originally announced December 2023.

MSC Class: 68T07; 60J60 ACM Class: I.2; I.2.6

arXiv:2311.17128 [pdf, other]

Vulnerability Analysis of Transformer-based Optical Character Recognition to Adversarial Attacks

Authors: Lucas Beerens, Desmond J. Higham

Abstract: Recent advancements in Optical Character Recognition (OCR) have been driven by transformer-based models. OCR systems are critical in numerous high-stakes domains, yet their vulnerability to adversarial attack remains largely uncharted territory, raising concerns about security and compliance with emerging AI regulations. In this work we present a novel framework to assess the resilience of Transformer-based OCR (TrOCR) models. We develop and assess algorithms for both targeted and untargeted attacks. For the untargeted case, we measure the Character Error Rate (CER), while for the targeted case we use the success ratio. We find that TrOCR is highly vulnerable to untargeted attacks and somewhat less vulnerable to targeted attacks. On a benchmark handwriting data set, untargeted attacks can cause a CER of more than 1 without being noticeable to the eye. With a similar perturbation size, targeted attacks can lead to success rates of around $25\%$ -- here we attacked single tokens, requiring TrOCR to output the tenth most likely token from a large vocabulary.
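
Note on the metric: a CER above 1 may look surprising for an error "rate". The sketch below assumes the common definition of CER as the Levenshtein (edit) distance between the recognised text and the reference, divided by the reference length; the abstract does not spell out the definition, so this is an assumption, and the function names `levenshtein` and `cer` are illustrative only. Under that definition, an output much longer than the reference needs more edits than the reference has characters, so the ratio can exceed 1.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate, assumed here to be edit distance / len(reference)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Made-up strings, not data from the paper: a long wrong output
# against a short reference pushes the ratio above 1.
clean = cer("cat", "cat")
attacked = cer("cat", "elephants")
print(clean, attacked)
```

Normalising by the reference length rather than by the longer of the two strings is what allows values above 1; other normalisations exist and would cap the metric at 1.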

Submitted 28 November, 2023; originally announced November 2023.

MSC Class: 65F35 ACM Class: I.2.10; G.1.3

arXiv:2309.07072 [pdf, ps, other]

The Boundaries of Verifiable Accuracy, Robustness, and Generalisation in Deep Learning

Authors: Alexander Bastounis, Alexander N. Gorban, Anders C. Hansen, Desmond J. Higham, Danil Prokhorov, Oliver Sutton, Ivan Y. Tyukin, Qinghua Zhou

Abstract: In this work, we assess the theoretical limitations of determining guaranteed stability and accuracy of neural networks in classification tasks. We consider classical distribution-agnostic framework and algorithms minimising empirical risks and potentially subjected to some weights regularisation. We show that there is a large family of tasks for which computing and verifying ideal stable and accurate neural networks in the above settings is extremely challenging, if at all possible, even when such ideal solutions exist within the given class of neural architectures.

Submitted 13 September, 2023; originally announced September 2023.

MSC Class: 68T07; 68T05

arXiv:2309.03665 [pdf, other]

How adversarial attacks can disrupt seemingly stable accurate classifiers

Authors: Oliver J. Sutton, Qinghua Zhou, Ivan Y. Tyukin, Alexander N. Gorban, Alexander Bastounis, Desmond J. Higham

Abstract: Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data. Paradoxically, empirical evidence indicates that even systems which are robust to large random perturbations of the input data remain susceptible to small, easily constructed, adversarial perturbations of their inputs. Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data. We introduce a simple generic and generalisable framework for which key behaviours observed in practical systems arise with high probability -- notably the simultaneous susceptibility of the (otherwise accurate) model to easily constructed adversarial attacks, and robustness to random perturbations of the input data. We confirm that the same phenomena are directly observed in practical neural networks trained on standard image classification problems, where even large additive random noise fails to trigger the adversarial instability of the network. A surprising takeaway is that even small margins separating a classifier's decision surface from training and testing data can hide adversarial susceptibility from being detected using randomly sampled perturbations. Counterintuitively, using additive noise during training or testing is therefore inefficient for eradicating or detecting adversarial examples, and more demanding adversarial training is required.

Submitted 7 September, 2023; originally announced September 2023.

Comments: 11 pages, 8 figures, additional supplementary materials

arXiv:2308.15092 [pdf, other]

Can We Rely on AI?

Authors: Desmond J. Higham

Abstract: Over the last decade, adversarial attack algorithms have revealed instabilities in deep learning tools. These algorithms raise issues regarding safety, reliability and interpretability in artificial intelligence; especially in high risk settings. From a practical perspective, there has been a war of escalation between those developing attack and defence strategies. At a more theoretical level, researchers have also studied bigger picture questions concerning the existence and computability of attacks. Here we give a brief overview of the topic, focusing on aspects that are likely to be of interest to researchers in applied and computational mathematics.

Submitted 29 August, 2023; originally announced August 2023.

MSC Class: 68T01; 68T05; 90C31 ACM Class: I.2.0; I.5.0

arXiv:2306.14266 [pdf, other]

Estimating Network Dimension When the Spectrum Struggles

Authors: Peter Grindrod, Desmond John Higham, Henry-Louis de Kergorlay

Abstract: What is the dimension of a network? Here, we view it as the smallest dimension of Euclidean space into which nodes can be embedded so that pairwise distances accurately reflect the connectivity structure. We show that a recently proposed and extremely efficient algorithm for data clouds, based on computing first and second nearest neighbour distances, can be used as the basis of an approach for estimating the dimension of a network with weighted edges. We also show how the algorithm can be extended to unweighted networks when combined with spectral embedding. We illustrate the advantages of this technique over the widely-used approach of characterising dimension by visually searching for a suitable gap in the spectrum of the Laplacian.
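
Note on the nearest-neighbour idea: the abstract does not name the data-cloud algorithm it builds on. As an illustration only, the sketch below uses one well-known estimator of this type, the TwoNN maximum-likelihood estimator of Facco et al.; treating it as the method meant here is an assumption. For each point, let r1 and r2 be the distances to its first and second nearest neighbours. Under the estimator's model, the ratios mu = r2/r1 follow a Pareto law with exponent d, which gives the estimate d ≈ N / sum(log mu). The function name `two_nn_dimension` and the test data are hypothetical, and the brute-force neighbour search is for clarity rather than efficiency.

```python
import math
import random


def two_nn_dimension(points):
    """Estimate intrinsic dimension from the ratio of second to first
    nearest-neighbour distances (maximum-likelihood form)."""
    log_ratios = []
    for i, p in enumerate(points):
        # Brute-force O(N^2) search; fine for a small illustration.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        if r1 > 0:  # skip exact duplicate points
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


# Synthetic data cloud: 300 points drawn uniformly from the unit square,
# so the estimate should come out close to 2.
rng = random.Random(0)
cloud = [(rng.random(), rng.random()) for _ in range(300)]
estimate = two_nn_dimension(cloud)
print(estimate)
```

For a weighted network, the Euclidean distances would presumably be replaced by distances derived from the edge weights, as the abstract suggests; that step is not shown here.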

Submitted 25 June, 2023; originally announced June 2023.

MSC Class: 05C20; 05C80; 05C85; 05C90; 05C82 ACM Class: G.2.2

arXiv:2306.02918 [pdf, other]

Adversarial Ink: Componentwise Backward Error Attacks on Deep Learning

Authors: Lucas Beerens, Desmond J. Higham

Abstract: Deep neural networks are capable of state-of-the-art performance in many classification tasks. However, they are known to be vulnerable to adversarial attacks -- small perturbations to the input that lead to a change in classification. We address this issue from the perspective of backward error and condition number, concepts that have proved useful in numerical analysis. To do this, we build on the work of Beuzeville et al. (2021). In particular, we develop a new class of attack algorithms that use componentwise relative perturbations. Such attacks are highly relevant in the case of handwritten documents or printed texts where, for example, the classification of signatures, postcodes, dates or numerical quantities may be altered by changing only the ink consistency and not the background. This makes the perturbed images look natural to the naked eye. Such ``adversarial ink'' attacks therefore reveal a weakness that can have a serious impact on safety and security. We illustrate the new attacks on real data and contrast them with existing algorithms. We also study the use of a componentwise condition number to quantify vulnerability.

Submitted 5 June, 2023; originally announced June 2023.

MSC Class: 65F35 ACM Class: I.2.10; G.1.3

arXiv:2207.13895 [pdf, ps, other]

Generative Hypergraph Models and Spectral Embedding

Authors: Xue Gong, Desmond J. Higham, Konstantinos Zygalakis

Abstract: Many complex systems involve interactions between more than two agents. Hypergraphs capture these higher-order interactions through hyperedges that may link more than two nodes. We consider the problem of embedding a hypergraph into low-dimensional Euclidean space so that most interactions are short-range. This embedding is relevant to many follow-on tasks, such as node reordering, clustering, and visualization. We focus on two spectral embedding algorithms customized to hypergraphs which recover linear and periodic structures respectively. In the periodic case, nodes are positioned on the unit circle. We show that the two spectral hypergraph embedding algorithms are associated with a new class of generative hypergraph models. These models generate hyperedges according to node positions in the embedded space and encourage short-range connections. They allow us to quantify the relative presence of periodic and linear structures in the data through maximum likelihood. They also improve the interpretability of node embedding and provide a metric for hyperedge prediction. We demonstrate the hypergraph embedding and follow-on tasks -- including structure quantification, clustering and hyperedge prediction -- on synthetic and real-world hypergraphs. We find that the hypergraph approach can outperform clustering algorithms that use only dyadic edges. We also compare several triadic edge prediction methods on high school contact data where our algorithm improves upon benchmark methods when the amount of training data is limited.

Submitted 5 January, 2023; v1 submitted 28 July, 2022; originally announced July 2022.

arXiv:2202.12769 [pdf, other]

Core-periphery detection in hypergraphs

Authors: Francesco Tudisco, Desmond J. Higham

Abstract: Core-periphery detection is a key task in exploratory network analysis where one aims to find a core, a set of nodes well-connected internally and with the periphery, and a periphery, a set of nodes connected only (or mostly) with the core. In this work we propose a model of core-periphery for higher-order networks modeled as hypergraphs and we propose a method for computing a core-score vector that quantifies how close each node is to the core. In particular, we show that this method solves the corresponding non-convex core-periphery optimization problem globally to an arbitrary precision. This method turns out to coincide with the computation of the Perron eigenvector of a nonlinear hypergraph operator, suitably defined in terms of the incidence matrix of the hypergraph, generalizing recently proposed centrality models for hypergraphs. We perform several experiments on synthetic and real-world hypergraphs showing that the proposed method outperforms alternative core-periphery detection algorithms, in particular those obtained by transferring established graph methods to the hypergraph setting via clique expansion.

Submitted 25 February, 2022; originally announced February 2022.

arXiv:2201.01543 [pdf, other]

Testing a QUBO Formulation of Core-periphery Partitioning on a Quantum Annealer

Authors: Catherine F. Higham, Desmond J. Higham, Francesco Tudisco

Abstract: We propose a new kernel that quantifies success for the task of computing a core-periphery partition for an undirected network. Finding the associated optimal partitioning may be expressed in the form of a quadratic unconstrained binary optimization (QUBO) problem, to which a state-of-the-art quantum annealer may be applied. We therefore make use of the new objective function to (a) judge the performance of a quantum annealer, and (b) compare this approach with existing heuristic core-periphery partitioning methods. The quantum annealing is performed on the commercially available D-Wave machine. The QUBO problem involves a full matrix even when the underlying network is sparse. Hence, we develop and test a sparsified version of the original QUBO which increases the available problem dimension for the quantum annealer. Results are provided on both synthetic and real data sets, and we conclude that the QUBO/quantum annealing approach offers benefits in terms of optimizing this new quantity of interest.

Submitted 5 January, 2022; originally announced January 2022.

arXiv:2111.05715 [pdf, ps, other]

A Hierarchy of Network Models Giving Bistability Under Triadic Closure

Authors: Stefano Di Giovacchino, Desmond J. Higham, Konstantinos C. Zygalakis

Abstract: Triadic closure describes the tendency for new friendships to form between individuals who already have friends in common. It has been argued heuristically that the triadic closure effect can lead to bistability in the formation of large-scale social interaction networks. Here, depending on the initial state and the transient dynamics, the system may evolve towards either of two long-time states. In this work, we propose and study a hierarchy of network evolution models that incorporate triadic closure, building on the work of Grindrod, Higham, and Parsons [Internet Mathematics, 8, 2012, 402--423]. We use a chemical kinetics framework, paying careful attention to the reaction rate scaling with respect to the system size. In a macroscale regime, we show rigorously that a bimodal steady-state distribution is admitted. This behavior corresponds to the existence of two distinct stable fixed points in a deterministic mean-field ODE. The macroscale model is also seen to capture an apparent metastability property of the microscale system. Computational simulations are used to support the analysis.

Submitted 10 November, 2021; originally announced November 2021.

Comments: 20 pages, 9 figures

MSC Class: 60J20; 60J74; 68R10

arXiv:2107.03026 [pdf, ps, other]

doi 10.1098/rsos.211144

Directed Network Laplacians and Random Graph Models

Authors: Xue Gong, Desmond John Higham, Konstantinos Zygalakis

Abstract: We consider spectral methods that uncover hidden structures in directed networks. We establish and exploit connections between node reordering via (a) minimizing an objective function and (b) maximizing the likelihood of a random graph model. We focus on two existing spectral approaches that build and analyse Laplacian-style matrices via the minimization of frustration and trophic incoherence. These algorithms aim to reveal directed periodic and linear hierarchies, respectively. We show that reordering nodes using the two algorithms, or mapping them onto a specified lattice, is associated with new classes of directed random graph models. Using this random graph setting, we are able to compare the two algorithms on a given network and quantify which structure is more likely to be present. We illustrate the approach on synthetic and real networks, and discuss practical implementation issues.

Submitted 11 October, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

MSC Class: 05C20; 05C80; 05C85; 05C90; 05C82

arXiv:2106.13997 [pdf, other]

doi 10.1093/imamat/hxad027

The Feasibility and Inevitability of Stealth Attacks

Authors: Ivan Y. Tyukin, Desmond J. Higham, Alexander Bastounis, Eliyas Woldegeorgis, Alexander N. Gorban

Abstract: We develop and study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence (AI) systems including deep learning neural networks. In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself. Such a stealth attack could be conducted by a mischievous, corrupt or disgruntled member of a software development team. It could also be made by those wishing to exploit a ``democratization of AI'' agenda, where network architectures and trained parameter sets are shared publicly. We develop a range of new implementable attack strategies with accompanying analysis, showing that with high probability a stealth attack can be made transparent, in the sense that system performance is unchanged on a fixed validation set which is unknown to the attacker, while evoking any desired output on a trigger input of interest. The attacker only needs to have estimates of the size of the validation set and the spread of the AI's relevant latent space. In the case of deep learning neural networks, we show that a one neuron attack is possible - a modification to the weights and bias associated with a single neuron - revealing a vulnerability arising from over-parameterization. We illustrate these concepts using state of the art architectures on two standard image data sets. Guided by the theory and computational results, we also propose strategies to guard against stealth attacks.

Submitted 4 January, 2023; v1 submitted 26 June, 2021; originally announced June 2021.

MSC Class: 68T01; 68T05; 90C31

Journal ref: IMA Journal of Applied Mathematics, October 2023, hxad027

arXiv:2103.07319 [pdf, other]

doi 10.1098/rspa.2021.0232

Epidemics on Hypergraphs: Spectral Thresholds for Extinction

Authors: Desmond John Higham, Henry-Louis de Kergorlay

Abstract: Epidemic spreading is well understood when a disease propagates around a contact graph. In a stochastic susceptible-infected-susceptible setting, spectral conditions characterise whether the disease vanishes. However, modelling human interactions using a graph is a simplification which only considers pairwise relationships. This does not fully represent the more realistic case where people meet in groups. Hyperedges can be used to record such group interactions, yielding more faithful and flexible models, allowing for the rate of infection of a node to vary as a nonlinear function of the number of infectious neighbors. We discuss different types of contagion models in this hypergraph setting, and derive spectral conditions that characterize whether the disease vanishes. We study both the exact individual-level stochastic model and a deterministic mean field ODE approximation. Numerical simulations are provided to illustrate the analysis. We also interpret our results and show how the hypergraph model allows us to distinguish between contributions to infectiousness that (a) are inherent in the nature of the pathogen and (b) arise from behavioural choices (such as social distancing, increased hygiene and use of masks). This raises the possibility of more accurately quantifying the effect of interventions that are designed to contain the spread of a virus.

Submitted 12 March, 2021; originally announced March 2021.

MSC Class: 92C60; 37N25; 05C65 ACM Class: J.4; J.3

arXiv:2103.05031 [pdf, other]

Higher-order Network Analysis Takes Off, Fueled by Classical Ideas and New Data

Authors: Austin R. Benson, David F. Gleich, Desmond J. Higham

Abstract: Higher-order network analysis uses the ideas of hypergraphs, simplicial complexes, multilinear and tensor algebra, and more, to study complex systems. These are by now well established mathematical abstractions. What's new is that the ideas can be tested and refined on the type of large-scale data arising in today's digital world. This research area therefore is making an impact across many applications. Here, we provide a brief history, guide, and survey.

Submitted 8 March, 2021; originally announced March 2021.

Comments: Based on the SIAM News online article https://sinews.siam.org/Details-Page/higher-order-network-analysis-takes-off-fueled-by-old-ideas-and-new-data

arXiv:2101.06215 [pdf, other]

Node and Edge Nonlinear Eigenvector Centrality for Hypergraphs

Authors: Francesco Tudisco, Desmond J. Higham

Abstract: Network scientists have shown that there is great value in studying pairwise interactions between components in a system. From a linear algebra point of view, this involves defining and evaluating functions of the associated adjacency matrix. Recent work indicates that there are further benefits from accounting directly for higher order interactions, notably through a hypergraph representation where an edge may involve multiple nodes. Building on these ideas, we motivate, define and analyze a class of spectral centrality measures for identifying important nodes and hyperedges in hypergraphs, generalizing existing network science concepts. By exploiting the latest developments in nonlinear Perron-Frobenius theory, we show how the resulting constrained nonlinear eigenvalue problems have unique solutions that can be computed efficiently via a nonlinear power method iteration. We illustrate the measures on realistic data sets.

Submitted 24 August, 2021; v1 submitted 15 January, 2021; originally announced January 2021.

arXiv:2012.02999 [pdf, other]

A Theory for Backtrack-Downweighted Walks

Authors: Francesca Arrigo, Desmond J. Higham, Vanni Noferini

Abstract: We develop a complete theory for the combinatorics of walk-counting on a directed graph in the case where each backtracking step is downweighted by a given factor. By deriving expressions for the associated generating functions, we also obtain linear systems for computing centrality measures in this setting. In particular, we show that backtrack-downweighted Katz-style network centrality can be computed at the same cost as standard Katz. Studying the limit of this centrality measure at its radius of convergence also leads to a new expression for backtrack-downweighted eigenvector centrality that generalizes previous work to the case where directed edges are present. The new theory allows us to combine advantages of standard and nonbacktracking cases, avoiding localization while accounting for tree-like structures. We illustrate the behaviour of the backtrack-downweighted centrality measure on both synthetic and real networks.

Submitted 5 December, 2020; originally announced December 2020.

MSC Class: 05C50; 05C82; 68R10

arXiv:2006.13984 [pdf, other]

Consistency of Anchor-based Spectral Clustering

Authors: Henry-Louis de Kergorlay, Desmond John Higham

Abstract: Anchor-based techniques reduce the computational complexity of spectral clustering algorithms. Although empirical tests have shown promising results, there is currently a lack of theoretical support for the anchoring approach. We define a specific anchor-based algorithm and show that it is amenable to rigorous analysis, as well as being effective in practice. We establish the theoretical consistency of the method in an asymptotic setting where data is sampled from an underlying continuous probability distribution. In particular, we provide sharp asymptotic conditions for the algorithm parameters which ensure that the anchor-based method can recover with high probability disjoint clusters that are mutually separated by a positive distance. We illustrate the performance of the algorithm on synthetic data and explain how the theoretical convergence analysis can be used to inform the practical choice of parameter scalings. We also test the accuracy and efficiency of the algorithm on two large scale real data sets. We find that the algorithm offers clear advantages over standard spectral clustering. We also find that it is competitive with the state-of-the-art LSC method of Chen and Cai (Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011), while having the added benefit of a consistency guarantee.

Submitted 27 June, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

arXiv:2004.04479 [pdf, ps, other]

doi 10.1109/IJCNN48605.2020.9207472

On Adversarial Examples and Stealth Attacks in Artificial Intelligence Systems

Authors: Ivan Y. Tyukin, Desmond J. Higham, Alexander N. Gorban

Abstract: In this work we present a formal theoretical framework for assessing and analyzing two classes of malevolent action towards generic Artificial Intelligence (AI) systems. Our results apply to general multi-class classifiers that map from an input space into a decision space, including artificial neural networks used in deep learning applications. Two classes of attacks are considered. The first class involves adversarial examples and concerns the introduction of small perturbations of the input data that cause misclassification. The second class, introduced here for the first time and named stealth attacks, involves small perturbations to the AI system itself. Here the perturbed system produces whatever output is desired by the attacker on a specific small data set, perhaps even a single input, but performs as normal on a validation set (which is unknown to the attacker). We show that in both cases, i.e., in the case of an attack based on adversarial examples and in the case of a stealth attack, the dimensionality of the AI's decision-making space is a major contributor to the AI's susceptibility. For attacks based on adversarial examples, a second crucial parameter is the absence of local concentrations in the data probability distribution, a property known as Smeared Absolute Continuity. According to our findings, robustness to adversarial examples requires either (a) the data distributions in the AI's feature space to have concentrated probability density functions or (b) the dimensionality of the AI's decision variables to be sufficiently small. We also show how to construct stealth attacks on high-dimensional AI systems that are hard to spot unless the validation set is made exponentially large.

Submitted 9 April, 2020; originally announced April 2020.

MSC Class: 68T05; 68T10; 90C31

Journal ref: 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, United Kingdom, 2020

arXiv:1910.12711 [pdf, other]

doi 10.1098/rspa.2019.0724

A framework for second order eigenvector centralities and clustering coefficients

Authors: Francesca Arrigo, Desmond J. Higham, Francesco Tudisco

Abstract: We propose and analyse a general tensor-based framework for incorporating second order features into network measures. This approach allows us to combine traditional pairwise links with information that records whether triples of nodes are involved in wedges or triangles. Our treatment covers classical spectral methods and recently proposed cases from the literature, but we also identify many interesting extensions. In particular, we define a mutually-reinforcing (spectral) version of the classical clustering coefficient. The underlying object of study is a constrained nonlinear eigenvalue problem associated with a cubic tensor. Using recent results from nonlinear Perron--Frobenius theory, we establish existence and uniqueness under appropriate conditions, and show that the new spectral measures can be computed efficiently with a nonlinear power method. To illustrate the added value of the new formulation, we analyse the measures on a class of synthetic networks. We also give computational results on centrality and link prediction for real-world networks.

Submitted 28 October, 2019; originally announced October 2019.

arXiv:1807.01496 [pdf, other]

Centrality-Friendship Paradoxes: When Our Friends Are More Important Than Us

Authors: Desmond J. Higham

Abstract: The friendship paradox states that, on average, our friends have more friends than we do. In network terms, the average degree over the nodes can never exceed the average degree over the neighbours of nodes. This effect, which is a classic example of sampling bias, has attracted much attention in the social science and network science literature, with variations and extensions of the paradox being defined, tested and interpreted. Here, we show that a version of the paradox holds rigorously for eigenvector centrality: on average, our friends are more important than us. We then consider general matrix-function centrality, including Katz centrality, and give sufficient conditions for the paradox to hold. We also discuss which results can be generalized to the cases of directed and weighted edges. In this way, we add theoretical support for a field that has largely been evolving through empirical testing.

Submitted 4 July, 2018; originally announced July 2018.

MSC Class: 68R10; 94C15 ACM Class: G.2.2; F.2.1
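
Illustrative note on the degree version of the friendship paradox described in the abstract above. This is a minimal sketch, not code from the paper. It assumes a simple undirected graph given as an edge list with no self-loops or repeated edges, and it reads "average degree over the neighbours" as the degree of the endpoint of a uniformly chosen edge, i.e. sum(d_i^2) / sum(d_i). The function name `friendship_paradox_averages` is hypothetical.

```python
def friendship_paradox_averages(edges):
    """Return (mean degree over nodes, mean degree over neighbours).

    Assumes a simple undirected graph: each edge (u, v) appears once,
    with u != v. Nodes that appear in no edge are not counted.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    total_degree = sum(degree.values())
    # Average degree over the nodes: sum(d_i) / n.
    mean_degree = total_degree / len(degree)
    # Average degree over neighbours: each node i is somebody's neighbour
    # d_i times, so its degree is weighted by d_i: sum(d_i^2) / sum(d_i).
    mean_friend_degree = sum(d * d for d in degree.values()) / total_degree
    return mean_degree, mean_friend_degree


# Star graph: hub 0 joined to three leaves (degrees 3, 1, 1, 1).
star = [(0, 1), (0, 2), (0, 3)]
star_mean, star_friend_mean = friendship_paradox_averages(star)

# 4-cycle: every node has degree 2, so the two averages coincide.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
cycle_mean, cycle_friend_mean = friendship_paradox_averages(cycle)
```

On the star graph the node average is 1.5 and the neighbour average is 2.0; on the regular 4-cycle both are 2.0, which is the equality case of the inequality.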

arXiv:1804.09820 [pdf, other]

A Nonlinear Spectral Method for Core--Periphery Detection in Networks

Authors: Francesco Tudisco, Desmond J. Higham

Abstract: We derive and analyse a new iterative algorithm for detecting network core--periphery structure. Using techniques in nonlinear Perron-Frobenius theory, we prove global convergence to the unique solution of a relaxed version of a natural discrete optimization problem. On sparse networks, the cost of each iteration scales linearly with the number of nodes, making the algorithm feasible for large-scale problems. We give an alternative interpretation of the algorithm from the perspective of maximum likelihood reordering of a new logistic core--periphery random graph model. This viewpoint also gives a new basis for quantitatively judging a core--periphery detection algorithm. We illustrate the algorithm on a range of synthetic and real networks, and show that it offers advantages over the current state-of-the-art.

Submitted 11 February, 2019; v1 submitted 25 April, 2018; originally announced April 2018.

arXiv:1801.05894 [pdf, other]

Deep Learning: An Introduction for Applied Mathematicians

Authors: Catherine F. Higham, Desmond J. Higham

Abstract: Multilayered artificial neural networks are becoming a pervasive tool in a host of application fields. At the heart of this deep learning revolution are familiar concepts from applied and computational mathematics; notably, in calculus, approximation theory, optimization and linear algebra. This article provides a very brief introduction to the basic ideas that underlie deep learning from an applied mathematics perspective. Our target audience includes postgraduate and final year undergraduate students in mathematics who are keen to learn about the area. The article may also be useful for instructors in mathematics who wish to enliven their classes with references to the application of deep learning techniques. We focus on three fundamental questions: what is a deep neural network? how is a network trained? what is the stochastic gradient method? We illustrate the ideas with a short MATLAB code that sets up and trains a network. We also show the use of state-of-the-art software on a large scale image classification problem. We finish with references to the current literature.

Submitted 17 January, 2018; originally announced January 2018.

MSC Class: 97R40; 68T01; 65K10; 62M45 ACM Class: G.1.6; I.2.10; I.2.0; I.2.6

arXiv:1511.07305 [pdf, ps, other]

Block Matrix Formulations for Evolving Networks

Authors: Caterina Fenu, Desmond J. Higham

Abstract: Many types of pairwise interaction take the form of a fixed set of nodes with edges that appear and disappear over time. In the case of discrete-time evolution, the resulting evolving network may be represented by a time-ordered sequence of adjacency matrices. We consider here the issue of representing the system as a single, higher dimensional block matrix, built from the individual time-slices. We focus on the task of computing network centrality measures. From a modeling perspective, we show that there is a suitable block formulation that allows us to recover dynamic centrality measures respecting time's arrow. From a computational perspective, we show that the new block formulation leads to the design of more effective numerical algorithms.

Submitted 1 June, 2016; v1 submitted 23 November, 2015; originally announced November 2015.

Comments: 18 pages, 2 figures

MSC Class: 05C50; 15A69

arXiv:1505.00965 [pdf, other]

An Introduction to Multilevel Monte Carlo for Option Valuation

Authors: Desmond J. Higham

Abstract: Monte Carlo is a simple and flexible tool that is widely used in computational finance. In this context, it is common for the quantity of interest to be the expected value of a random variable defined via a stochastic differential equation. In 2008, Giles proposed a remarkable improvement to the approach of discretizing with a numerical method and applying standard Monte Carlo. His multilevel Monte Carlo method offers an order of speed up given by the inverse of epsilon, where epsilon is the required accuracy. So computations can run 100 times more quickly when two digits of accuracy are required. The multilevel philosophy has since been adopted by a range of researchers and a wealth of practically significant results has arisen, most of which have yet to make their way into the expository literature. In this work, we give a brief, accessible, introduction to multilevel Monte Carlo and summarize recent results applicable to the task of option evaluation.

Submitted 5 May, 2015; originally announced May 2015.

Comments: Submitted to International Journal of Computer Mathematics, special issue on Computational Methods in Finance

MSC Class: 65C30

arXiv:1409.1831 [pdf, other]

Opportunities at the Mathematics/Future Cities Interface

Authors: Peter Grindrod, Desmond J. Higham, Robert S. MacKay

Abstract: We make the case for mathematicians and statisticians to stake their claim in the fast-moving and high-impact research field that is becoming known as Future Cities. After assessing the Future Cities arena, we provide some illustrative challenges where mathematical scientists can make an impact.

Submitted 5 September, 2014; originally announced September 2014.

Comments: A revised version of this document will appear in SIAM News (Society for Industrial and Applied Mathematics)

Report number: University of Strathclyde Mathematics and Statistics Research Report 10 (2014) MSC Class: 05C82; 94C15; 65F50

arXiv:1406.2017 [pdf, other]

Anticipating Activity in Social Media Spikes

Authors: Desmond J. Higham, Peter Grindrod, Alexander V. Mantzaris, Amanda Otley, Peter Laflin

Abstract: We propose a novel mathematical model for the activity of microbloggers during an external, event-driven spike. The model leads to a testable prediction of who would become most active if a spike were to take place. This type of information is of great interest to commercial organisations, governments and charities, as it identifies key players who can be targeted with information in real time when the network is most receptive. The model takes account of the fact that dynamic interactions evolve over an underlying, static network that records who listens to whom. The model is based on the assumption that, in the case where the entire community has become aware of an external news event, a key driver of activity is the motivation to participate by responding to incoming messages. We test the model on a large scale Twitter conversation concerning the appointment of a UK Premier League football club manager. We also present further results for a Bundesliga football match, a marketing event and a television programme. In each case we find that exploiting the underlying connectivity structure improves the prediction of who will be active during a spike. We also show how the half-life of a spike in activity can be quantified in terms of the network size and the typical response rate.

Submitted 8 June, 2014; originally announced June 2014.

Showing 1–30 of 30 results for author: Higham, D J