Search | arXiv e-print repository

Inference of Causal Networks using a Topological Threshold

Authors: Filipe Barroso, Diogo Gomes, Gareth J. Baxter

Abstract: We propose a constraint-based algorithm, which automatically determines causal relevance thresholds, to infer causal networks from data. We call these topological thresholds. We present two methods for determining the threshold: the first seeks a set of edges that leaves no disconnected nodes in the network; the second seeks a causal large connected component in the data. We tested these methods on both discrete synthetic and real data, and compared the results with those obtained for the PC algorithm, which we took as the benchmark. We show that this novel algorithm is generally faster and more accurate than the PC algorithm. The algorithm for determining the thresholds requires choosing a measure of causality. We tested our methods with Fisher correlations, commonly used in the PC algorithm (for instance in \cite{kalisch2005}), and also proposed Net Influence, a discrete and asymmetric measure of causality that gave very good results when inferring causal networks from discrete data. This metric allows edge directions to be inferred while the thresholds are applied, speeding up the inference of causal DAGs.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 17 pages, 12 figures
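The first thresholding criterion in the abstract above (keep only edges whose causal score clears a threshold, while leaving no node disconnected) can be pictured as a sweep over candidate edges in decreasing score order. This is a minimal sketch, assuming a causal measure (Fisher correlation, Net Influence, or similar) has already scored every candidate edge and that higher scores mean stronger evidence; the function name `isolation_free_threshold` and its tie handling are hypothetical, not the authors' implementation, and the second (connected-component) criterion is not shown.

```python
from typing import Hashable, Iterable, Optional, Tuple

Edge = Tuple[Hashable, Hashable, float]


def isolation_free_threshold(nodes: Iterable[Hashable], scored_edges: Iterable[Edge]) -> Optional[float]:
    """Return the largest threshold t such that keeping every edge with score >= t
    leaves no isolated node, or None if even keeping all edges leaves a node isolated.

    Hypothetical helper: `scored_edges` holds (u, v, score) triples produced by
    some causal-relevance measure; every endpoint must appear in `nodes`.
    """
    degree = {n: 0 for n in nodes}
    isolated = len(degree)  # nodes with no retained edge so far
    # Lower the threshold one edge at a time, strongest edges first.
    for u, v, score in sorted(scored_edges, key=lambda e: e[2], reverse=True):
        for n in (u, v):
            if degree[n] == 0:
                isolated -= 1
            degree[n] += 1
        if isolated == 0:
            return score  # every node now has at least one retained edge
    return None
```

For example, with nodes A to D and scores 0.9 (A-B), 0.7 (B-C), 0.5 (A-C), 0.3 (C-D) and 0.1 (B-D), the sweep stops at 0.3, the first score at which D gains an edge; the weaker B-D edge is discarded.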

arXiv:2207.05749 [pdf]

Towards Highly Expressive Machine Learning Models of Non-Melanoma Skin Cancer

Authors: Simon M. Thomas, James G. Lefevre, Glenn Baxter, Nicholas A. Hamilton

Abstract: Pathologists have a rich vocabulary with which they can describe all the nuances of cellular morphology. In their world, there is a natural pairing of images and words. Recent advances demonstrate that machine learning models can now be trained to learn high-quality image features and represent them as discrete units of information. This enables natural language, which is also discrete, to be jointly modelled alongside the imaging, resulting in a description of the contents of the imaging. Here we present experiments in applying discrete modelling techniques to the problem domain of non-melanoma skin cancer, specifically, histological images of Intraepidermal Carcinoma (IEC). Using a VQ-GAN model to reconstruct high-resolution (256×256) IEC images, we trained a sequence-to-sequence transformer to generate natural language descriptions using pathologist terminology. By combining this with interactive concept vectors, which are available when using continuous generative methods, we demonstrate an additional angle of interpretability. The result is a promising means of working towards highly expressive machine learning systems which are not only useful as predictive/classification tools, but also a means of furthering our scientific understanding of disease.

Submitted 9 July, 2022; originally announced July 2022.

Comments: 12 figures, 29 pages

ACM Class: I.2.7; I.2.10
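The abstract's phrase about representing image features "as discrete units of information" refers to vector quantization, the step at the heart of a VQ-GAN: each encoder output vector is replaced by the index of its nearest codebook vector, and those indices are the tokens a transformer can model alongside text. The sketch below shows only that nearest-neighbour lookup in plain Python; the encoder, codebook training and transformer are out of scope, and the names `quantize` and `dequantize` are hypothetical.

```python
from typing import List, Sequence


def quantize(features: Sequence[Sequence[float]], codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook vector
    (squared Euclidean distance), i.e. the discrete token a VQ model emits."""

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k])) for f in features]


def dequantize(tokens: Sequence[int], codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Look the tokens back up, giving the quantized vectors a decoder would receive."""
    return [list(codebook[t]) for t in tokens]
```

With a toy codebook `[[0, 0], [1, 1], [5, 5]]`, the features `[[0.1, 0.2], [4, 4.5], [0.9, 1.2]]` quantize to tokens `[0, 2, 1]`. In a real VQ-GAN the codebook is learned jointly with the encoder and decoder rather than fixed by hand.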

arXiv:2112.01301 [pdf]

Machine Learning for Air Transport Planning and Management

Authors: Graham Wild, Glenn Baxter, Pannarat Srisaeng, Steven Richardson

Abstract: In this work we compare the performance of several machine learning algorithms applied to the problem of modelling air transport demand. Forecasting in the air transport industry is an essential part of planning and managing because of the economic and financial aspects of the industry. The traditional approach in airline operations, as specified by the International Civil Aviation Organization, is a multiple linear regression (MLR) model that uses cost variables and economic factors. Here, the performance of models using an artificial neural network (ANN), an adaptive neuro-fuzzy inference system (ANFIS), a genetic algorithm, a support vector machine, and a regression tree is compared to that of MLR. The ANN and ANFIS performed best, with the lowest mean squared error.

Submitted 30 November, 2021; originally announced December 2021.

Comments: 9 pages, 8 figures

MSC Class: 62J05; 62J86; 62P20; 68T07; 68W50; 62A86

ACM Class: G.3
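The MLR baseline and the mean-squared-error comparison described in the abstract above can be sketched without external libraries: fit ordinary least squares with an intercept through the normal equations, then score predictions by MSE. This is an illustrative sketch, assuming the cost and economic predictors are already assembled into numeric rows; the function names and data layout are hypothetical and do not reflect the ICAO specification or the authors' dataset.

```python
from typing import List, Sequence


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept, solving the normal equations
    (X^T X) b = X^T y by Gauss-Jordan elimination. Returns [b0, b1, ..., bp]."""
    rows = [[1.0, *map(float, x)] for x in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)] + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    # A is now diagonal in its first p columns.
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coef: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Apply fitted coefficients [b0, b1, ..., bp] to each row of predictors."""
    return [coef[0] + sum(b * v for b, v in zip(coef[1:], x)) for x in X]


def mse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean squared error, the metric used to rank the models in the abstract."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)
```

The same `mse` function would score any competing model's predictions on held-out data, so the ANN, ANFIS and other models can be ranked against this baseline on equal terms. For larger problems a library least-squares solver is numerically preferable to the normal equations.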

arXiv:1802.03992 [pdf, ps, other]

doi 10.1103/PhysRevE.98.032307

Targeted Damage to Interdependent Networks

Authors: G. J. Baxter, G. Timár, J. F. F. Mendes

Abstract: The giant mutually connected component (GMCC) of an interdependent or multiplex network collapses with a discontinuous hybrid transition under random damage to the network. If the nodes to be damaged are selected in a targeted way, the collapse of the GMCC may occur significantly sooner. Finding the minimal damage set that destroys the largest mutually connected component of a given interdependent network is a computationally prohibitive simultaneous optimization problem. We introduce a simple heuristic strategy, Effective Multiplex Degree, for targeted attack on interdependent networks that leverages the indirect damage inherent in multiplex networks to achieve a damage set smaller than that found by any other non-computationally-intensive algorithm. We show that the intuition from single-layer networks that decycling (damage of the 2-core) is the most effective way to destroy the giant component does not carry over to interdependent networks; in fact, such approaches are worse than simply removing the highest-degree nodes.

Submitted 24 September, 2018; v1 submitted 12 February, 2018; originally announced February 2018.

Comments: 9 pages, 9 figures

Journal ref: Phys. Rev. E 98, 032307 (2018)
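Evaluating any damage set, targeted or random, requires locating the mutually connected component that survives it. A common way to do this is iterative pruning: keep the largest connected component in one layer, restrict the next layer to those nodes, and repeat until the node set stops shrinking. The pure-Python sketch below assumes each layer is given as an undirected adjacency dict over a shared node set; it shows how a damage set is scored, not the Effective Multiplex Degree heuristic itself, and all function names are hypothetical. On small graphs, taking the largest component at each step is a heuristic, and ties between equal-sized components are broken arbitrarily.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Adjacency = Dict[Hashable, Set[Hashable]]


def undirected(edges: Iterable[Tuple[Hashable, Hashable]]) -> Adjacency:
    """Build an undirected adjacency dict for one layer."""
    adj: Adjacency = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def components(nodes: Set[Hashable], adj: Adjacency) -> List[Set[Hashable]]:
    """Connected components (by BFS) of the layer `adj` restricted to `nodes`."""
    seen: Set[Hashable] = set()
    comps: List[Set[Hashable]] = []
    for s in nodes:
        if s in seen:
            continue
        seen.add(s)
        comp, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v in nodes and v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps


def largest_mcc(nodes: Iterable[Hashable], layers: List[Adjacency],
                removed: Iterable[Hashable] = ()) -> Set[Hashable]:
    """Prune to a mutually connected component: keep the largest component in
    each layer in turn until the surviving node set no longer changes."""
    current = set(nodes) - set(removed)
    while True:
        previous = set(current)
        for adj in layers:
            comps = components(current, adj)
            current = max(comps, key=len) if comps else set()
        if current == previous:
            return current
```

For layer A with edges 1-2, 2-3, 3-4, 4-5, 1-3 and layer B with edges 1-2, 2-3, 4-5, the surviving component is {1, 2, 3}. Removing node 2 leaves only {4, 5}: nodes 1 and 3 are lost as well, because node 2 was their only connection in layer B, a small instance of the indirect damage the abstract refers to.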

arXiv:1312.3814 [pdf, other]

doi 10.1103/PhysRevE.89.042801

Weak percolation on multiplex networks

Authors: Gareth J. Baxter, Sergey N. Dorogovtsev, José F. F. Mendes, Davide Cellai

Abstract: Bootstrap percolation is a simple but non-trivial model. It has applications in many areas of science and has been explored on random networks for several decades. In single-layer (simplex) networks, it has recently been observed that bootstrap percolation, which is defined as an incremental process, can be seen as the opposite of pruning percolation, where nodes are removed according to a connectivity rule. Here we propose models of both bootstrap and pruning percolation for multiplex networks. We refer to these two models collectively as "weak" percolation, to distinguish them from the more classical concept of ordinary ("strong") percolation. While the two models coincide in simplex networks, we show that they decouple when considering multiplexes, giving rise to a wealth of critical phenomena. Our bootstrap model constitutes the simplest example of a contagion process on a multiplex network and has potential applications in critical infrastructure recovery and information security. Moreover, we show that our pruning percolation model may provide a way to diagnose missing layers in a multiplex network. Finally, our analytical approach allows us to calculate critical behavior and characterize critical clusters.

Submitted 13 April, 2014; v1 submitted 13 December, 2013; originally announced December 2013.

Comments: 14 pages, 12 figures

Journal ref: Phys. Rev. E 89, 042801 (2014)
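The single-layer (simplex) bootstrap percolation that the abstract starts from is an incremental activation process: starting from a seed set, an inactive node becomes active once at least k of its neighbours are active, and activation spreads until nothing changes. The sketch below implements only this standard simplex version, assuming a symmetric adjacency dict that lists every node as a key; the multiplex bootstrap and pruning models defined in the paper are not reproduced, and the function name is hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set


def bootstrap_percolation(adj: Dict[Hashable, Set[Hashable]], seeds: Iterable[Hashable],
                          k: int) -> Set[Hashable]:
    """Single-layer bootstrap percolation: starting from `seeds`, an inactive node
    becomes active once at least k of its neighbours are active.
    Returns the final active set."""
    active = set(seeds)
    active_neighbours = {n: 0 for n in adj}  # active neighbours seen so far
    queue = deque(active)
    while queue:
        u = queue.popleft()
        # Each node is processed exactly once, when it becomes active,
        # and increments the counter of each still-inactive neighbour.
        for v in adj[u]:
            if v in active:
                continue
            active_neighbours[v] += 1
            if active_neighbours[v] >= k:
                active.add(v)
                queue.append(v)
    return active
```

On a six-node cycle with k = 2, seeds {0, 2} activate node 1 (both of its neighbours are active) but spread no further, whereas with k = 1 a single seed activates the whole cycle. Using a queue means each node's neighbours are updated once, so the run time is linear in the number of edges.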

arXiv:0802.2306 [pdf, ps, other]

Software graphs and programmer awareness

Authors: G. J. Baxter, M. R. Frean

Abstract: Dependencies between types in object-oriented software can be viewed as directed graphs, with types as nodes and dependencies as edges. The in-degree and out-degree distributions of such graphs have quite different forms, with the former resembling a power-law distribution and the latter an exponential distribution. This effect appears to be independent of application or type relationship. A simple generative model is proposed to explore the proposition that the difference arises because the programmer is aware of the out-degree of a type but not of its in-degree. The model reproduces the two distributions, and compares reasonably well to those observed in 14 different type relationships across 12 different Java applications.

Submitted 15 February, 2008; originally announced February 2008.

Comments: 9 pages, 8 figures
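The in-degree and out-degree distributions compared in the abstract above can be tabulated directly from a dependency edge list. This sketch assumes the dependencies have already been extracted (for example from Java sources, which is out of scope here) and reads an edge (a, b) as "type a depends on type b"; it does not reproduce the authors' generative model, and the function name is hypothetical. Types that take part in no dependency at all do not appear in the edge list and are therefore not counted.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def degree_distributions(edges: Iterable[Tuple[Hashable, Hashable]]) -> Tuple[Counter, Counter]:
    """Histograms of in-degrees and out-degrees for a directed dependency graph.

    An edge (a, b) is read as "type a depends on type b", so out-degree counts
    the types a type uses and in-degree counts the types that use it.
    Returns (in_hist, out_hist), each mapping degree -> number of types.
    """
    in_deg: Counter = Counter()
    out_deg: Counter = Counter()
    nodes = set()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
        nodes.update((src, dst))
    # Counter returns 0 for missing keys, so a type with no edges in one
    # direction is recorded with degree 0 in that histogram.
    in_hist = Counter(in_deg[n] for n in nodes)
    out_hist = Counter(out_deg[n] for n in nodes)
    return in_hist, out_hist
```

For the edges A→Util, B→Util, C→Util and A→B, the shared utility type has in-degree 3 while no type has out-degree above 2. On a real code base one would plot both histograms on logarithmic axes to compare the power-law-like in-degree tail with the exponential-like out-degree tail.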

Showing 1–6 of 6 results for author: Baxter, G