-
Inference of Causal Networks using a Topological Threshold
Authors:
Filipe Barroso,
Diogo Gomes,
Gareth J. Baxter
Abstract:
We propose a constraint-based algorithm, which automatically determines causal relevance thresholds, to infer causal networks from data. We call these topological thresholds. We present two methods for determining the threshold: the first seeks a set of edges that leaves no disconnected nodes in the network; the second seeks a causal large connected component in the data.
We tested these methods both for discrete synthetic and real data, and compared the results with those obtained for the PC algorithm, which we took as the benchmark. We show that this novel algorithm is generally faster and more accurate than the PC algorithm.
The algorithm for determining the thresholds requires choosing a measure of causality. We tested our methods with Fisher correlations, commonly used in the PC algorithm (for instance in \cite{kalisch2005}), and further propose a discrete, asymmetric measure of causality, which we call Net Influence, that gave very good results when inferring causal networks from discrete data. This measure allows the directionality of the edges to be inferred while the thresholds are applied, speeding up the inference of causal DAGs.
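As a rough illustration of the first thresholding idea described in the abstract (keep a set of edges that leaves no disconnected nodes), the sketch below picks the largest threshold at which every node still has at least one retained edge. It is not the authors' implementation. It assumes that "disconnected" means a node with no edges, that an edge is kept when its score is at or above the threshold, and that pairwise scores arrive as a square matrix. The names `isolated_node_threshold` and `edges_above` are hypothetical.

```python
from typing import List, Tuple


def pair_score(scores: List[List[float]], i: int, j: int) -> float:
    # Assumption: for a possibly asymmetric measure, a pair is scored by
    # the larger of its two directed values.
    return max(scores[i][j], scores[j][i])


def isolated_node_threshold(scores: List[List[float]]) -> float:
    """Largest threshold that leaves every node with at least one edge.

    Each node's strongest edge must survive, so the threshold is the
    minimum over nodes of that node's maximum pair score.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two nodes")
    strongest = [
        max(pair_score(scores, i, j) for j in range(n) if j != i)
        for i in range(n)
    ]
    return min(strongest)


def edges_above(scores: List[List[float]], threshold: float) -> List[Tuple[int, int]]:
    """Undirected edges whose pair score is at or above the threshold."""
    n = len(scores)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if pair_score(scores, i, j) >= threshold
    ]


# Illustrative scores only, not data from the paper.
example = [
    [0.0, 0.9, 0.2, 0.1],
    [0.9, 0.0, 0.3, 0.1],
    [0.2, 0.3, 0.0, 0.4],
    [0.1, 0.1, 0.4, 0.0],
]
t = isolated_node_threshold(example)
kept = edges_above(example, t)
```

With these illustrative scores the threshold is set by the weakest of the per-node strongest links, so any higher value would strand at least one node.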
Submitted 21 April, 2024;
originally announced April 2024.
-
Towards Highly Expressive Machine Learning Models of Non-Melanoma Skin Cancer
Authors:
Simon M. Thomas,
James G. Lefevre,
Glenn Baxter,
Nicholas A. Hamilton
Abstract:
Pathologists have a rich vocabulary with which they can describe all the nuances of cellular morphology. In their world, there is a natural pairing of images and words. Recent advances demonstrate that machine learning models can now be trained to learn high-quality image features and represent them as discrete units of information. This enables natural language, which is also discrete, to be jointly modelled alongside the imaging, resulting in a description of the contents of the imaging. Here we present experiments in applying discrete modelling techniques to the problem domain of non-melanoma skin cancer, specifically, histological images of Intraepidermal Carcinoma (IEC). Implementing a VQ-GAN model to reconstruct high-resolution (256x256) images of IEC, we trained a sequence-to-sequence transformer to generate natural language descriptions using pathologist terminology. Combined with the idea of interactive concept vectors available by using continuous generative methods, we demonstrate an additional angle of interpretability. The result is a promising means of working towards highly expressive machine learning systems which are not only useful as predictive/classification tools, but also a means to further our scientific understanding of disease.
Submitted 9 July, 2022;
originally announced July 2022.
-
Machine Learning for Air Transport Planning and Management
Authors:
Graham Wild,
Glenn Baxter,
Pannarat Srisaeng,
Steven Richardson
Abstract:
In this work we compare the performance of several machine learning algorithms applied to the problem of modelling air transport demand. Forecasting in the air transport industry is an essential part of planning and managing because of the economic and financial aspects of the industry. The traditional approach used in airline operations as specified by the International Civil Aviation Organization is the use of a multiple linear regression (MLR) model, utilizing cost variables and economic factors. Here, the performance of models utilizing an artificial neural network (ANN), an adaptive neuro-fuzzy inference system (ANFIS), a genetic algorithm, a support vector machine, and a regression tree are compared to MLR. The ANN and ANFIS had the best performance in terms of the lowest mean squared error.
Submitted 30 November, 2021;
originally announced December 2021.
-
Targeted Damage to Interdependent Networks
Authors:
G. J. Baxter,
G. Timár,
J. F. F. Mendes
Abstract:
The giant mutually connected component (GMCC) of an interdependent or multiplex network collapses with a discontinuous hybrid transition under random damage to the network. If the nodes to be damaged are selected in a targeted way, the collapse of the GMCC may occur significantly sooner. Finding the minimal damage set which destroys the largest mutually connected component of a given interdependent network is a computationally prohibitive simultaneous optimization problem. We introduce a simple heuristic strategy -- Effective Multiplex Degree -- for targeted attack on interdependent networks that leverages the indirect damage inherent in multiplex networks to achieve a damage set smaller than that found by any other algorithm that is not computationally intensive. We show that the intuition from single layer networks, that decycling (damage of the $2$-core) is the most effective way to destroy the giant component, does not carry over to interdependent networks, and in fact such approaches are worse than simply removing the highest degree nodes.
Submitted 24 September, 2018; v1 submitted 12 February, 2018;
originally announced February 2018.
-
Weak percolation on multiplex networks
Authors:
Gareth J. Baxter,
Sergey N. Dorogovtsev,
José F. F. Mendes,
Davide Cellai
Abstract:
Bootstrap percolation is a simple but non-trivial model. It has applications in many areas of science and has been explored on random networks for several decades. In single layer (simplex) networks, it has been recently observed that bootstrap percolation, which is defined as an incremental process, can be seen as the opposite of pruning percolation, where nodes are removed according to a connectivity rule. Here we propose models of both bootstrap and pruning percolation for multiplex networks. We collectively refer to these two models with the concept of "weak" percolation, to distinguish them from the somewhat classical concept of ordinary ("strong") percolation. While the two models coincide in simplex networks, we show that they decouple when considering multiplexes, giving rise to a wealth of critical phenomena. Our bootstrap model constitutes the simplest example of a contagion process on a multiplex network and has potential applications in critical infrastructure recovery and information security. Moreover, we show that our pruning percolation model may provide a way to diagnose missing layers in a multiplex network. Finally, our analytical approach allows us to calculate critical behavior and characterize critical clusters.
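For readers unfamiliar with the incremental process mentioned in the abstract, the following is a minimal sketch of standard bootstrap percolation on a single-layer network only: starting from a seed set, an inactive node becomes active once at least `k` of its neighbours are active, and the process repeats until nothing changes. It does not implement the multiplex models proposed in the paper. The function name, the adjacency-dictionary input, and the example graph are assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Set


def bootstrap_percolation(
    adj: Dict[Hashable, List[Hashable]],
    seeds: Iterable[Hashable],
    k: int,
) -> Set[Hashable]:
    """Return the final active set for threshold k on one network layer."""
    active: Set[Hashable] = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, neighbours in adj.items():
            if node in active:
                continue
            # Count this node's currently active neighbours.
            n_active = sum(1 for v in neighbours if v in active)
            if n_active >= k:
                active.add(node)
                changed = True
    return active


# Illustrative graph: a triangle 0-1-2 with a pendant node 3 attached to 2.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
final_k2 = bootstrap_percolation(graph, seeds={0, 1}, k=2)
final_k1 = bootstrap_percolation(graph, seeds={0, 1}, k=1)
```

In this toy graph, node 3 has only one neighbour, so it can never reach the `k = 2` activation threshold even after node 2 becomes active, while with `k = 1` the activation spreads to the whole graph.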
Submitted 13 April, 2014; v1 submitted 13 December, 2013;
originally announced December 2013.
-
Software graphs and programmer awareness
Authors:
G. J. Baxter,
M. R. Frean
Abstract:
Dependencies between types in object-oriented software can be viewed as directed graphs, with types as nodes and dependencies as edges. The in-degree and out-degree distributions of such graphs have quite different forms, with the former resembling a power-law distribution and the latter an exponential distribution. This effect appears to be independent of application or type relationship. A simple generative model is proposed to explore the proposition that the difference arises because the programmer is aware of the out-degree of a type but not of its in-degree. The model reproduces the two distributions, and compares reasonably well to those observed in 14 different type relationships across 12 different Java applications.
Submitted 15 February, 2008;
originally announced February 2008.