-
An Interactive Human-Machine Learning Interface for Collecting and Learning from Complex Annotations
Authors:
Jonathan Erskine,
Matt Clifford,
Alexander Hepburn,
Raúl Santos-Rodríguez
Abstract:
Human-Computer Interaction has been shown to lead to improvements in machine learning systems by boosting model performance, accelerating learning and building user confidence. In this work, we aim to alleviate the expectation that human annotators adapt to the constraints imposed by traditional labels by allowing for extra flexibility in the form in which supervision information is collected. For this, we propose a human-machine learning interface for binary classification tasks which enables human annotators to utilise counterfactual examples to complement standard binary labels as annotations for a dataset. Finally, we discuss the challenges in future extensions of this work.
Submitted 28 March, 2024;
originally announced March 2024.
-
Safe and Robust Reinforcement Learning: Principles and Practice
Authors:
Taku Yamagata,
Raul Santos-Rodriguez
Abstract:
Reinforcement Learning (RL) has shown remarkable success in solving relatively complex tasks, yet the deployment of RL systems in real-world scenarios poses significant challenges related to safety and robustness. This paper aims to identify and further understand those challenges through the exploration of the main dimensions of the safe and robust RL landscape, encompassing algorithmic, ethical, and practical considerations. We conduct a comprehensive review of methodologies and open problems that summarizes the efforts in recent years to address the inherent risks associated with RL applications.
After discussing and proposing definitions for both safe and robust RL, the paper categorizes existing research works into different algorithmic approaches that enhance the safety and robustness of RL agents. We examine techniques such as uncertainty estimation, optimisation methodologies, exploration-exploitation trade-offs, and adversarial training. Environmental factors, including sim-to-real transfer and domain adaptation, are also scrutinized to understand how RL systems can adapt to diverse and dynamic surroundings. Moreover, human involvement is an integral ingredient of the analysis, acknowledging the broad set of roles that humans can take in this context.
Importantly, to aid practitioners in navigating the complexities of safe and robust RL implementation, this paper introduces a practical checklist derived from the synthesized literature. The checklist encompasses critical aspects of algorithm design, training environment considerations, and ethical guidelines. It will serve as a resource for developers and policymakers alike to ensure the responsible deployment of RL systems in many application domains.
Submitted 30 March, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
Evaluating Perceptual Distances by Fitting Binomial Distributions to Two-Alternative Forced Choice Data
Authors:
Alexander Hepburn,
Raul Santos-Rodriguez,
Javier Portilla
Abstract:
The two-alternative forced choice (2AFC) experimental setup is popular in the visual perception literature, where practitioners aim to understand how human observers perceive distances within triplets that consist of a reference image and two distorted versions of that image. In the past, this was conducted in controlled environments, with a tournament-style algorithm dictating which images are shown to each participant to rank the distorted images. Recently, crowd-sourced perceptual datasets have emerged, with no images shared between triplets, making ranking impossible. Evaluating perceptual distances using this data is non-trivial, relying on reducing the collection of judgements on a triplet to a binary decision -- which is suboptimal and prone to misleading conclusions. Instead, we statistically model the underlying decision-making process during 2AFC experiments using a binomial distribution. We use maximum likelihood estimation to fit a distribution to the perceptual judgements, conditioned on the perceptual distance to test, and impose consistency and smoothness between our empirical estimates of the density. This way, we can evaluate a different number of judgements per triplet, and can calculate metrics such as likelihoods of judgements according to a set of distances -- key ingredients that neural network counterparts lack.
Submitted 15 March, 2024;
originally announced March 2024.
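As a rough illustration of the binomial modelling idea in the abstract above, the following sketch computes the maximum likelihood preference probability and the binomial log-likelihood for individual triplets. It is not the authors' method: the abstract describes a fit conditioned on the perceptual distance with consistency and smoothness constraints, none of which is reproduced here. The function names and the vote counts are hypothetical.

```python
import math

def binomial_log_likelihood(k, n, p):
    """Log-probability of k votes for the first image out of n under Binomial(n, p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return (math.log(math.comb(n, k))
            + k * math.log(p)
            + (n - k) * math.log(1.0 - p))

def mle_preference(k, n):
    """Maximum likelihood estimate of p for a single triplet: the vote share."""
    return k / n

# Hypothetical judgements: (votes for the first distorted image, total votes).
# Triplets may have different numbers of judgements.
triplets = [(7, 10), (2, 5), (9, 12)]
for k, n in triplets:
    p_hat = mle_preference(k, n)
    print(k, n, round(p_hat, 3), round(binomial_log_likelihood(k, n, p_hat), 3))
```

Unlike a majority-vote reduction to a binary decision, the likelihood keeps the strength of the preference: 7 of 10 votes and 10 of 10 votes are scored differently.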
-
Monitoring Sustainable Global Development Along Shared Socioeconomic Pathways
Authors:
Michelle W. L. Wan,
Jeffrey N. Clark,
Edward A. Small,
Elena Fillola Mayoral,
Raúl Santos-Rodríguez
Abstract:
Sustainable global development is one of the most prevalent challenges facing the world today, hinging on the equilibrium between socioeconomic growth and environmental sustainability. We propose approaches to monitor and quantify sustainable development along the Shared Socioeconomic Pathways (SSPs), including mathematically derived scoring algorithms and machine learning methods. These integrate socioeconomic and environmental datasets to produce an interpretable metric for SSP alignment. An initial study demonstrates promising results, laying the groundwork for the application of different methods to the monitoring of sustainable global development.
Submitted 7 December, 2023;
originally announced December 2023.
-
Data is Overrated: Perceptual Metrics Can Lead Learning in the Absence of Training Data
Authors:
Tashi Namgyal,
Alexander Hepburn,
Raul Santos-Rodriguez,
Valero Laparra,
Jesus Malo
Abstract:
Perceptual metrics are traditionally used to evaluate the quality of natural signals, such as images and audio. They are designed to mimic the perceptual behaviour of human observers and usually reflect structures found in natural signals. This motivates their use as loss functions for training generative models such that models will learn to capture the structure held in the metric. We take this idea to the extreme in the audio domain by training a compressive autoencoder to reconstruct uniform noise, in lieu of natural data. We show that training with perceptual losses improves the reconstruction of spectrograms and re-synthesized audio at test time over models trained with a standard Euclidean loss. This demonstrates better generalisation to unseen natural signals when using perceptual metrics.
Submitted 6 December, 2023;
originally announced December 2023.
-
LL-VQ-VAE: Learnable Lattice Vector-Quantization For Efficient Representations
Authors:
Ahmed Khalil,
Robert Piechocki,
Raul Santos-Rodriguez
Abstract:
In this paper we introduce learnable lattice vector quantization and demonstrate its effectiveness for learning discrete representations. Our method, termed LL-VQ-VAE, replaces the vector quantization layer in VQ-VAE with lattice-based discretization. The learnable lattice imposes a structure over all discrete embeddings, acting as a deterrent against codebook collapse, leading to high codebook utilization. Compared to VQ-VAE, our method obtains lower reconstruction errors under the same training conditions, trains in a fraction of the time, and uses a constant number of parameters (equal to the embedding dimension $D$), making it a very scalable approach. We demonstrate these results on the FFHQ-1024 dataset and include FashionMNIST and Celeb-A.
Submitted 13 October, 2023;
originally announced October 2023.
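A minimal sketch of lattice-based discretization may help make the abstract above concrete. It assumes an axis-aligned lattice with one spacing value per embedding dimension, which matches the stated parameter count of $D$ but is otherwise an assumption; the abstract does not give the exact parameterisation. The function name and the example values are hypothetical, and the spacings are fixed here rather than learned.

```python
import math

def lattice_quantize(z, spacing):
    """Snap each coordinate of z to the nearest multiple of its spacing."""
    if len(z) != len(spacing):
        raise ValueError("z and spacing must have the same length")
    # floor(x / b + 0.5) rounds to the nearest integer lattice index.
    return [math.floor(x / b + 0.5) * b for x, b in zip(z, spacing)]

# Hypothetical 3-dimensional embedding with one spacing per dimension (D = 3).
z = [0.26, -1.1, 2.0]
spacing = [0.5, 0.5, 1.0]
print(lattice_quantize(z, spacing))  # [0.5, -1.0, 2.0]
```

Because every lattice point is reachable by construction, no explicit codebook needs to be stored or searched, which is one plausible reading of why such a scheme would resist codebook collapse.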
-
Capturing Requirements for a Data Annotation Tool for Intensive Care: Experimental User-Centered Design Study
Authors:
Marceli Wac,
Raul Santos-Rodriguez,
Chris McWilliams,
Christopher Bourdeaux
Abstract:
Intensive care units (ICUs) are complex and data-rich environments. Data routinely collected in the ICUs provides tremendous opportunities for machine learning, but its use comes with significant challenges. Complex problems may require additional input from humans which can be provided through a process of data annotation. Annotation is a complex, time-consuming process that requires domain expertise and technical proficiency. Existing data annotation tools fail to provide an effective solution to this problem. In this study, we investigated clinicians' approach to the annotation task. We focused on establishing the characteristics of the annotation process in the context of clinical data and identifying differences in the annotation workflow between different staff roles. The overall goal was to elicit requirements for a software tool that could facilitate effective and time-efficient data annotation. We conducted an experiment involving clinicians from the ICUs annotating printed sheets of data. The participants were observed during the task and their actions were analysed in the context of Norman's Interaction Cycle to establish the requirements for the digital tool. The annotation process followed a constant loop of annotation and evaluation, during which participants incrementally analysed and annotated the data. No distinguishable differences were identified between how different staff roles annotate data. We observed preferences towards different methods for applying annotation which varied between different participants and admissions. We established 11 requirements for the digital data annotation tool for the healthcare setting. We conducted a manual data annotation activity to establish the requirements for a digital data annotation tool, characterised the clinicians' approach to annotation and elicited 11 key requirements for effective data annotation software.
Submitted 4 June, 2024; v1 submitted 28 September, 2023;
originally announced September 2023.
-
TraCE: Trajectory Counterfactual Explanation Scores
Authors:
Jeffrey N. Clark,
Edward A. Small,
Nawid Keshtmand,
Michelle W. L. Wan,
Elena Fillola Mayoral,
Enrico Werner,
Christopher P. Bourdeaux,
Raul Santos-Rodriguez
Abstract:
Counterfactual explanations, and their associated algorithmic recourse, are typically leveraged to understand, explain, and potentially alter a prediction coming from a black-box classifier. In this paper, we propose to extend the use of counterfactuals to evaluate progress in sequential decision making tasks. To this end, we introduce a model-agnostic modular framework, TraCE (Trajectory Counterfactual Explanation) scores, which is able to distill and condense progress in highly complex scenarios into a single value. We demonstrate TraCE's utility across domains by showcasing its main properties in two case studies spanning healthcare and climate change.
Submitted 26 January, 2024; v1 submitted 27 September, 2023;
originally announced September 2023.
-
Counterfactual Explanations via Locally-guided Sequential Algorithmic Recourse
Authors:
Edward A. Small,
Jeffrey N. Clark,
Christopher J. McWilliams,
Kacper Sokol,
Jeffrey Chan,
Flora D. Salim,
Raul Santos-Rodriguez
Abstract:
Counterfactuals operationalised through algorithmic recourse have become a powerful tool to make artificial intelligence systems explainable. Conceptually, given an individual classified as y -- the factual -- we seek actions such that their prediction becomes the desired class y' -- the counterfactual. This process offers algorithmic recourse that is (1) easy to customise and interpret, and (2) directly aligned with the goals of each individual. However, the properties of a "good" counterfactual are still largely debated; it remains an open challenge to effectively locate a counterfactual along with its corresponding recourse. Some strategies use gradient-driven methods, but these offer no guarantees on the feasibility of the recourse and are open to adversarial attacks on carefully created manifolds. This can lead to unfairness and lack of robustness. Other methods are data-driven; these mostly address the feasibility problem at the expense of privacy, security and secrecy, as they require access to the entire training data set. Here, we introduce LocalFACE, a model-agnostic technique that composes feasible and actionable counterfactual explanations using locally-acquired information at each step of the algorithmic recourse. Our explainer preserves the privacy of users by only leveraging data that it specifically requires to construct actionable algorithmic recourse, and protects the model by offering transparency solely in the regions deemed necessary for the intervention.
Submitted 8 September, 2023;
originally announced September 2023.
-
Strategies for engaging clinical participants in the co-design of software for healthcare domains
Authors:
Marceli Wac,
Raul Santos-Rodriguez,
Chris McWilliams,
Christopher Bourdeaux
Abstract:
Co-design is an effective method for designing software, but implementing it within the clinical setting comes with a set of unique challenges. This makes recruitment and engagement of participants difficult, which has been demonstrated in our study. Our work focused on designing and evaluating a data annotation tool; however, different types of interventions had to be carried out due to poor engagement with the study. We evaluated the effectiveness and feasibility of each of these strategies and their applicability to different stages of co-design research, and discussed the barriers to participation present among participants from a clinical background.
Submitted 31 August, 2023;
originally announced August 2023.
-
Privacy in Multimodal Federated Human Activity Recognition
Authors:
Alex Iacob,
Pedro P. B. Gusmão,
Nicholas D. Lane,
Armand K. Koupai,
Mohammud J. Bocus,
Raúl Santos-Rodríguez,
Robert J. Piechocki,
Ryan McConville
Abstract:
Human Activity Recognition (HAR) training data is often privacy-sensitive or held by non-cooperative entities. Federated Learning (FL) addresses such concerns by training ML models on edge clients. This work studies the impact of privacy in federated HAR at a user, environment, and sensor level. We show that the performance of FL for HAR depends on the assumed privacy level of the FL system and primarily upon the colocation of data from different sensors. By avoiding data sharing and assuming privacy at the human or environment level, as prior works have done, the accuracy decreases by 5-7%. However, extending this to the modality level and strictly separating sensor data between multiple clients may decrease the accuracy by 19-42%. As this form of privacy is necessary for the ethical utilisation of passive sensing methods in HAR, we implement a system where clients mutually train both a general FL model and a group-level one per modality. Our evaluation shows that this method leads to only a 7-13% decrease in accuracy, making it possible to build HAR systems with diverse hardware.
Submitted 2 June, 2023; v1 submitted 20 May, 2023;
originally announced May 2023.
-
MIDI-Draw: Sketching to Control Melody Generation
Authors:
Tashi Namgyal,
Peter Flach,
Raul Santos-Rodriguez
Abstract:
We describe a proof-of-principle implementation of a system for drawing melodies that abstracts away from a note-level input representation via melodic contours. The aim is to allow users to express their musical intentions without requiring prior knowledge of how notes fit together melodiously. Current approaches to controllable melody generation often require users to choose parameters that are static across a whole sequence, via buttons or sliders. In contrast, our method allows users to quickly specify how parameters should change over time by drawing a contour.
Submitted 19 May, 2023;
originally announced May 2023.
-
What You Hear Is What You See: Audio Quality Metrics From Image Quality Metrics
Authors:
Tashi Namgyal,
Alexander Hepburn,
Raul Santos-Rodriguez,
Valero Laparra,
Jesus Malo
Abstract:
In this study, we investigate the feasibility of utilizing state-of-the-art image perceptual metrics for evaluating audio signals by representing them as spectrograms. The encouraging outcome of the proposed approach is based on the similarity between the neural mechanisms in the auditory and visual pathways. Furthermore, we customise one of the metrics which has a psychoacoustically plausible architecture to account for the peculiarities of sound signals. We evaluate the effectiveness of our proposed metric and several baseline metrics using a music dataset, with promising results in terms of the correlation between the metrics and the perceived quality of audio as rated by human evaluators.
Submitted 30 August, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
-
Disentangling the Link Between Image Statistics and Human Perception
Authors:
Alexander Hepburn,
Valero Laparra,
Raúl Santos-Rodriguez,
Jesús Malo
Abstract:
In the 1950s, Barlow and Attneave hypothesised a link between biological vision and information maximisation. Following Shannon, information was defined using the probability of natural images. A number of physiological and psychophysical phenomena have been derived ever since from principles like info-max, efficient coding, or optimal denoising. However, it remains unclear how this link is expressed in mathematical terms from image probability. First, classical derivations were subjected to strong assumptions on the probability models and on the behaviour of the sensors. Moreover, the direct evaluation of the hypothesis was limited by the inability of the classical image models to deliver accurate estimates of the probability. In this work we directly evaluate image probabilities using an advanced generative model for natural images, and we analyse how probability-related factors can be combined to predict human perception via sensitivity of state-of-the-art subjective image quality metrics. We use information theory and regression analysis to find a combination of just two probability-related factors that achieves 0.8 correlation with subjective metrics. This probability-based sensitivity is psychophysically validated by reproducing the basic trends of the Contrast Sensitivity Function, its suprathreshold variation, and trends of the Weber-law and masking.
Submitted 5 October, 2023; v1 submitted 17 March, 2023;
originally announced March 2023.
-
Two-step counterfactual generation for OOD examples
Authors:
Nawid Keshtmand,
Raul Santos-Rodriguez,
Jonathan Lawry
Abstract:
Two fundamental requirements for the deployment of machine learning models in safety-critical systems are to be able to detect out-of-distribution (OOD) data correctly and to be able to explain the prediction of the model. Although significant effort has gone into both OOD detection and explainable AI, there has been little work on explaining why a model predicts a certain data point is OOD. In this paper, we address this question by introducing the concept of an OOD counterfactual, which is a perturbed data point that iteratively moves between different OOD categories. We propose a method for generating such counterfactuals, investigate its application on synthetic and benchmark data, and compare it to several benchmark methods using a range of metrics.
Submitted 10 February, 2023;
originally announced February 2023.
-
When the Ground Truth is not True: Modelling Human Biases in Temporal Annotations
Authors:
Taku Yamagata,
Emma L. Tonkin,
Benjamin Arana Sanchez,
Ian Craddock,
Miquel Perello Nieto,
Raul Santos-Rodriguez,
Weisong Yang,
Peter Flach
Abstract:
In supervised learning, low quality annotations lead to poorly performing classification and detection models, while also rendering evaluation unreliable. This is particularly apparent on temporal data, where annotation quality is affected by multiple factors. For example, in the post-hoc self-reporting of daily activities, cognitive biases are one of the most common ingredients. In particular, reporting the start and duration of an activity after its finalisation may incorporate biases introduced by personal time perceptions, as well as the imprecision and lack of granularity due to time rounding. Here we propose a method to model human biases on temporal annotations and argue for the use of soft labels. Experimental results in synthetic data show that soft labels provide a better approximation of the ground truth for several metrics. We showcase the method on a real dataset of daily activities.
Submitted 6 February, 2023;
originally announced February 2023.
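To illustrate how time rounding alone can motivate soft labels, the sketch below turns a rounded self-report into a per-minute probability that the activity was ongoing. It is an illustrative simplification and not the bias model proposed in the paper: it assumes each true boundary lies uniformly within half a rounding step of the reported value, and that the activity lasts at least one rounding step. The function name, the parameters, and the example times are hypothetical.

```python
def soft_label(t, reported_start, reported_end, granularity):
    """Probability that the activity is ongoing at time t (all values in minutes)."""
    if reported_end - reported_start < granularity:
        raise ValueError("activity must last at least one rounding step")
    half = granularity / 2.0

    def prob_before(boundary):
        # P(true boundary <= t) when the true boundary is uniform within
        # half a rounding step either side of the reported value.
        return min(1.0, max(0.0, (t - (boundary - half)) / granularity))

    # The true start never follows the true end under the assumptions above,
    # so P(start <= t < end) = P(start <= t) - P(end <= t).
    return prob_before(reported_start) - prob_before(reported_end)

# Hypothetical report: activity from minute 600 to minute 630, rounded to 10 minutes.
for t in (590, 598, 615, 630, 640):
    print(t, round(soft_label(t, 600, 630, 10), 2))
```

A hard label would switch from 0 to 1 exactly at minute 600, whereas the soft label ramps up across the interval in which the true start could plausibly lie.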
-
Transfer Learning and Class Decomposition for Detecting the Cognitive Decline of Alzheimer Disease
Authors:
Maha M. Alwuthaynani,
Zahraa S. Abdallah,
Raul Santos-Rodriguez
Abstract:
Early diagnosis of Alzheimer's disease (AD) is essential in preventing the disease's progression. Therefore, detecting AD from neuroimaging data such as structural magnetic resonance imaging (sMRI) has been a topic of intense investigation in recent years. Deep learning has gained considerable attention in Alzheimer's detection. However, training a convolutional neural network from scratch is challenging since it demands more computational time and a significant amount of annotated data. By transferring knowledge learned from other image recognition tasks to medical image classification, transfer learning can provide a promising and effective solution. Irregularities in the dataset distribution present another difficulty. Class decomposition can tackle this issue by simplifying learning a dataset's class boundaries. Motivated by these approaches, this paper proposes a transfer learning method using class decomposition to detect Alzheimer's disease from sMRI images. We use two ImageNet-trained architectures: VGG19 and ResNet50, and an entropy-based technique to determine the most informative images. The proposed model achieved state-of-the-art performance in the Alzheimer's disease (AD) vs mild cognitive impairment (MCI) vs cognitively normal (CN) classification task with a 3% increase in accuracy from what is reported in the literature.
Submitted 31 January, 2023;
originally announced January 2023.
-
Interpretable Classification of Early Stage Parkinson's Disease from EEG
Authors:
Amarpal Sahota,
Amber Roguski,
Matthew W. Jones,
Michal Rolinski,
Alan Whone,
Raul Santos-Rodriguez,
Zahraa S. Abdallah
Abstract:
Detecting Parkinson's Disease in its early stages using EEG data presents a significant challenge. This paper introduces a novel approach, representing EEG data as a 15-variate series of bandpower and peak frequency values/coefficients. The hypothesis is that this representation captures essential information from the noisy EEG signal, improving disease detection. Statistical features extracted from this representation are utilised as input for interpretable machine learning models, specifically Decision Tree and AdaBoost classifiers. Our classification pipeline is deployed within our proposed framework which enables high-importance data types and brain regions for classification to be identified. Interestingly, our analysis reveals that while there is no significant regional importance, the N1 sleep data type exhibits statistically significant predictive power (p < 0.01) for early-stage Parkinson's Disease classification. AdaBoost classifiers trained on the N1 data type consistently outperform baseline models, achieving over 80% accuracy and recall. Our classification pipeline statistically significantly outperforms baseline models, indicating that the model has acquired useful information. Paired with the interpretability of our pipeline (the ability to view feature importances), this enables us to generate meaningful insights into the classification of early-stage Parkinson's with our N1 models. In future, these models could be deployed in the real world: the results presented in this paper indicate that more than 3 in 4 early-stage Parkinson's cases would be captured with our pipeline.
Submitted 8 December, 2023; v1 submitted 20 January, 2023;
originally announced January 2023.
-
Identification, explanation and clinical evaluation of hospital patient subtypes
Authors:
Enrico Werner,
Jeffrey N. Clark,
Ranjeet S. Bhamber,
Michael Ambler,
Christopher P. Bourdeaux,
Alexander Hepburn,
Christopher J. McWilliams,
Raul Santos-Rodriguez
Abstract:
We present a pipeline in which unsupervised machine learning techniques are used to automatically identify subtypes of hospital patients admitted between 2017 and 2021 in a large UK teaching hospital. With the use of state-of-the-art explainability techniques, the identified subtypes are interpreted and assigned clinical meaning. In parallel, clinicians assessed intra-cluster similarities and inter-cluster differences of the identified patient subtypes within the context of their clinical knowledge. By confronting the outputs of both automatic and clinician-based explanations, we aim to highlight the mutual benefit of combining machine learning techniques with clinical expertise.
Submitted 19 January, 2023;
originally announced January 2023.
-
Understanding the properties and limitations of contrastive learning for Out-of-Distribution detection
Authors:
Nawid Keshtmand,
Raul Santos-Rodriguez,
Jonathan Lawry
Abstract:
A recent popular approach to out-of-distribution (OOD) detection is based on a self-supervised learning technique referred to as contrastive learning. There are two main variants of contrastive learning, namely instance and class discrimination, targeting features that can discriminate between different instances for the former, and different classes for the latter.
In this paper, we aim to understand the effectiveness and limitations of existing contrastive learning methods for OOD detection. We approach this in 3 ways. First, we systematically study the performance difference between the instance discrimination and supervised contrastive learning variants in different OOD detection settings. Second, we study which in-distribution (ID) classes OOD data tend to be classified into. Finally, we study the spectral decay property of the different contrastive learning approaches and examine how it correlates with OOD detection performance. In scenarios where the ID and OOD datasets are sufficiently different from one another, we see that instance discrimination, in the absence of fine-tuning, is competitive with supervised approaches in OOD detection. We see that OOD samples tend to be classified into classes that have a distribution similar to the distribution of the entire dataset. Furthermore, we show that contrastive learning learns a feature space that contains singular vectors containing several directions with a high variance, which can be detrimental or beneficial to OOD detection depending on the inference approach used.
Submitted 6 November, 2022;
originally announced November 2022.
-
Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL
Authors:
Taku Yamagata,
Ahmed Khalil,
Raul Santos-Rodriguez
Abstract:
Recent works have shown that tackling offline reinforcement learning (RL) with a conditional policy produces promising results. The Decision Transformer (DT) combines the conditional policy approach and a transformer architecture, showing competitive performance against several benchmarks. However, DT lacks stitching ability -- one of the critical abilities for offline RL to learn the optimal policy from sub-optimal trajectories. This issue becomes particularly significant when the offline dataset only contains sub-optimal trajectories. On the other hand, the conventional RL approaches based on Dynamic Programming (such as Q-learning) do not have the same limitation; however, they suffer from unstable learning behaviours, especially when they rely on function approximation in an off-policy learning setting. In this paper, we propose the Q-learning Decision Transformer (QDT) to address the shortcomings of DT by leveraging the benefits of Dynamic Programming (Q-learning). It utilises the Dynamic Programming results to relabel the return-to-go in the training data to then train the DT with the relabelled data. Our approach efficiently exploits the benefits of these two approaches and compensates for each other's shortcomings to achieve better performance. We empirically show these in both simple toy environments and the more complex D4RL benchmark, showing competitive performance gains.
Submitted 25 May, 2023; v1 submitted 8 September, 2022;
originally announced September 2022.
-
What and How of Machine Learning Transparency: Building Bespoke Explainability Tools with Interoperable Algorithmic Components
Authors:
Kacper Sokol,
Alexander Hepburn,
Raul Santos-Rodriguez,
Peter Flach
Abstract:
Explainability techniques for data-driven predictive models based on artificial intelligence and machine learning algorithms allow us to better understand the operation of such systems and help to hold them accountable. New transparency approaches are developed at breakneck speed, enabling us to peek inside these black boxes and interpret their decisions. Many of these techniques are introduced as monolithic tools, giving the impression of one-size-fits-all and end-to-end algorithms with limited customisability. Nevertheless, such approaches are often composed of multiple interchangeable modules that need to be tuned to the problem at hand to produce meaningful explanations. This paper introduces a collection of hands-on training materials -- slides, video recordings and Jupyter Notebooks -- that provide guidance through the process of building and evaluating bespoke modular surrogate explainers for tabular data. These resources cover the three core building blocks of this technique: interpretable representation composition, data sampling and explanation generation.
Submitted 8 September, 2022;
originally announced September 2022.
-
FAT Forensics: A Python Toolbox for Implementing and Deploying Fairness, Accountability and Transparency Algorithms in Predictive Systems
Authors:
Kacper Sokol,
Alexander Hepburn,
Rafael Poyiadzi,
Matthew Clifford,
Raul Santos-Rodriguez,
Peter Flach
Abstract:
Predictive systems, in particular machine learning algorithms, can take important, and sometimes legally binding, decisions about our everyday life. In most cases, however, these systems and decisions are neither regulated nor certified. Given the potential harm that these algorithms can cause, their qualities such as fairness, accountability and transparency (FAT) are of paramount importance. To ensure high-quality, fair, transparent and reliable predictive systems, we developed an open source Python package called FAT Forensics. It can inspect important fairness, accountability and transparency aspects of predictive algorithms to automatically and objectively report them back to engineers and users of such systems. Our toolbox can evaluate all elements of a predictive pipeline: data (and their features), models and predictions. Published under the BSD 3-Clause open source licence, FAT Forensics is opened up for personal and commercial usage.
Submitted 8 September, 2022;
originally announced September 2022.
-
Self-Supervised Multimodal Fusion Transformer for Passive Activity Recognition
Authors:
Armand K. Koupai,
Mohammud J. Bocus,
Raul Santos-Rodriguez,
Robert J. Piechocki,
Ryan McConville
Abstract:
The pervasiveness of Wi-Fi signals provides significant opportunities for human sensing and activity recognition in fields such as healthcare. The sensors most commonly used for passive Wi-Fi sensing are based on passive Wi-Fi radar (PWR) and channel state information (CSI) data; however, current systems do not effectively exploit the information acquired through multiple sensors to recognise the different activities. In this paper, we explore new properties of the Transformer architecture for multimodal sensor fusion. We study different signal processing techniques to extract multiple image-based features from PWR and CSI data such as spectrograms, scalograms and Markov transition field (MTF). We first propose the Fusion Transformer, an attention-based model for multimodal and multi-sensor fusion. Experimental results show that our Fusion Transformer approach can achieve competitive results compared to a ResNet architecture but with far fewer resources. To further improve our model, we propose a simple and effective framework for multimodal and multi-sensor self-supervised learning (SSL). The self-supervised Fusion Transformer outperforms the baselines, achieving an F1-score of 95.9%. Finally, we show how this approach significantly outperforms the others when trained with as little as 1% (2 minutes) of labelled training data to 20% (40 minutes) of labelled training data.
Submitted 15 August, 2022;
originally announced September 2022.
-
Sampling Based On Natural Image Statistics Improves Local Surrogate Explainers
Authors:
Ricardo Kleinlein,
Alexander Hepburn,
Raúl Santos-Rodríguez,
Fernando Fernández-Martínez
Abstract:
Many problems in computer vision have recently been tackled using models whose predictions cannot be easily interpreted, most commonly deep neural networks. Surrogate explainers are a popular post-hoc interpretability method to further understand how a model arrives at a particular prediction. By training a simple, more interpretable model to locally approximate the decision boundary of a non-interpretable system, we can estimate the relative importance of the input features on the prediction. Focusing on images, surrogate explainers, e.g., LIME, generate a local neighbourhood around a query image by sampling in an interpretable domain. However, these interpretable domains have traditionally been derived exclusively from the intrinsic features of the query image, not taking into consideration the manifold of the data the non-interpretable model has been exposed to in training (or more generally, the manifold of real images). This leads to suboptimal surrogates trained on potentially low probability images. We address this limitation by aligning the local neighbourhood on which the surrogate is trained with the original training data distribution, even when this distribution is not accessible. We propose two approaches to do so, namely (1) altering the method for sampling the local neighbourhood and (2) using perceptual metrics to convey some of the properties of the distribution of natural images.
Submitted 8 August, 2022;
originally announced August 2022.
-
The Weak Supervision Landscape
Authors:
Rafael Poyiadzi,
Daniel Bacaicoa-Barber,
Jesus Cid-Sueiro,
Miquel Perello-Nieto,
Peter Flach,
Raul Santos-Rodriguez
Abstract:
Many ways of annotating a dataset for machine learning classification tasks that go beyond the usual class labels exist in practice. These are of interest as they can simplify or facilitate the collection of annotations, while not greatly affecting the resulting machine learning model. Many of these fall under the umbrella term of weak labels or annotations. However, it is not always clear how different alternatives are related. In this paper we propose a framework for categorising weak supervision settings with the aim of: (1) helping the dataset owner or annotator navigate through the available options within weak supervision when prescribing an annotation process, and (2) describing existing annotations for a dataset to machine learning practitioners so that we allow them to understand the implications for the learning process. To this end, we identify the key elements that characterise weak supervision and devise a series of dimensions that categorise most of the existing approaches. We show how common settings in the literature fit within the framework and discuss its possible uses in practice.
Submitted 30 March, 2022;
originally announced March 2022.
-
Classifier Calibration: A survey on how to assess and improve predicted class probabilities
Authors:
Telmo Silva Filho,
Hao Song,
Miquel Perello-Nieto,
Raul Santos-Rodriguez,
Meelis Kull,
Peter Flach
Abstract:
This paper provides both an introduction to and a detailed overview of the principles and practice of classifier calibration. A well-calibrated classifier correctly quantifies the level of uncertainty or confidence associated with its instance-wise predictions. This is essential for critical applications, optimal decision making, cost-sensitive classification, and for some types of context change. Calibration research has a rich history which predates the birth of machine learning as an academic field by decades. However, a recent increase in the interest in calibration has led to new methods and the extension from binary to the multiclass setting. The space of options and issues to consider is large, and navigating it requires the right set of concepts and tools. We provide both introductory material and up-to-date technical details of the main concepts and methods, including proper scoring rules and other evaluation metrics, visualisation approaches, a comprehensive account of post-hoc calibration methods for binary and multiclass classification, and several advanced topics.
Submitted 16 February, 2023; v1 submitted 19 December, 2021;
originally announced December 2021.
-
Multi-lingual agents through multi-headed neural networks
Authors:
J. D. Thomas,
R. Santos-Rodríguez,
R. Piechocki,
M. Anca
Abstract:
This paper considers cooperative Multi-Agent Reinforcement Learning, focusing on emergent communication in settings where multiple pairs of independent learners interact at varying frequencies. In this context, multiple distinct and incompatible languages can emerge. When an agent encounters a speaker of an alternative language, there is a requirement for a period of adaptation before they can efficiently converse. This adaptation results in the emergence of a new language and the forgetting of the previous language. In principle, this is an example of the Catastrophic Forgetting problem which can be mitigated by enabling the agents to learn and maintain multiple languages. We take inspiration from the Continual Learning literature and equip our agents with multi-headed neural networks which enable our agents to be multi-lingual. Our method is empirically validated within a referential MNIST based communication game and is shown to be able to maintain multiple languages where existing approaches cannot.
Submitted 22 November, 2021;
originally announced November 2021.
-
Uncertainty Quantification of Surrogate Explanations: an Ordinal Consensus Approach
Authors:
Jonas Schulz,
Rafael Poyiadzi,
Raul Santos-Rodriguez
Abstract:
Explainability of black-box machine learning models is crucial, in particular when deployed in critical applications such as medicine or autonomous cars. Existing approaches produce explanations for the predictions of models; however, how to assess the quality and reliability of such explanations remains an open question. In this paper we take a step further in order to provide the practitioner with tools to judge the trustworthiness of an explanation. To this end, we produce estimates of the uncertainty of a given explanation by measuring the ordinal consensus amongst a set of diverse bootstrapped surrogate explainers. While we encourage diversity by using ensemble techniques, we propose and analyse metrics to aggregate the information contained within the set of explainers through a rating scheme. We empirically illustrate the properties of this approach through experiments on state-of-the-art Convolutional Neural Network ensembles. Furthermore, through tailored visualisations, we show specific examples of situations where uncertainty estimates offer concrete actionable insights to the user beyond those arising from standard surrogate explainers.
Submitted 17 November, 2021;
originally announced November 2021.
-
Reinforcement Learning with Feedback from Multiple Humans with Diverse Skills
Authors:
Taku Yamagata,
Ryan McConville,
Raul Santos-Rodriguez
Abstract:
A promising approach to improve the robustness and exploration in Reinforcement Learning is collecting human feedback, thereby incorporating prior knowledge of the target environment. It is, however, often too expensive to obtain enough feedback of good quality. To mitigate the issue, we aim to rely on a group of multiple experts (and non-experts) with different skill levels to generate enough feedback. Such feedback can therefore be inconsistent and infrequent. In this paper, we build upon prior work -- Advise, a Bayesian approach attempting to maximise the information gained from human feedback -- extending the algorithm to accept feedback from this larger group of humans, the trainers, while also estimating each trainer's reliability. We show how aggregating feedback from multiple trainers improves the total feedback's accuracy and makes the collection process easier in two ways. Firstly, this approach addresses the case of some of the trainers being adversarial. Secondly, having access to the information about each trainer's reliability provides a second layer of robustness and offers valuable information for people managing the whole system to improve the overall trust in the system. It offers an actionable tool for improving the feedback collection process or modifying the reward function design if needed. We empirically show that our approach can accurately learn the reliability of each trainer and use it to maximise the information gained from the multiple trainers' feedback, even if some of the sources are adversarial.
Submitted 16 November, 2021;
originally announced November 2021.
-
Understanding surrogate explanations: the interplay between complexity, fidelity and coverage
Authors:
Rafael Poyiadzi,
Xavier Renard,
Thibault Laugel,
Raul Santos-Rodriguez,
Marcin Detyniecki
Abstract:
This paper analyses the fundamental ingredients behind surrogate explanations to provide a better understanding of their inner workings. We start our exposition by considering global surrogates, describing the trade-off between complexity of the surrogate and fidelity to the black-box being modelled. We show that transitioning from global to local - reducing coverage - allows for more favourable conditions on the Pareto frontier of fidelity-complexity of a surrogate. We discuss the interplay between complexity, fidelity and coverage, and consider how different user needs can lead to problem formulations where these are either constraints or penalties. We also present experiments that demonstrate how the local surrogate interpretability procedure can be made interactive and lead to better explanations.
Submitted 9 July, 2021;
originally announced July 2021.
-
On the overlooked issue of defining explanation objectives for local-surrogate explainers
Authors:
Rafael Poyiadzi,
Xavier Renard,
Thibault Laugel,
Raul Santos-Rodriguez,
Marcin Detyniecki
Abstract:
Local surrogate approaches for explaining machine learning model predictions have appealing properties, such as being model-agnostic and flexible in their modelling. Several methods exist that fit this description and share this goal. However, despite their shared overall procedure, they set out different objectives, extract different information from the black-box, and consequently produce diverse explanations that are -- in general -- incomparable. In this work we review the similarities and differences amongst multiple methods, with a particular focus on what information they extract from the model, as this has a large impact on the output: the explanation. We discuss the implications of the lack of agreement, and clarity, amongst the methods' objectives on the research and practice of explainability.
Submitted 10 June, 2021;
originally announced June 2021.
-
On the relation between statistical learning and perceptual distances
Authors:
Alexander Hepburn,
Valero Laparra,
Raul Santos-Rodriguez,
Johannes Ballé,
Jesús Malo
Abstract:
It has been demonstrated many times that the behavior of the human visual system is connected to the statistics of natural images. Since machine learning relies on the statistics of training data as well, the above connection has interesting implications when using perceptual distances (which mimic the behavior of the human visual system) as a loss function. In this paper, we aim to unravel the non-trivial relationships between the probability distribution of the data, perceptual distances, and unsupervised machine learning. To this end, we show that perceptual sensitivity is correlated with the probability of an image in its close neighborhood. We also explore the relation between distances induced by autoencoders and the probability distribution of the training data, as well as how these induced distances are correlated with human perception. Finally, we find perceptual distances do not always lead to noticeable gains in performance over Euclidean distance in common image processing tasks, except when data is scarce and the perceptual distance provides regularization. We propose this may be due to a double-counting effect of the image statistics, once in the perceptual distance and once in the training procedure.
Submitted 16 March, 2022; v1 submitted 8 June, 2021;
originally announced June 2021.
-
Self-Supervised WiFi-Based Activity Recognition
Authors:
Hok-Shing Lau,
Ryan McConville,
Mohammud J. Bocus,
Robert J. Piechocki,
Raul Santos-Rodriguez
Abstract:
Traditional approaches to activity recognition involve the use of wearable sensors or cameras in order to recognise human activities. In this work, we extract fine-grained physical layer information from WiFi devices for the purpose of passive activity recognition in indoor environments. While such data is ubiquitous, few approaches are designed to utilise large amounts of unlabelled WiFi data. We propose the use of self-supervised contrastive learning to improve activity recognition performance when using multiple views of the transmitted WiFi signal captured by different synchronised receivers. We conduct experiments where the transmitters and receivers are arranged in different physical layouts so as to cover both Line-of-Sight (LoS) and non LoS (NLoS) conditions. We compare the proposed contrastive learning system with non-contrastive systems and observe a 17.7% increase in macro averaged F1 score on the task of WiFi based activity recognition, as well as significant improvements in one- and few-shot learning scenarios.
Submitted 19 April, 2021;
originally announced April 2021.
-
Self-play Learning Strategies for Resource Assignment in Open-RAN Networks
Authors:
Xiaoyang Wang,
Jonathan D Thomas,
Robert J Piechocki,
Shipra Kapoor,
Raul Santos-Rodriguez,
Arjun Parekh
Abstract:
Open Radio Access Network (ORAN) is being developed with an aim to democratise access and lower the cost of future mobile data networks, supporting network services with various QoS requirements, such as massive IoT and URLLC. In ORAN, network functionality is dis-aggregated into remote units (RUs), distributed units (DUs) and central units (CUs), which allows flexible software on Commercial-Off-The-Shelf (COTS) deployments. Furthermore, the mapping of variable RU requirements to local mobile edge computing centres for future centralized processing would significantly reduce the power consumption in cellular networks. In this paper, we study the RU-DU resource assignment problem in an ORAN system, modelled as a 2D bin packing problem. A deep reinforcement learning-based self-play approach is proposed to achieve efficient RU-DU resource management, with AlphaGo Zero inspired neural Monte-Carlo Tree Search (MCTS). Experiments on a representative 2D bin packing environment and real sites data show that the self-play learning strategy achieves intelligent RU-DU resource assignment for different network conditions.
Submitted 3 March, 2021;
originally announced March 2021.
-
Hypothesis Testing for Class-Conditional Label Noise
Authors:
Rafael Poyiadzi,
Weisong Yang,
Niall Twomey,
Raul Santos-Rodriguez
Abstract:
In this paper we provide machine learning practitioners with tools to answer the question: is there class-conditional noise in my labels? In particular, we present hypothesis tests to check whether a given dataset of instance-label pairs has been corrupted with class-conditional label noise, as opposed to uniform label noise, with the former biasing learning, while the latter -- under mild conditions -- does not. The outcome of these tests can then be used in conjunction with other information to assess further steps. While previous works explore the direct estimation of the noise rates, this is known to be hard in practice and does not offer a real understanding of how trustworthy the estimates are. These methods typically require anchor points -- examples whose true posterior is either 0 or 1. Differently, in this paper we assume we have access to a set of anchor points whose true posterior is approximately 1/2. The proposed hypothesis tests are built upon the asymptotic properties of Maximum Likelihood Estimators for Logistic Regression models. We establish the main properties of the tests, including a theoretical and empirical analysis of the dependence of the power of the test on the training sample size, the number of anchor points, the difference of the noise rates and the use of relaxed anchors.
Submitted 31 May, 2021; v1 submitted 3 March, 2021;
originally announced March 2021.
-
Explainers in the Wild: Making Surrogate Explainers Robust to Distortions through Perception
Authors:
Alexander Hepburn,
Raul Santos-Rodriguez
Abstract:
Explaining the decisions of models is becoming pervasive in the image processing domain, whether it is by using post-hoc methods or by creating inherently interpretable models. While the widespread use of surrogate explainers is a welcome addition to inspect and understand black-box models, assessing the robustness and reliability of the explanations is key for their success. Additionally, whilst existing work in the explainability field proposes various strategies to address this problem, the challenges of working with data in the wild are often overlooked. For instance, in image classification, distortions to images can not only affect the predictions assigned by the model, but also the explanation. Given a clean and a distorted version of an image, even if the prediction probabilities are similar, the explanation may still be different. In this paper we propose a methodology to evaluate the effect of distortions in explanations by embedding perceptual distances that tailor the neighbourhoods used to train surrogate explainers. We also show that by operating in this way, we can make the explanations more robust to distortions. We generate explanations for images in the ImageNet-C dataset and demonstrate how using perceptual distances in the surrogate explainer creates more coherent explanations for the distorted and reference images.
Submitted 16 June, 2021; v1 submitted 22 February, 2021;
originally announced February 2021.
-
Information Theory in Density Destructors
Authors:
J. Emmanuel Johnson,
Valero Laparra,
Gustau Camps-Valls,
Raul Santos-Rodríguez,
Jesús Malo
Abstract:
Density destructors are differentiable and invertible transforms that map multivariate PDFs of arbitrary structure (low entropy) into non-structured PDFs (maximum entropy). Multivariate Gaussianization and multivariate equalization are specific examples of this family, which break down the complexity of the original PDF through a set of elementary transforms that progressively remove the structure of the data. We demonstrate how this property of density destructive flows is connected to classical information theory, and how density destructors can be used to get more accurate estimates of information theoretic quantities. Experiments with total correlation and mutual information in multivariate sets illustrate the ability of density destructors compared to competing methods. These results suggest that information theoretic measures may be an alternative optimization criterion when learning density destructive flows.
Submitted 2 December, 2020;
originally announced December 2020.
-
Model-Based Reinforcement Learning for Type 1 Diabetes Blood Glucose Control
Authors:
Taku Yamagata,
Aisling O'Kane,
Amid Ayobi,
Dmitri Katz,
Katarzyna Stawarz,
Paul Marshall,
Peter Flach,
Raúl Santos-Rodríguez
Abstract:
In this paper we investigate the use of model-based reinforcement learning to assist people with Type 1 Diabetes with insulin dose decisions. The proposed architecture consists of multiple Echo State Networks to predict blood glucose levels combined with a Model Predictive Controller for planning. An Echo State Network is a type of recurrent neural network which allows us to learn long-term dependencies in the input time series data in an online manner. Additionally, we address the quantification of uncertainty for more robust control. Here, we used ensembles of Echo State Networks to capture model (epistemic) uncertainty. We evaluated the approach with the FDA-approved UVa/Padova Type 1 Diabetes simulator and compared the results against baseline algorithms such as Basal-Bolus controller and Deep Q-learning. The results suggest that the model-based reinforcement learning algorithm can perform as well as or better than the baseline algorithms for the majority of virtual Type 1 Diabetes person profiles tested.
Submitted 13 October, 2020;
originally announced October 2020.
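The use of an ensemble to capture model (epistemic) uncertainty can be sketched generically: each ensemble member predicts the same quantity, and the spread of the predictions serves as an uncertainty signal. The snippet below is a minimal illustration under that assumption; the models are arbitrary callables standing in for trained predictors (not Echo State Networks), and the function name is hypothetical.

```python
import statistics


def ensemble_predict(models, x):
    """Return (mean prediction, disagreement) over an ensemble.

    models: list of callables, each mapping an input to a float
            (hypothetical stand-ins for trained predictors).
    The population standard deviation of the member predictions is
    used here as a simple proxy for epistemic uncertainty.
    """
    predictions = [model(x) for model in models]
    return statistics.fmean(predictions), statistics.pstdev(predictions)
```

A planner could, for example, treat predictions with a large disagreement value more cautiously; how the paper's controller uses the uncertainty is not specified in the abstract.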
-
Information Theory Measures via Multidimensional Gaussianization
Authors:
Valero Laparra,
J. Emmanuel Johnson,
Gustau Camps-Valls,
Raul Santos-Rodríguez,
Jesus Malo
Abstract:
Information theory is an outstanding framework to measure uncertainty, dependence and relevance in data and systems. It has several desirable properties for real world applications: it naturally deals with multivariate data, it can handle heterogeneous data types, and the measures can be interpreted in physical units. However, it has not been adopted by a wider audience because obtaining information from multidimensional data is a challenging problem due to the curse of dimensionality. Here we propose an indirect way of computing information based on a multivariate Gaussianization transform. Our proposal mitigates the difficulty of multivariate density estimation by reducing it to a composition of tractable (marginal) operations and simple linear transformations, which can be interpreted as a particular deep neural network. We introduce specific Gaussianization-based methodologies to estimate total correlation, entropy, mutual information and Kullback-Leibler divergence. We compare them to recent estimators showing the accuracy on synthetic data generated from different multivariate distributions. We made the tools and datasets publicly available to provide a test-bed to analyze future methodologies. Results show that our proposal is superior to previous estimators particularly in high-dimensional scenarios; and that it leads to interesting insights in neuroscience, geoscience, computer vision, and machine learning.
Submitted 25 November, 2020; v1 submitted 8 October, 2020;
originally announced October 2020.
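To see how a composition of marginal operations and linear transforms can yield an information measure, consider total correlation. The derivation below is a sketch under stated assumptions: each layer applies a componentwise (marginal) Gaussianization \Psi followed by an orthogonal rotation \mathbf{R}, entropies are in nats, and the symbols T, H, D, \Psi and \mathbf{R} are notation chosen here rather than taken from the abstract.

```latex
\begin{align}
T(\mathbf{x}) &= \sum_{i=1}^{D} H(x_i) - H(\mathbf{x}), \\
T(\Psi(\mathbf{x})) &= T(\mathbf{x})
  \quad \text{(invariance under componentwise invertible maps)}, \\
H(\mathbf{R}\,\Psi(\mathbf{x})) &= H(\Psi(\mathbf{x}))
  \quad \text{(since } |\det \mathbf{R}| = 1\text{)}, \\
\Delta T = T(\mathbf{x}) - T(\mathbf{y})
  &= \frac{D}{2}\log(2\pi e) - \sum_{i=1}^{D} H(y_i),
  \qquad \mathbf{y} = \mathbf{R}\,\Psi(\mathbf{x}).
\end{align}
```

The last line uses the fact that every marginal of \Psi(\mathbf{x}) is a standard Gaussian with entropy \tfrac{1}{2}\log(2\pi e). Only univariate entropies appear on the right-hand side, so if the layers are iterated until the data is approximately a standard multivariate Gaussian (total correlation zero), summing the per-layer \Delta T gives an estimate of T(\mathbf{x}) without any multivariate density estimation.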
-
Detecting Signatures of Early-stage Dementia with Behavioural Models Derived from Sensor Data
Authors:
Rafael Poyiadzi,
Weisong Yang,
Yoav Ben-Shlomo,
Ian Craddock,
Liz Coulthard,
Raul Santos-Rodriguez,
James Selwood,
Niall Twomey
Abstract:
There is a pressing need to automatically understand the state and progression of chronic neurological diseases such as dementia. The emergence of state-of-the-art sensing platforms offers unprecedented opportunities for indirect and automatic evaluation of disease state through the lens of behavioural monitoring. This paper specifically seeks to characterise behavioural signatures of mild cognitive impairment (MCI) and Alzheimer's disease (AD) in the early stages of the disease. We introduce bespoke behavioural models and analyses of key symptoms and deploy these on a novel dataset of longitudinal sensor data from persons with MCI and AD. We present preliminary findings that show the relationship between levels of sleep quality and wandering can be subtly different between patients in the early stages of dementia and healthy cohabiting controls.
Submitted 3 July, 2020;
originally announced July 2020.
-
bLIMEy: Surrogate Prediction Explanations Beyond LIME
Authors:
Kacper Sokol,
Alexander Hepburn,
Raul Santos-Rodriguez,
Peter Flach
Abstract:
Surrogate explainers of black-box machine learning predictions are of paramount importance in the field of eXplainable Artificial Intelligence since they can be applied to any type of data (images, text and tabular), are model-agnostic and are post-hoc (i.e., can be retrofitted). The Local Interpretable Model-agnostic Explanations (LIME) algorithm is often mistakenly unified with a more general framework of surrogate explainers, which may lead to a belief that it is the solution to surrogate explainability. In this paper we empower the community to "build LIME yourself" (bLIMEy) by proposing a principled algorithmic framework for building custom local surrogate explainers of black-box model predictions, including LIME itself. To this end, we demonstrate how to decompose the surrogate explainers family into algorithmically independent and interoperable modules and discuss the influence of these component choices on the functional capabilities of the resulting explainer, using the example of LIME.
Submitted 28 October, 2019;
originally announced October 2019.
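The idea of assembling a local surrogate explainer from interchangeable modules can be sketched for a one-dimensional black box. The split used below (neighbourhood sampling, kernel weighting, weighted linear fit) and all function names are assumptions made for illustration; they are not necessarily the decomposition proposed in the paper, and the code does not reproduce LIME.

```python
import math
import random


def sample_neighbourhood(x0, scale, n, rng):
    """Module 1: draw n points around x0 from a Gaussian."""
    return [x0 + rng.gauss(0.0, scale) for _ in range(n)]


def kernel_weights(xs, x0, width):
    """Module 2: weight samples by closeness to x0 (RBF kernel)."""
    return [math.exp(-((x - x0) ** 2) / (2 * width ** 2)) for x in xs]


def fit_weighted_line(xs, ys, ws):
    """Module 3: weighted least-squares line, returns (slope, intercept)."""
    total = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def explain(black_box, x0, scale=0.5, width=0.5, n=200, seed=0):
    """Compose the three modules into a local linear explanation at x0."""
    rng = random.Random(seed)
    xs = sample_neighbourhood(x0, scale, n, rng)
    ys = [black_box(x) for x in xs]
    ws = kernel_weights(xs, x0, width)
    return fit_weighted_line(xs, ys, ws)
```

Because each module only depends on the outputs of the previous one, any of them can be swapped (a different sampler, kernel or surrogate model) without touching the others, which is the kind of design freedom the abstract argues for.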
-
PerceptNet: A Human Visual System Inspired Neural Network for Estimating Perceptual Distance
Authors:
Alexander Hepburn,
Valero Laparra,
Jesús Malo,
Ryan McConville,
Raul Santos-Rodriguez
Abstract:
Traditionally, the vision community has devised algorithms to estimate the distance between an original image and images that have been subject to perturbations. Inspiration was usually taken from the human visual perceptual system and how the system processes different perturbations in order to replicate to what extent it determines our ability to judge image quality. While recent works have presented deep neural networks trained to predict human perceptual quality, very few borrow any intuitions from the human visual system. To address this, we present PerceptNet, a convolutional neural network where the architecture has been chosen to reflect the structure and various stages in the human visual system. We evaluate PerceptNet on various traditional perception datasets and note strong performance on a number of them as compared with traditional image quality metrics. We also show that including a nonlinearity inspired by the human visual system in classical deep neural network architectures can increase their ability to judge perceptual similarity. Compared to similar deep learning methods, the performance is similar, although our network has several orders of magnitude fewer parameters.
Submitted 17 November, 2020; v1 submitted 28 October, 2019;
originally announced October 2019.
-
FACE: Feasible and Actionable Counterfactual Explanations
Authors:
Rafael Poyiadzi,
Kacper Sokol,
Raul Santos-Rodriguez,
Tijl De Bie,
Peter Flach
Abstract:
Work in Counterfactual Explanations tends to focus on the principle of "the closest possible world" that identifies small changes leading to the desired outcome. In this paper we argue that while this approach might initially seem intuitively appealing, it exhibits shortcomings not addressed in the current literature. First, a counterfactual example generated by the state-of-the-art systems is not necessarily representative of the underlying data distribution, and may therefore prescribe unachievable goals (e.g., an unsuccessful life insurance applicant with severe disability may be advised to do more sports). Secondly, the counterfactuals may not be based on a "feasible path" between the current state of the subject and the suggested one, making actionable recourse infeasible (e.g., low-skilled unsuccessful mortgage applicants may be told to double their salary, which may be hard without first increasing their skill level). These two shortcomings may render counterfactual explanations impractical and sometimes outright offensive. To address these two major flaws, first of all, we propose a new line of Counterfactual Explanations research aimed at providing actionable and feasible paths to transform a selected instance into one that meets a certain goal. Secondly, we propose FACE: an algorithmically sound way of uncovering these "feasible paths" based on the shortest path distances defined via density-weighted metrics. Our approach generates counterfactuals that are coherent with the underlying data distribution and supported by the "feasible paths" of change, which are achievable and can be tailored to the problem at hand.
Submitted 24 February, 2020; v1 submitted 20 September, 2019;
originally announced September 2019.
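The notion of a "feasible path" found via shortest path distances with density-weighted metrics can be sketched with Dijkstra's algorithm over a neighbourhood graph of data points. Everything specific below is an assumption for illustration: the function name, the eps-neighbourhood graph, and the edge weight (Euclidean length divided by a user-supplied density estimate at the edge midpoint). The paper's actual graph construction and weighting may differ.

```python
import heapq
import math


def face_path(points, labels, start, target_label, eps, density):
    """Cheapest path from points[start] to any point with target_label.

    points:  list of equal-length coordinate tuples (the dataset)
    labels:  predicted class for each point
    eps:     maximum Euclidean length of an edge
    density: callable giving a positive density estimate at a location
             (hypothetical; e.g. a kernel density estimate)
    Returns (list of point indices, path cost) or (None, inf).
    """
    n = len(points)
    dist = [math.inf] * n
    prev = [None] * n
    dist[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u != start and labels[u] == target_label:
            # The first target-class node popped is the cheapest one.
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1], d
        for v in range(n):
            if v == u:
                continue
            length = math.dist(points[u], points[v])
            if length > eps:
                continue  # no edge between distant points
            midpoint = [(a + b) / 2 for a, b in zip(points[u], points[v])]
            # Edges crossing low-density regions become expensive.
            weight = length / max(density(midpoint), 1e-12)
            if d + weight < dist[v]:
                dist[v] = d + weight
                prev[v] = u
                heapq.heappush(heap, (dist[v], v))
    return None, math.inf
```

The returned index sequence is the chain of existing data points connecting the instance to a counterfactual, so each intermediate step lies on observed data rather than in an empty region of feature space.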
-
FAT Forensics: A Python Toolbox for Algorithmic Fairness, Accountability and Transparency
Authors:
Kacper Sokol,
Raul Santos-Rodriguez,
Peter Flach
Abstract:
Today, artificial intelligence systems driven by machine learning algorithms can be in a position to take important, and sometimes legally binding, decisions about our everyday lives. In many cases, however, these systems and their actions are neither regulated nor certified. To help counter the potential harm that such algorithms can cause, we developed an open source toolbox that can analyse selected fairness, accountability and transparency aspects of the machine learning process: data (and their features), models and predictions, allowing them to be reported automatically and objectively to relevant stakeholders. In this paper we describe the design, scope, usage and impact of this Python package, which is published under the 3-Clause BSD open source licence.
Submitted 25 August, 2022; v1 submitted 11 September, 2019;
originally announced September 2019.
-
Online Feature Selection for Activity Recognition using Reinforcement Learning with Multiple Feedback
Authors:
Taku Yamagata,
Raúl Santos-Rodríguez,
Ryan McConville,
Atis Elsts
Abstract:
Recent advances in both machine learning and Internet-of-Things have attracted attention to automatic Activity Recognition, where users wear a device with sensors and their outputs are mapped to a predefined set of activities. However, few studies have considered the balance between wearable power consumption and activity recognition accuracy. This is particularly important when part of the computational load happens on the wearable device. In this paper, we present a new methodology to perform feature selection on the device based on Reinforcement Learning (RL) to find the optimum balance between power consumption and accuracy. To accelerate the learning speed, we extend the RL algorithm to address multiple sources of feedback, and use them to tailor the policy in conjunction with estimating the feedback accuracy. We evaluated our system on the SPHERE challenge dataset, a publicly available research dataset. The results show that our proposed method achieves a good trade-off between wearable power consumption and activity recognition accuracy.
Submitted 16 August, 2019;
originally announced August 2019.
-
N2D: (Not Too) Deep Clustering via Clustering the Local Manifold of an Autoencoded Embedding
Authors:
Ryan McConville,
Raul Santos-Rodriguez,
Robert J Piechocki,
Ian Craddock
Abstract:
Deep clustering has increasingly been demonstrating superiority over conventional shallow clustering algorithms. Deep clustering algorithms usually combine representation learning with deep neural networks to achieve this performance, typically optimizing a clustering and non-clustering loss. In such cases, an autoencoder is typically connected with a clustering network, and the final clustering is jointly learned by both the autoencoder and clustering network. Instead, we propose to learn an autoencoded embedding and then search this further for the underlying manifold. For simplicity, we then cluster this with a shallow clustering algorithm, rather than a deeper network. We study a number of local and global manifold learning methods on both the raw data and autoencoded embedding, concluding that UMAP in our framework is best able to find the most clusterable manifold in the embedding, suggesting local manifold learning on an autoencoded embedding is effective for discovering higher quality clusters. We quantitatively show across a range of image and time-series datasets that our method has competitive performance against the latest deep clustering algorithms, including out-performing current state-of-the-art on several. We postulate that these results show a promising research direction for deep clustering. The code can be found at https://github.com/rymc/n2d
Submitted 30 June, 2020; v1 submitted 16 August, 2019;
originally announced August 2019.
-
Enforcing Perceptual Consistency on Generative Adversarial Networks by Using the Normalised Laplacian Pyramid Distance
Authors:
Alexander Hepburn,
Valero Laparra,
Ryan McConville,
Raul Santos-Rodriguez
Abstract:
In recent years there has been a growing interest in image generation through deep learning. While an important part of the evaluation of the generated images usually involves visual inspection, the inclusion of human perception as a factor in the training process is often overlooked. In this paper we propose an alternative perceptual regulariser for image-to-image translation using conditional generative adversarial networks (cGANs). To do so automatically (avoiding visual inspection), we use the Normalised Laplacian Pyramid Distance (NLPD) to measure the perceptual similarity between the generated image and the original image. The NLPD is based on the principle of normalising the value of coefficients with respect to a local estimate of mean energy at different scales and has already been successfully tested in different experiments involving human perception. We compare this regulariser with the originally proposed L1 distance and note that when using NLPD the generated images contain more realistic values for both local and global contrast. We found that using NLPD as a regulariser improves image segmentation accuracy on generated images as well as improving two no-reference image quality metrics.
Submitted 17 November, 2020; v1 submitted 9 August, 2019;
originally announced August 2019.
-
Location Anomalies Detection for Connected and Autonomous Vehicles
Authors:
Xiaoyang Wang,
Ioannis Mavromatis,
Andrea Tassi,
Raul Santos-Rodriguez,
Robert J. Piechocki
Abstract:
Future Connected and Automated Vehicles (CAV), and more generally ITS, will form a highly interconnected system. Such a paradigm is referred to as the Internet of Vehicles (herein Internet of CAVs) and is a prerequisite to orchestrate traffic flows in cities. For optimal decision making and supervision, traffic centres will have access to suitably anonymized CAV mobility information. Safe and secure operations will then be contingent on early detection of anomalies. In this paper, a novel unsupervised learning model based on a deep autoencoder is proposed to detect self-reported location anomalies in CAVs, using vehicle locations and the Received Signal Strength Indicator (RSSI) as features. Quantitative experiments on simulation datasets show that the proposed approach is effective and robust in detecting self-reported location anomalies.
Submitted 1 July, 2019;
originally announced July 2019.
-
Ordinal Regression as Structured Classification
Authors:
Niall Twomey,
Rafael Poyiadzi,
Callum Mann,
Raúl Santos-Rodríguez
Abstract:
This paper extends the class of ordinal regression models with a structured interpretation of the problem by applying a novel treatment of encoded labels. The net effect of this is to transform the underlying problem from an ordinal regression task to a (structured) classification task which we solve with conditional random fields, thereby achieving a coherent and probabilistic model in which all model parameters are jointly learnt. Importantly, we show that although we have cast ordinal regression to classification, our method still falls within the class of decomposition methods in the ordinal regression ontology. This is an important link since our experience is that many applications of machine learning to healthcare completely ignore the important nature of the label ordering, and hence these approaches should be considered naive in this ontology. We also show that our model is flexible both in how it adapts to data manifolds and in terms of the operations that are available for practitioners to execute. Our empirical evaluation demonstrates that the proposed approach overwhelmingly produces superior and often statistically significant results over baseline approaches on forty popular ordinal regression models, and demonstrates that the proposed model significantly out-performs baselines on synthetic and real datasets. Our implementation, together with scripts to reproduce the results of this work, will be available on a public GitHub repository.
Submitted 31 May, 2019;
originally announced May 2019.