doi 10.1093/bib/bbac279

Implications of Topological Imbalance for Representation Learning on Biomedical Knowledge Graphs

Authors: Stephen Bonner, Ufuk Kirik, Ola Engkvist, Jian Tang, Ian P Barrett

Abstract: Adoption of recently developed methods from machine learning has given rise to the creation of drug-discovery knowledge graphs (KG) that utilize the interconnected nature of the domain. Graph-based modelling of the data, combined with KG embedding (KGE) methods, are promising as they provide a more intuitive representation and are suitable for inference tasks such as predicting missing links. One common application is to produce ranked lists of genes for a given disease, where the rank is based on the perceived likelihood of association between the gene and the disease. It is thus critical that these predictions are not only pertinent but also biologically meaningful. However, KGs can be biased either directly due to the underlying data sources that are integrated or due to modeling choices in the construction of the graph, one consequence of which is that certain entities can get topologically overrepresented. We demonstrate the effect of these inherent structural imbalances, resulting in densely-connected entities being highly ranked no matter the context. We provide support for this observation across different datasets, models as well as predictive tasks. Further, we present various graph perturbation experiments which yield more support to the observation that KGE models can be more influenced by the frequency of entities than by any biological information encoded within the relations. Our results highlight the importance of data modeling choices, and emphasize the need for practitioners to be mindful of these issues when interpreting model outputs and during KG composition.

Submitted 18 March, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

Comments: Briefings in Bioinformatics, 2022
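
As a rough illustration of the topological imbalance described in the abstract above, the sketch below counts how often each entity appears in a toy set of triples and ranks candidate genes by that count alone. The triples, the entity names and the `entity_degree` and `degree_ranking` helpers are hypothetical and are not taken from the paper; the only point is that a frequency-based ranking returns the same order whatever disease is being queried.

```python
from collections import Counter

# Toy (head, relation, tail) triples; all names are made up for illustration.
TRIPLES = [
    ("GENE_A", "associated_with", "DISEASE_X"),
    ("GENE_A", "interacts_with", "GENE_B"),
    ("GENE_A", "interacts_with", "GENE_C"),
    ("GENE_A", "associated_with", "DISEASE_Y"),
    ("GENE_B", "associated_with", "DISEASE_Y"),
    ("GENE_C", "interacts_with", "GENE_B"),
]


def entity_degree(triples):
    """Count how many triples each entity takes part in (as head or tail)."""
    degree = Counter()
    for head, _relation, tail in triples:
        degree[head] += 1
        degree[tail] += 1
    return degree


def degree_ranking(triples, candidates):
    """Rank candidates by degree only; the query disease is never consulted."""
    degree = entity_degree(triples)
    # Sort by descending degree, breaking ties alphabetically.
    return sorted(candidates, key=lambda entity: (-degree[entity], entity))


genes = ["GENE_A", "GENE_B", "GENE_C"]
ranking = degree_ranking(TRIPLES, genes)
print(ranking)
```

A model whose scores closely track this ordering would place `GENE_A` first for any disease, which is the kind of context-independent behaviour the abstract warns about.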

arXiv:2111.02916 [pdf, other]

A Unified View of Relational Deep Learning for Drug Pair Scoring

Authors: Benedek Rozemberczki, Stephen Bonner, Andriy Nikolov, Michael Ughetto, Sebastian Nilsson, Eliseo Papa

Abstract: In recent years, numerous machine learning models which attempt to solve polypharmacy side effect identification, drug-drug interaction prediction and combination therapy design tasks have been proposed. Here, we present a unified theoretical view of relational machine learning models which can address these tasks. We provide fundamental definitions, compare existing model architectures and discuss performance metrics, datasets and evaluation protocols. In addition, we emphasize possible high impact applications and important future research directions in this domain.

Submitted 11 December, 2021; v1 submitted 4 November, 2021; originally announced November 2021.

arXiv:2105.10578 [pdf, other]

doi 10.1109/TCBB.2022.3197320

A Knowledge Graph-Enhanced Tensor Factorisation Model for Discovering Drug Targets

Authors: Cheng Ye, Rowan Swiers, Stephen Bonner, Ian Barrett

Abstract: The drug discovery and development process is a long and expensive one, costing over 1 billion USD on average per drug and taking 10-15 years. To reduce the high levels of attrition throughout the process, there has been a growing interest in applying machine learning methodologies to various stages of drug discovery and development in the past decade, especially at the earliest stage, the identification of druggable disease genes. In this paper, we have developed a new tensor factorisation model to predict potential drug targets (genes or proteins) for treating diseases. We created a three-dimensional data tensor consisting of 1,048 gene targets, 860 diseases and 230,011 evidence attributes and clinical outcomes connecting them, using data extracted from the Open Targets and PharmaProjects databases. We enriched the data with gene target representations learned from a drug discovery oriented knowledge graph and applied our proposed method to predict the clinical outcomes for unseen gene target and disease pairs. We designed three evaluation strategies to measure the prediction performance and benchmarked several commonly used machine learning classifiers together with Bayesian matrix and tensor factorisation methods. The results show that incorporating knowledge graph embeddings significantly improves the prediction accuracy and that training tensor factorisation alongside a dense neural network outperforms all other baselines. In summary, our framework combines two actively studied machine learning approaches to disease target identification, namely tensor factorisation and knowledge graph representation learning, which could be a promising avenue for further exploration in data driven drug discovery.

Submitted 19 August, 2022; v1 submitted 20 May, 2021; originally announced May 2021.

Comments: 16 pages, 4 figures, IEEE/ACM Transactions on Computational Biology and Bioinformatics

arXiv:2105.10488 [pdf, other]

doi 10.1016/j.ailsci.2022.100036

Understanding the Performance of Knowledge Graph Embeddings in Drug Discovery

Authors: Stephen Bonner, Ian P Barrett, Cheng Ye, Rowan Swiers, Ola Engkvist, Charles Tapley Hoyt, William L Hamilton

Abstract: Knowledge Graphs (KG) and associated Knowledge Graph Embedding (KGE) models have recently begun to be explored in the context of drug discovery and have the potential to assist in key challenges such as target identification. In the drug discovery domain, KGs can be employed as part of a process which can result in lab-based experiments being performed, or impact on other decisions, incurring significant time and financial costs and most importantly, ultimately influencing patient healthcare. For KGE models to have impact in this domain, a better understanding not only of performance, but also of the various factors which determine it, is required. In this study we investigate, over the course of many thousands of experiments, the predictive performance of five KGE models on two public drug discovery-oriented KGs. Our goal is not to focus on the best overall model or configuration; instead, we take a deeper look at how performance can be affected by changes in the training setup, choice of hyperparameters, model parameter initialisation seed and different splits of the datasets. Our results highlight that these factors have significant impact on performance and can even affect the ranking of models. Indeed, these factors should be reported along with model architectures to ensure complete reproducibility and fair comparisons of future work, and we argue this is critical for the acceptance of use, and impact of KGEs in a biomedical setting.

Submitted 23 May, 2022; v1 submitted 17 May, 2021; originally announced May 2021.

Journal ref: Artificial Intelligence in the Life Sciences (2022): 100036

arXiv:2102.10062 [pdf, other]

doi 10.1093/bib/bbac404

A Review of Biomedical Datasets Relating to Drug Discovery: A Knowledge Graph Perspective

Authors: Stephen Bonner, Ian P Barrett, Cheng Ye, Rowan Swiers, Ola Engkvist, Andreas Bender, Charles Tapley Hoyt, William L Hamilton

Abstract: Drug discovery and development is a complex and costly process. Machine learning approaches are being investigated to help improve the effectiveness and speed of multiple stages of the drug discovery pipeline. Of these, those that use Knowledge Graphs (KG) have promise in many tasks, including drug repurposing, drug toxicity prediction and target gene-disease prioritisation. In a drug discovery KG, crucial elements including genes, diseases and drugs are represented as entities, whilst relationships between them indicate an interaction. However, to construct high-quality KGs, suitable data is required. In this review, we detail publicly available sources suitable for use in constructing drug discovery focused KGs. We aim to help guide machine learning and KG practitioners who are interested in applying new techniques to the drug discovery field, but who may be unfamiliar with the relevant data sources. The datasets are selected via strict criteria, categorised according to the primary type of information contained within and are considered based upon what information could be extracted to build a KG. We then present a comparative analysis of existing public drug discovery KGs and an evaluation of selected motivating case studies from the literature. Additionally, we raise numerous and unique challenges and issues associated with the domain and its datasets, whilst also highlighting key future research directions. We hope this review will motivate the use of KGs in solving key and emerging questions in the drug discovery domain.

Submitted 26 November, 2021; v1 submitted 19 February, 2021; originally announced February 2021.

Journal ref: Briefings in Bioinformatics, 2022

arXiv:2010.12635 [pdf, other]

Not Half Bad: Exploring Half-Precision in Graph Convolutional Neural Networks

Authors: John Brennan, Stephen Bonner, Amir Atapour-Abarghouei, Philip T Jackson, Boguslaw Obara, Andrew Stephen McGough

Abstract: With the growing significance of graphs as an effective representation of data in numerous applications, efficient graph analysis using modern machine learning is receiving a growing level of attention. Deep learning approaches often operate over the entire adjacency matrix -- as the input and intermediate network layers are all designed in proportion to the size of the adjacency matrix -- leading to intensive computation and large memory requirements as the graph size increases. It is therefore desirable to identify efficient measures to reduce both run-time and memory requirements allowing for the analysis of the largest graphs possible. The use of reduced precision operations within the forward and backward passes of a deep neural network along with novel specialised hardware in modern GPUs can offer promising avenues towards efficiency. In this paper, we provide an in-depth exploration of the use of reduced-precision operations, easily integrable into the highly popular PyTorch framework, and an analysis of the effects of Tensor Cores on graph convolutional neural networks. We perform an extensive experimental evaluation of three GPU architectures and two widely-used graph analysis tasks (vertex classification and link prediction) using well-known benchmark and synthetically generated datasets. This allows us to make important observations on the effects of reduced-precision operations and Tensor Cores on the computational and memory usage of graph convolutional neural networks -- often neglected in the literature.

Submitted 23 October, 2020; originally announced October 2020.
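
The memory and precision trade-off behind the reduced-precision operations mentioned in the abstract above can be seen with a small standard-library sketch. This is not the paper's PyTorch or Tensor Core setup; the `to_half` and `to_single` helpers are hypothetical names that simply round-trip a value through IEEE 754 half-precision and single-precision storage using Python's `struct` module.

```python
import struct


def to_half(value: float) -> float:
    """Round-trip a Python float through IEEE 754 binary16 storage."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def to_single(value: float) -> float:
    """Round-trip a Python float through IEEE 754 binary32 storage."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


half_bytes = struct.calcsize("<e")    # bytes per half-precision value
single_bytes = struct.calcsize("<f")  # bytes per single-precision value

x = 0.1
half_error = abs(to_half(x) - x)      # rounding error when stored as binary16
single_error = abs(to_single(x) - x)  # rounding error when stored as binary32

print(f"bytes per value: half={half_bytes}, single={single_bytes}")
print(f"rounding error for {x}: half={half_error:.2e}, single={single_error:.2e}")
```

Half precision halves the storage per value at the cost of a larger rounding error, which is why its effect on accuracy needs to be measured empirically for a given model and task.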

arXiv:2009.05160 [pdf, other]

Rank over Class: The Untapped Potential of Ranking in Natural Language Processing

Authors: Amir Atapour-Abarghouei, Stephen Bonner, Andrew Stephen McGough

Abstract: Text classification has long been a staple within Natural Language Processing (NLP) with applications spanning across diverse areas such as sentiment analysis, recommender systems and spam detection. With such a powerful solution, it is often tempting to use it as the go-to tool for all NLP problems since when you are holding a hammer, everything looks like a nail. However, we argue here that many tasks which are currently addressed using classification are in fact being shoehorned into a classification mould and that if we instead address them as a ranking problem, we not only improve the model, but we achieve better performance. We propose a novel end-to-end ranking approach consisting of a Transformer network responsible for producing representations for a pair of text sequences, which are in turn passed into a context aggregating network outputting ranking scores used to determine an ordering to the sequences based on some notion of relevance. We perform numerous experiments on publicly-available datasets and investigate the applications of ranking in problems often solved using classification. In an experiment on a heavily-skewed sentiment analysis dataset, converting ranking results to classification labels yields an approximately 22% improvement over state-of-the-art text classification, demonstrating the efficacy of text ranking over text classification in certain scenarios.

Submitted 3 December, 2021; v1 submitted 10 September, 2020; originally announced September 2020.

Comments: 2021 IEEE International Conference on Big Data (IEEE BigData 2021)

arXiv:2008.12504 [pdf, ps, other]

doi 10.1145/3394486.3403121

BLOB : A Probabilistic Model for Recommendation that Combines Organic and Bandit Signals

Authors: Otmane Sakhi, Stephen Bonner, David Rohde, Flavian Vasile

Abstract: A common task for recommender systems is to build a profile of the interests of a user from items in their browsing history and later to recommend items to the user from the same catalog. The users' behavior consists of two parts: the sequence of items that they viewed without intervention (the organic part) and the sequences of items recommended to them and their outcome (the bandit part). In this paper, we propose the Bayesian Latent Organic Bandit model (BLOB), a probabilistic approach to combine the 'organic' and 'bandit' signals in order to improve the estimation of recommendation quality. The bandit signal is valuable as it gives direct feedback of recommendation performance, but the signal quality is very uneven, as it is highly concentrated on the recommendations deemed optimal by the past version of the recommender system. In contrast, the organic signal is typically strong and covers most items, but is not always relevant to the recommendation task. In order to leverage the organic signal to efficiently learn the bandit signal in a Bayesian model we identify three fundamental types of distances, namely action-history, action-action and history-history distances. We implement a scalable approximation of the full model using variational auto-encoders and the local re-parameterization trick. We show using extensive simulation studies that our method out-performs or matches the value of both state-of-the-art organic-based recommendation algorithms, and of bandit-based methods (both value and policy-based) both in organic and bandit-rich environments.

Submitted 28 August, 2020; originally announced August 2020.

Comments: 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Aug 2020, San Diego, United States

arXiv:2007.08574 [pdf, other]

Camera Bias in a Fine Grained Classification Task

Authors: Philip T. Jackson, Stephen Bonner, Ning Jia, Christopher Holder, Jon Stonehouse, Boguslaw Obara

Abstract: We show that correlations between the camera used to acquire an image and the class label of that image can be exploited by convolutional neural networks (CNN), resulting in a model that "cheats" at an image classification task by recognizing which camera took the image and inferring the class label from the camera. We show that models trained on a dataset with camera / label correlations do not generalize well to images in which those correlations are absent, nor to images from unencountered cameras. Furthermore, we investigate which visual features they are exploiting for camera recognition. Our experiments present evidence against the importance of global color statistics, lens deformation and chromatic aberration, and in favor of high frequency features, which may be introduced by image processing algorithms built into the cameras.

Submitted 16 July, 2020; originally announced July 2020.

arXiv:1911.08364 [pdf, other]

Volenti non fit injuria: Ransomware and its Victims

Authors: Amir Atapour-Abarghouei, Stephen Bonner, Andrew Stephen McGough

Abstract: With the recent growth in the number of malicious activities on the internet, cybersecurity research has seen a boost in the past few years. However, as certain variants of malware can provide highly lucrative opportunities for bad actors, significant resources are dedicated to innovations and improvements by vast criminal organisations. Among these forms of malware, ransomware has experienced a significant recent rise as it offers the perpetrators great financial incentive. Ransomware variants operate by removing system access from the user by either locking the system or encrypting some or all of the data, and subsequently demanding payment or ransom in exchange for returning system access or providing a decryption key to the victim. Due to the ubiquity of sensitive data in many aspects of modern life, many victims of such attacks, be they an individual home user or operators of a business, are forced to pay the ransom to regain access to their data, which in many cases does not happen as renormalisation of system operations is never guaranteed. As the problem of ransomware does not seem to be subsiding, it is very important to investigate the underlying forces driving and facilitating such attacks in order to create preventative measures. As such, in this paper, we discuss and provide further insight into variants of ransomware and their victims in order to understand how and why they have been targeted and what can be done to prevent or mitigate the effects of such attacks.

Submitted 19 November, 2019; originally announced November 2019.

Comments: 2019 IEEE International Conference on Big Data 2019

arXiv:1910.00877 [pdf, other]

Reconsidering Analytical Variational Bounds for Output Layers of Deep Networks

Authors: Otmane Sakhi, Stephen Bonner, David Rohde, Flavian Vasile

Abstract: The combination of the re-parameterization trick with the use of variational auto-encoders has caused a sensation in Bayesian deep learning, allowing the training of realistic generative models of images, and has considerably increased our ability to use scalable latent variable models. The re-parameterization trick is necessary for models in which no analytical variational bound is available and allows noisy gradients to be computed for arbitrary models. However, for certain standard output layers of a neural network, analytical bounds are available and the variational auto-encoder may be used without either the re-parameterization trick or the need for any Monte Carlo approximation. In this work, we show that using the Jaakkola and Jordan bound, we can produce a binary classification layer that allows a Bayesian output layer to be trained, using the standard stochastic gradient descent algorithm. We further demonstrate that a latent variable model utilizing the Bouchard bound for multi-class classification allows for fast training of a fully probabilistic latent factor model, even when the number of classes is very large.

Submitted 3 October, 2019; v1 submitted 2 October, 2019; originally announced October 2019.

Comments: 8 pages, 2 figures

arXiv:1908.08402 [pdf, other]

Temporal Neighbourhood Aggregation: Predicting Future Links in Temporal Graphs via Recurrent Variational Graph Convolutions

Authors: Stephen Bonner, Amir Atapour-Abarghouei, Philip T Jackson, John Brennan, Ibad Kureshi, Georgios Theodoropoulos, Andrew Stephen McGough, Boguslaw Obara

Abstract: Graphs have become a crucial way to represent large, complex and often temporal datasets across a wide range of scientific disciplines. However, when graphs are used as input to machine learning models, this rich temporal information is frequently disregarded during the learning process, resulting in suboptimal performance on certain temporal inference tasks. To combat this, we introduce Temporal Neighbourhood Aggregation (TNA), a novel vertex representation model architecture designed to capture both topological and temporal information to directly predict future graph states. Our model exploits hierarchical recurrence at different depths within the graph to enable exploration of changes in temporal neighbourhoods, whilst requiring no additional features or labels to be present. The final vertex representations are created using variational sampling and are optimised to directly predict the next graph in the sequence. Our claims are reinforced by extensive experimental evaluation on both real and synthetic benchmark datasets, where our approach demonstrates superior performance compared to competing methods, out-performing them at predicting new temporal edges by as much as 23% on real-world datasets, whilst also requiring fewer overall model parameters.

Submitted 21 November, 2019; v1 submitted 21 August, 2019; originally announced August 2019.

Comments: IEEE International Conference on Big Data 2019

arXiv:1908.06750 [pdf, other]

A Kings Ransom for Encryption: Ransomware Classification using Augmented One-Shot Learning and Bayesian Approximation

Authors: Amir Atapour-Abarghouei, Stephen Bonner, Andrew Stephen McGough

Abstract: Newly emerging variants of ransomware pose an ever-growing threat to computer systems governing every aspect of modern life through the handling and analysis of big data. While various recent security-based approaches have focused on detecting and classifying ransomware at the network or system level, easy-to-use post-infection ransomware classification for the lay user has not been attempted before. In this paper, we investigate the possibility of classifying the ransomware a system is infected with simply based on a screenshot of the splash screen or the ransom note captured using a consumer camera commonly found in any modern mobile device. To train and evaluate our system, we create a sample dataset of the splash screens of 50 well-known ransomware variants. In our dataset, only a single training image is available per ransomware. Instead of creating a large training dataset of ransomware screenshots, we simulate screenshot capture conditions via carefully designed data augmentation techniques, enabling simple and efficient one-shot learning. Moreover, using model uncertainty obtained via Bayesian approximation, we ensure special input cases such as unrelated non-ransomware images and previously-unseen ransomware variants are correctly identified for special handling and not mis-classified. Extensive experimental evaluation demonstrates the efficacy of our work, with accuracy levels of up to 93.6% for ransomware classification.

Submitted 19 August, 2019; originally announced August 2019.

Comments: Submitted to 2019 IEEE International Conference on Big Data

arXiv:1904.10784 [pdf, other]

Latent Variable Session-Based Recommendation

Authors: David Rohde, Stephen Bonner

Abstract: Session based recommendation provides an attractive alternative to the traditional feature engineering approach to recommendation. Feature engineering approaches require hand-tuned features of the user's history to be created to produce a context vector. In contrast, a session based approach is able to dynamically model the user's state as they act. We present a probabilistic framework for session based recommendation. A latent variable for the user state is updated as the user views more items and we learn more about their interests. The latent variable model is conceptually simple and elegant, yet requires sophisticated computational techniques to approximate the integral over the latent variable. We provide computational solutions using both the re-parameterization trick and the Bouchard bound for the softmax function; we further explore employing a variational auto-encoder and a variational Expectation-Maximization algorithm for tightening the variational bound. The model performs well against a number of baselines. The intuitive nature of the model allows an elegant formulation combining correlations between items and their popularity and sheds light on other popular recommendation methods. An attractive feature of the latent variable approach is that, as the user continues to act, the posterior on the user's state tightens reflecting the recommender system's increased knowledge about that user.

Submitted 17 September, 2019; v1 submitted 24 April, 2019; originally announced April 2019.

arXiv:1904.05165 [pdf, other]

Causal Embeddings for Recommendation: An Extended Abstract

Authors: Stephen Bonner, Flavian Vasile

Abstract: Recommendations are commonly used to modify users' natural behavior, for example, increasing product sales or the time spent on a website. This results in a gap between the ultimate business objective and the classical setup where recommendations are optimized to be coherent with past user behavior. To bridge this gap, we propose a new learning setup for recommendation that optimizes for the Incremental Treatment Effect (ITE) of the policy. We show this is equivalent to learning to predict recommendation outcomes under a fully random recommendation policy and propose a new domain adaptation algorithm that learns from logged data containing outcomes from a biased recommendation policy and predicts recommendation outcomes according to random exposure. We compare our method against state-of-the-art factorization methods, in addition to new approaches of causal recommendation and show significant improvements.

Submitted 21 May, 2019; v1 submitted 10 April, 2019; originally announced April 2019.

Comments: Accepted to the International Joint Conferences on Artificial Intelligence (IJCAI) Sister Conference Best Paper Track

arXiv:1811.11880 [pdf, other]

Predicting the Computational Cost of Deep Learning Models

Authors: Daniel Justus, John Brennan, Stephen Bonner, Andrew Stephen McGough

Abstract: Deep learning is rapidly becoming a go-to tool for many artificial intelligence problems due to its ability to outperform other approaches and even humans at many problems. Despite its popularity we are still unable to accurately predict the time it will take to train a deep learning network to solve a given problem. This training time can be seen as the product of the training time per epoch and the number of epochs which need to be performed to reach the desired level of accuracy. Some work has been carried out to predict the training time for an epoch -- most have been based around the assumption that the training time is linearly related to the number of floating point operations required. However, this relationship does not hold, and the discrepancy is exacerbated in cases where other activities start to dominate the execution time, such as the time to load data from memory or the loss of performance due to non-optimal parallel execution. In this work we propose an alternative approach in which we train a deep learning network to predict the execution time for parts of a deep learning network. Timings for these individual parts can then be combined to provide a prediction for the whole execution time. This has advantages over linear approaches, as it can model more complex scenarios and can also predict execution times for scenarios unseen in the training data. Therefore, our approach can be used not only to infer the execution time for a batch, or entire epoch, but it can also support making a well-informed choice for the appropriate hardware and model.

Submitted 28 November, 2018; originally announced November 2018.

Comments: Accepted for publication at the IEEE International Conference on Big Data, (C) IEEE

arXiv:1811.08366 [pdf, other]

Temporal Graph Offset Reconstruction: Towards Temporally Robust Graph Representation Learning

Authors: Stephen Bonner, John Brennan, Ibad Kureshi, Georgios Theodoropoulos, Andrew Stephen McGough, Boguslaw Obara

Abstract: Graphs are a commonly used construct for representing relationships between elements in complex high dimensional datasets. Many real-world phenomena are dynamic in nature, meaning that any graph used to represent them is inherently temporal. However, many of the machine learning models designed to capture knowledge about the structure of these graphs ignore this rich temporal information when creating representations of the graph. This results in models which do not perform well when used to make predictions about the future state of the graph, especially when the delta between time stamps is not small. In this work, we explore a novel training procedure and an associated unsupervised model which creates graph representations optimised to predict the future state of the graph. We make use of graph convolutional neural networks to encode the graph into a latent representation, which we then use to train our temporal offset reconstruction method, inspired by auto-encoders, to predict a later time point, multiple time steps into the future. Using our method, we demonstrate superior performance for the task of future link prediction compared with non-temporal state-of-the-art baselines. We show our approach to be capable of outperforming non-temporal baselines by 38% on a real world dataset.

Submitted 20 November, 2018; originally announced November 2018.

Comments: Accepted as a workshop paper at IEEE Big Data 2018

arXiv:1810.08675 [pdf, other]

Using Machine Learning to reduce the energy wasted in Volunteer Computing Environments

Authors: A. Stephen McGough, Matthew Forshaw, John Brennan, Noura Al Moubayed, Stephen Bonner

Abstract: High Throughput Computing (HTC) provides a convenient mechanism for running thousands of tasks. Many HTC systems exploit computers which are provisioned for other purposes by utilising their idle time, an approach known as volunteer computing. This has great advantages as it gives access to vast quantities of computational power for little or no cost. The downside is that running tasks are sacrificed if the computer is needed for its primary use, normally by terminating the task, which must then be restarted on a different computer, leading to wasted energy and an increase in task completion time. We demonstrate, through the use of simulation, how we can reduce this wasted energy by targeting tasks at computers less likely to be needed for primary use, predicting this idle time through machine learning. By combining two machine learning approaches, namely Random Forest and MultiLayer Perceptron, we save 51.4% of the energy without significantly affecting the time to complete tasks.

Submitted 19 October, 2018; originally announced October 2018.

Comments: Accepted for publication at the 9th International Green and Sustainable Computing Conference, Technically Co-sponsored by IEEE Computer Society & STC Sustainable Computing, October 22-24, Pittsburgh, PA, USA

arXiv:1809.05375 [pdf, other]

Style Augmentation: Data Augmentation via Style Randomization

Authors: Philip T. Jackson, Amir Atapour-Abarghouei, Stephen Bonner, Toby Breckon, Boguslaw Obara

Abstract: We introduce style augmentation, a new form of data augmentation based on random style transfer, for improving the robustness of convolutional neural networks (CNN) over both classification and regression based tasks. During training, our style augmentation randomizes texture, contrast and color, while preserving shape and semantic content. This is accomplished by adapting an arbitrary style transfer network to perform style randomization, by sampling input style embeddings from a multivariate normal distribution instead of inferring them from a style image. In addition to standard classification experiments, we investigate the effect of style augmentation (and data augmentation generally) on domain transfer tasks. We find that data augmentation significantly improves robustness to domain shift, and can be used as a simple, domain agnostic alternative to domain adaptation. Comparing style augmentation against a mix of seven traditional augmentation techniques, we find that it can be readily combined with them to improve network performance. We validate the efficacy of our technique with domain transfer experiments in classification and monocular depth estimation, illustrating consistent improvements in generalization.

Submitted 12 April, 2019; v1 submitted 14 September, 2018; originally announced September 2018.

arXiv:1808.00720 [pdf, other]

RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising

Authors: David Rohde, Stephen Bonner, Travis Dunlop, Flavian Vasile, Alexandros Karatzoglou

Abstract: Recommender Systems are becoming ubiquitous in many settings and take many forms, from product recommendation in e-commerce stores, to query suggestions in search engines, to friend recommendation in social networks. Current research directions, which are largely based upon supervised learning from historical data, appear to be showing diminishing returns, with many practitioners reporting a discrepancy between improvements in offline metrics for supervised learning and the online performance of the newly proposed models. One possible reason is that we are using the wrong paradigm: when looking at the long-term cycle of collecting historical performance data, creating a new version of the recommendation model, A/B testing it and then rolling it out, we see that there are a lot of commonalities with the reinforcement learning (RL) setup, where the agent observes the environment and acts upon it in order to change its state towards better states (states with higher rewards). To this end we introduce RecoGym, an RL environment for recommendation, which is defined by a model of user traffic patterns on e-commerce and the users' response to recommendations on the publisher websites. We believe that this is an important step forward for the field of recommendation systems research, one that could open up an avenue of collaboration between the recommender systems and reinforcement learning communities and lead to better alignment between offline and online performance metrics.

Submitted 14 September, 2018; v1 submitted 2 August, 2018; originally announced August 2018.

Comments: Accepted at the REVEAL workshop at the Twelfth ACM Conference on Recommender Systems (RecSys '18), October 2--7, 2018, Vancouver, BC, Canada

arXiv:1806.07464 [pdf, other]

Exploring the Semantic Content of Unsupervised Graph Embeddings: An Empirical Study

Authors: Stephen Bonner, Ibad Kureshi, John Brennan, Georgios Theodoropoulos, Andrew Stephen McGough, Boguslaw Obara

Abstract: Graph embeddings have become a key and widely used technique within the field of graph mining, proving to be successful across a broad range of domains including social, citation, transportation and biological. Graph embedding techniques aim to automatically create a low-dimensional representation of a given graph, which captures key structural elements in the resulting embedding space. However, to date, there has been little work exploring exactly which topological structures are being learned in the embedding process. In this paper, we investigate whether graph embeddings are approximating something analogous to traditional vertex level graph features. If such a relationship can be found, it could be used to provide a theoretical insight into how graph embedding approaches function. We perform this investigation by predicting known topological features, using supervised and unsupervised methods, directly from the embedding space. If a mapping between the embeddings and topological features can be found, then we argue that the structural information encapsulated by the features is represented in the embedding space. To explore this, we present extensive experimental evaluation from five state-of-the-art unsupervised graph embedding techniques, across a range of empirical graph datasets, measuring a selection of topological features. We demonstrate that several topological features are indeed being approximated by the embedding space, allowing key insight into how graph embeddings create good representations.

Submitted 19 June, 2018; originally announced June 2018.

arXiv:1805.04157 [pdf, other]

doi 10.1109/SMC.2018.00631

On the Classification of SSVEP-Based Dry-EEG Signals via Convolutional Neural Networks

Authors: Nik Khadijah Nik Aznan, Stephen Bonner, Jason D. Connolly, Noura Al Moubayed, Toby P. Breckon

Abstract: In this paper, we propose a novel Convolutional Neural Network (CNN) approach for the classification of raw dry-EEG signals without any data pre-processing. To illustrate the effectiveness of our approach, we utilise the Steady State Visual Evoked Potential (SSVEP) paradigm as our use case. SSVEP can be utilised to allow people with severe physical disabilities such as Complete Locked-In Syndrome or Amyotrophic Lateral Sclerosis to be aided via BCI applications, as it requires only that the subject fixate upon the sensory stimuli of interest. Here we utilise SSVEP flicker frequencies between 10 and 30 Hz, which we record as subject cortical waveforms via the dry-EEG headset. Our proposed end-to-end CNN allows us to automatically and accurately classify SSVEP stimulation directly from the dry-EEG waveforms. Our CNN architecture utilises a common SSVEP Convolutional Unit (SCU), comprising a 1D convolutional layer, batch normalization and max pooling. Furthermore, we compare several deep learning neural network variants with our primary CNN architecture, in addition to traditional machine learning classification approaches. Experimental evaluation shows our CNN architecture to be significantly better than competing approaches, achieving a classification accuracy of 96% whilst demonstrating superior cross-subject performance and even being able to generalise well to unseen subjects whose data is entirely absent from the training process.

Submitted 2 August, 2018; v1 submitted 10 May, 2018; originally announced May 2018.

Comments: Accepted as a full paper at the 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC2018)

arXiv:1706.07639 [pdf, other]

doi 10.1145/3240323.3240360

Causal Embeddings for Recommendation

Authors: Stephen Bonner, Flavian Vasile

Abstract: Many current applications use recommendations in order to modify the natural user behavior, such as to increase the number of sales or the time spent on a website. This results in a gap between the final recommendation objective and the classical setup where recommendation candidates are evaluated by their coherence with past user behavior, by predicting either the missing entries in the user-item matrix, or the most likely next event. To bridge this gap, we optimize a recommendation policy for the task of increasing the desired outcome versus the organic user behavior. We show this is equivalent to learning to predict recommendation outcomes under a fully random recommendation policy. To this end, we propose a new domain adaptation algorithm that learns from logged data containing outcomes from a biased recommendation policy and predicts recommendation outcomes according to random exposure. We compare our method against state-of-the-art factorization methods, in addition to new approaches of causal recommendation and show significant improvements.

Submitted 3 August, 2018; v1 submitted 23 June, 2017; originally announced June 2017.

Comments: Accepted as a long paper at the Twelfth ACM Conference on Recommender Systems (RecSys '18), October 2--7, 2018, Vancouver, BC, Canada

Showing 1–23 of 23 results for author: Bonner, S