-
Improving Antibody Humanness Prediction using Patent Data
Authors:
Talip Ucar,
Aubin Ramon,
Dino Oglic,
Rebecca Croasdale-Wood,
Tom Diethe,
Pietro Sormanni
Abstract:
We investigate the potential of patent data for improving the antibody humanness prediction using a multi-stage, multi-loss training process. Humanness serves as a proxy for the immunogenic response to antibody therapeutics, one of the major causes of attrition in drug discovery and a challenging obstacle for their use in clinical settings. We pose the initial learning stage as a weakly-supervised…
▽ More
We investigate the potential of patent data for improving the antibody humanness prediction using a multi-stage, multi-loss training process. Humanness serves as a proxy for the immunogenic response to antibody therapeutics, one of the major causes of attrition in drug discovery and a challenging obstacle for their use in clinical settings. We pose the initial learning stage as a weakly-supervised contrastive-learning problem, where each antibody sequence is associated with possibly multiple identifiers of function and the objective is to learn an encoder that groups them according to their patented properties. We then freeze a part of the contrastive encoder and continue training it on the patent data using the cross-entropy loss to predict the humanness score of a given antibody sequence. We illustrate the utility of the patent data and our approach by performing inference on three different immunogenicity datasets, unseen during training. Our empirical results demonstrate that the learned model consistently outperforms the alternative baselines and establishes new state-of-the-art on five out of six inference tasks, irrespective of the used metric.
△ Less
Submitted 8 June, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Continual Density Ratio Estimation in an Online Setting
Authors:
Yu Chen,
Song Liu,
Tom Diethe,
Peter Flach
Abstract:
In online applications with streaming data, awareness of how far the training or test set has shifted away from the original dataset can be crucial to the performance of the model. However, we may not have access to historical samples in the data stream. To cope with such situations, we propose a novel method, Continual Density Ratio Estimation (CDRE), for estimating density ratios between the ini…
▽ More
In online applications with streaming data, awareness of how far the training or test set has shifted away from the original dataset can be crucial to the performance of the model. However, we may not have access to historical samples in the data stream. To cope with such situations, we propose a novel method, Continual Density Ratio Estimation (CDRE), for estimating density ratios between the initial and current distributions ($p/q_t$) of a data stream in an iterative fashion without the need of storing past samples, where $q_t$ is shifting away from $p$ over time $t$. We demonstrate that CDRE can be more accurate than standard DRE in terms of estimating divergences between distributions, despite not requiring samples from the original distribution. CDRE can be applied in scenarios of online learning, such as importance weighted covariate shift, tracing dataset changes for better decision making. In addition, (CDRE) enables the evaluation of generative models under the setting of continual learning. To the best of our knowledge, there is no existing method that can evaluate generative models in continual learning without storing samples from the original distribution.
△ Less
Submitted 9 March, 2021;
originally announced March 2021.
-
Interpretable Anomaly Detection with Mondrian P{ó}lya Forests on Data Streams
Authors:
Charlie Dickens,
Eric Meissner,
Pablo G. Moreno,
Tom Diethe
Abstract:
Anomaly detection at scale is an extremely challenging problem of great practicality. When data is large and high-dimensional, it can be difficult to detect which observations do not fit the expected behaviour. Recent work has coalesced on variations of (random) $k$\emph{d-trees} to summarise data for anomaly detection. However, these methods rely on ad-hoc score functions that are not easy to int…
▽ More
Anomaly detection at scale is an extremely challenging problem of great practicality. When data is large and high-dimensional, it can be difficult to detect which observations do not fit the expected behaviour. Recent work has coalesced on variations of (random) $k$\emph{d-trees} to summarise data for anomaly detection. However, these methods rely on ad-hoc score functions that are not easy to interpret, making it difficult to asses the severity of the detected anomalies or select a reasonable threshold in the absence of labelled anomalies. To solve these issues, we contextualise these methods in a probabilistic framework which we call the Mondrian \Polya{} Forest for estimating the underlying probability density function generating the data and enabling greater interpretability than prior work. In addition, we develop a memory efficient variant able to operate in the modern streaming environments. Our experiments show that these methods achieves state-of-the-art performance while providing statistically interpretable anomaly scores.
△ Less
Submitted 4 August, 2020;
originally announced August 2020.
-
Semi-Discriminative Representation Loss for Online Continual Learning
Authors:
Yu Chen,
Tom Diethe,
Peter Flach
Abstract:
The use of episodic memory in continual learning has demonstrated effectiveness for alleviating catastrophic forgetting. In recent studies, gradient-based approaches have been developed to make more efficient use of compact episodic memory. Such approaches refine the gradients resulting from new samples by those from memorized samples, aiming to reduce the diversity of gradients from different tas…
▽ More
The use of episodic memory in continual learning has demonstrated effectiveness for alleviating catastrophic forgetting. In recent studies, gradient-based approaches have been developed to make more efficient use of compact episodic memory. Such approaches refine the gradients resulting from new samples by those from memorized samples, aiming to reduce the diversity of gradients from different tasks. In this paper, we clarify the relation between diversity of gradients and discriminativeness of representations, showing shared as well as conflicting interests between Deep Metric Learning and continual learning, thus demonstrating pros and cons of learning discriminative representations in continual learning. Based on these findings, we propose a simple method -- Semi-Discriminative Representation Loss (SDRL) -- for continual learning. In comparison with state-of-the-art methods, SDRL shows better performance with low computational cost on multiple benchmark tasks in the setting of online continual learning.
△ Less
Submitted 14 April, 2022; v1 submitted 19 June, 2020;
originally announced June 2020.
-
Optimal Continual Learning has Perfect Memory and is NP-hard
Authors:
Jeremias Knoblauch,
Hisham Husain,
Tom Diethe
Abstract:
Continual Learning (CL) algorithms incrementally learn a predictor or representation across multiple sequentially observed tasks. Designing CL algorithms that perform reliably and avoid so-called catastrophic forgetting has proven a persistent challenge. The current paper develops a theoretical approach that explains why. In particular, we derive the computational properties which CL algorithms wo…
▽ More
Continual Learning (CL) algorithms incrementally learn a predictor or representation across multiple sequentially observed tasks. Designing CL algorithms that perform reliably and avoid so-called catastrophic forgetting has proven a persistent challenge. The current paper develops a theoretical approach that explains why. In particular, we derive the computational properties which CL algorithms would have to possess in order to avoid catastrophic forgetting. Our main finding is that such optimal CL algorithms generally solve an NP-hard problem and will require perfect memory to do so. The findings are of theoretical interest, but also explain the excellent performance of CL algorithms using experience replay, episodic memory and core sets relative to regularization-based approaches.
△ Less
Submitted 9 June, 2020;
originally announced June 2020.
-
Similarity of Neural Networks with Gradients
Authors:
Shuai Tang,
Wesley J. Maddox,
Charlie Dickens,
Tom Diethe,
Andreas Damianou
Abstract:
A suitable similarity index for comparing learnt neural networks plays an important role in understanding the behaviour of the highly-nonlinear functions, and can provide insights on further theoretical analysis and empirical studies. We define two key steps when comparing models: firstly, the representation abstracted from the learnt model, where we propose to leverage both feature vectors and gr…
▽ More
A suitable similarity index for comparing learnt neural networks plays an important role in understanding the behaviour of the highly-nonlinear functions, and can provide insights on further theoretical analysis and empirical studies. We define two key steps when comparing models: firstly, the representation abstracted from the learnt model, where we propose to leverage both feature vectors and gradient ones (which are largely ignored in prior work) into designing the representation of a neural network. Secondly, we define the employed similarity index which gives desired invariance properties, and we facilitate the chosen ones with sketching techniques for comparing various datasets efficiently. Empirically, we show that the proposed approach provides a state-of-the-art method for computing similarity of neural networks that are trained independently on different datasets and the tasks defined by the datasets.
△ Less
Submitted 25 March, 2020;
originally announced March 2020.
-
Leveraging Hierarchical Representations for Preserving Privacy and Utility in Text
Authors:
Oluwaseyi Feyisetan,
Tom Diethe,
Thomas Drake
Abstract:
Guaranteeing a certain level of user privacy in an arbitrary piece of text is a challenging issue. However, with this challenge comes the potential of unlocking access to vast data stores for training machine learning models and supporting data driven decisions. We address this problem through the lens of dx-privacy, a generalization of Differential Privacy to non Hamming distance metrics. In this…
▽ More
Guaranteeing a certain level of user privacy in an arbitrary piece of text is a challenging issue. However, with this challenge comes the potential of unlocking access to vast data stores for training machine learning models and supporting data driven decisions. We address this problem through the lens of dx-privacy, a generalization of Differential Privacy to non Hamming distance metrics. In this work, we explore word representations in Hyperbolic space as a means of preserving privacy in text. We provide a proof satisfying dx-privacy, then we define a probability distribution in Hyperbolic space and describe a way to sample from it in high dimensions. Privacy is provided by perturbing vector representations of words in high dimensional Hyperbolic space to obtain a semantic generalization. We conduct a series of experiments to demonstrate the tradeoff between privacy and utility. Our privacy experiments illustrate protections against an authorship attribution algorithm while our utility experiments highlight the minimal impact of our perturbations on several downstream machine learning models. Compared to the Euclidean baseline, we observe > 20x greater guarantees on expected privacy against comparable worst case statistics.
△ Less
Submitted 20 October, 2019;
originally announced October 2019.
-
Privacy- and Utility-Preserving Textual Analysis via Calibrated Multivariate Perturbations
Authors:
Oluwaseyi Feyisetan,
Borja Balle,
Thomas Drake,
Tom Diethe
Abstract:
Accurately learning from user data while providing quantifiable privacy guarantees provides an opportunity to build better ML models while maintaining user trust. This paper presents a formal approach to carrying out privacy preserving text perturbation using the notion of dx-privacy designed to achieve geo-indistinguishability in location data. Our approach applies carefully calibrated noise to v…
▽ More
Accurately learning from user data while providing quantifiable privacy guarantees provides an opportunity to build better ML models while maintaining user trust. This paper presents a formal approach to carrying out privacy preserving text perturbation using the notion of dx-privacy designed to achieve geo-indistinguishability in location data. Our approach applies carefully calibrated noise to vector representation of words in a high dimension space as defined by word embedding models. We present a privacy proof that satisfies dx-privacy where the privacy parameter epsilon provides guarantees with respect to a distance metric defined by the word embedding space. We demonstrate how epsilon can be selected by analyzing plausible deniability statistics backed up by large scale analysis on GloVe and fastText embeddings. We conduct privacy audit experiments against 2 baseline models and utility experiments on 3 datasets to demonstrate the tradeoff between privacy and utility for varying values of epsilon on different task types. Our results demonstrate practical utility (< 2% utility loss for training binary classifiers) while providing better privacy guarantees than baseline models.
△ Less
Submitted 20 October, 2019;
originally announced October 2019.
-
HyperStream: a Workflow Engine for Streaming Data
Authors:
Tom Diethe,
Meelis Kull,
Niall Twomey,
Kacper Sokol,
Hao Song,
Miquel Perello-Nieto,
Emma Tonkin,
Peter Flach
Abstract:
This paper describes HyperStream, a large-scale, flexible and robust software package, written in the Python language, for processing streaming data with workflow creation capabilities. HyperStream overcomes the limitations of other computational engines and provides high-level interfaces to execute complex nesting, fusion, and prediction both in online and offline forms in streaming environments.…
▽ More
This paper describes HyperStream, a large-scale, flexible and robust software package, written in the Python language, for processing streaming data with workflow creation capabilities. HyperStream overcomes the limitations of other computational engines and provides high-level interfaces to execute complex nesting, fusion, and prediction both in online and offline forms in streaming environments. HyperStream is a general purpose tool that is well-suited for the design, development, and deployment of Machine Learning algorithms and predictive models in a wide space of sequential predictive problems.
Source code, installation instructions, examples, and documentation can be found at: https://github.com/IRC-SPHERE/HyperStream.
△ Less
Submitted 7 August, 2019;
originally announced August 2019.
-
Automatic Discovery of Privacy-Utility Pareto Fronts
Authors:
Brendan Avent,
Javier Gonzalez,
Tom Diethe,
Andrei Paleyes,
Borja Balle
Abstract:
Differential privacy is a mathematical framework for privacy-preserving data analysis. Changing the hyperparameters of a differentially private algorithm allows one to trade off privacy and utility in a principled way. Quantifying this trade-off in advance is essential to decision-makers tasked with deciding how much privacy can be provided in a particular application while maintaining acceptable…
▽ More
Differential privacy is a mathematical framework for privacy-preserving data analysis. Changing the hyperparameters of a differentially private algorithm allows one to trade off privacy and utility in a principled way. Quantifying this trade-off in advance is essential to decision-makers tasked with deciding how much privacy can be provided in a particular application while maintaining acceptable utility. Analytical utility guarantees offer a rigorous tool to reason about this trade-off, but are generally only available for relatively simple problems. For more complex tasks, such as training neural networks under differential privacy, the utility achieved by a given algorithm can only be measured empirically. This paper presents a Bayesian optimization methodology for efficiently characterizing the privacy--utility trade-off of any differentially private algorithm using only empirical measurements of its utility. The versatility of our method is illustrated on a number of machine learning tasks involving multiple models, optimizers, and datasets.
△ Less
Submitted 21 July, 2020; v1 submitted 26 May, 2019;
originally announced May 2019.
-
Distribution Calibration for Regression
Authors:
Hao Song,
Tom Diethe,
Meelis Kull,
Peter Flach
Abstract:
We are concerned with obtaining well-calibrated output distributions from regression models. Such distributions allow us to quantify the uncertainty that the model has regarding the predicted target value. We introduce the novel concept of distribution calibration, and demonstrate its advantages over the existing definition of quantile calibration. We further propose a post-hoc approach to improvi…
▽ More
We are concerned with obtaining well-calibrated output distributions from regression models. Such distributions allow us to quantify the uncertainty that the model has regarding the predicted target value. We introduce the novel concept of distribution calibration, and demonstrate its advantages over the existing definition of quantile calibration. We further propose a post-hoc approach to improving the predictions from previously trained regression models, using multi-output Gaussian Processes with a novel Beta link function. The proposed method is experimentally verified on a set of common regression models and shows improvements for both distribution-level and quantile-level calibration.
△ Less
Submitted 15 May, 2019;
originally announced May 2019.
-
Facilitating Bayesian Continual Learning by Natural Gradients and Stein Gradients
Authors:
Yu Chen,
Tom Diethe,
Neil Lawrence
Abstract:
Continual learning aims to enable machine learning models to learn a general solution space for past and future tasks in a sequential manner. Conventional models tend to forget the knowledge of previous tasks while learning a new task, a phenomenon known as catastrophic forgetting. When using Bayesian models in continual learning, knowledge from previous tasks can be retained in two ways: 1). post…
▽ More
Continual learning aims to enable machine learning models to learn a general solution space for past and future tasks in a sequential manner. Conventional models tend to forget the knowledge of previous tasks while learning a new task, a phenomenon known as catastrophic forgetting. When using Bayesian models in continual learning, knowledge from previous tasks can be retained in two ways: 1). posterior distributions over the parameters, containing the knowledge gained from inference in previous tasks, which then serve as the priors for the following task; 2). coresets, containing knowledge of data distributions of previous tasks. Here, we show that Bayesian continual learning can be facilitated in terms of these two means through the use of natural gradients and Stein gradients respectively.
△ Less
Submitted 24 April, 2019;
originally announced April 2019.
-
Privacy-preserving Active Learning on Sensitive Data for User Intent Classification
Authors:
Oluwaseyi Feyisetan,
Thomas Drake,
Borja Balle,
Tom Diethe
Abstract:
Active learning holds promise of significantly reducing data annotation costs while maintaining reasonable model performance. However, it requires sending data to annotators for labeling. This presents a possible privacy leak when the training set includes sensitive user data. In this paper, we describe an approach for carrying out privacy preserving active learning with quantifiable guarantees. W…
▽ More
Active learning holds promise of significantly reducing data annotation costs while maintaining reasonable model performance. However, it requires sending data to annotators for labeling. This presents a possible privacy leak when the training set includes sensitive user data. In this paper, we describe an approach for carrying out privacy preserving active learning with quantifiable guarantees. We evaluate our approach by showing the tradeoff between privacy, utility and annotation budget on a binary classification task in a active learning setting.
△ Less
Submitted 26 March, 2019;
originally announced March 2019.
-
Continual Learning in Practice
Authors:
Tom Diethe,
Tom Borchert,
Eno Thereska,
Borja Balle,
Neil Lawrence
Abstract:
This paper describes a reference architecture for self-maintaining systems that can learn continually, as data arrives. In environments where data evolves, we need architectures that manage Machine Learning (ML) models in production, adapt to shifting data distributions, cope with outliers, retrain when necessary, and adapt to new tasks. This represents continual AutoML or Automatically Adaptive M…
▽ More
This paper describes a reference architecture for self-maintaining systems that can learn continually, as data arrives. In environments where data evolves, we need architectures that manage Machine Learning (ML) models in production, adapt to shifting data distributions, cope with outliers, retrain when necessary, and adapt to new tasks. This represents continual AutoML or Automatically Adaptive Machine Learning. We describe the challenges and proposes a reference architecture.
△ Less
Submitted 18 March, 2019; v1 submitted 12 March, 2019;
originally announced March 2019.
-
$β^3$-IRT: A New Item Response Model and its Applications
Authors:
Yu Chen,
Telmo Silva Filho,
Ricardo B. C. Prudêncio,
Tom Diethe,
Peter Flach
Abstract:
Item Response Theory (IRT) aims to assess latent abilities of respondents based on the correctness of their answers in aptitude test items with different difficulty levels. In this paper, we propose the $β^3$-IRT model, which models continuous responses and can generate a much enriched family of Item Characteristic Curve (ICC). In experiments we applied the proposed model to data from an online ex…
▽ More
Item Response Theory (IRT) aims to assess latent abilities of respondents based on the correctness of their answers in aptitude test items with different difficulty levels. In this paper, we propose the $β^3$-IRT model, which models continuous responses and can generate a much enriched family of Item Characteristic Curve (ICC). In experiments we applied the proposed model to data from an online exam platform, and show our model outperforms a more standard 2PL-ND model on all datasets. Furthermore, we show how to apply $β^3$-IRT to assess the ability of machine learning classifiers. This novel application results in a new metric for evaluating the quality of the classifier's probability estimates, based on the inferred difficulty and discrimination of data instances.
△ Less
Submitted 3 June, 2019; v1 submitted 10 March, 2019;
originally announced March 2019.
-
Probabilistic Sensor Fusion for Ambient Assisted Living
Authors:
Tom Diethe,
Niall Twomey,
Meelis Kull,
Peter Flach,
Ian Craddock
Abstract:
There is a widely-accepted need to revise current forms of health-care provision, with particular interest in sensing systems in the home. Given a multiple-modality sensor platform with heterogeneous network connectivity, as is under development in the Sensor Platform for HEalthcare in Residential Environment (SPHERE) Interdisciplinary Research Collaboration (IRC), we face specific challenges rela…
▽ More
There is a widely-accepted need to revise current forms of health-care provision, with particular interest in sensing systems in the home. Given a multiple-modality sensor platform with heterogeneous network connectivity, as is under development in the Sensor Platform for HEalthcare in Residential Environment (SPHERE) Interdisciplinary Research Collaboration (IRC), we face specific challenges relating to the fusion of the heterogeneous sensor modalities.
We introduce Bayesian models for sensor fusion, which aims to address the challenges of fusion of heterogeneous sensor modalities. Using this approach we are able to identify the modalities that have most utility for each particular activity, and simultaneously identify which features within that activity are most relevant for a given activity.
We further show how the two separate tasks of location prediction and activity recognition can be fused into a single model, which allows for simultaneous learning an prediction for both tasks.
We analyse the performance of this model on data collected in the SPHERE house, and show its utility. We also compare against some benchmark models which do not have the full structure,and show how the proposed model compares favourably to these methods
△ Less
Submitted 3 February, 2017;
originally announced February 2017.
-
A Note on the Kullback-Leibler Divergence for the von Mises-Fisher distribution
Authors:
Tom Diethe
Abstract:
We present a derivation of the Kullback Leibler (KL)-Divergence (also known as Relative Entropy) for the von Mises Fisher (VMF) Distribution in $d$-dimensions.
We present a derivation of the Kullback Leibler (KL)-Divergence (also known as Relative Entropy) for the von Mises Fisher (VMF) Distribution in $d$-dimensions.
△ Less
Submitted 25 February, 2015;
originally announced February 2015.
-
Convex Multiview Fisher Discriminant Analysis
Authors:
Tom Diethe,
John Shawe-Taylor
Abstract:
Section 1.3 was incorrect, and 2.1 will be removed from further submissions. A rewritten version will be posted in the future.
Section 1.3 was incorrect, and 2.1 will be removed from further submissions. A rewritten version will be posted in the future.
△ Less
Submitted 29 October, 2009; v1 submitted 18 August, 2009;
originally announced August 2009.