Search | arXiv e-print repository

Contrastive Mixture of Posteriors for Counterfactual Inference, Data Integration and Fairness

Authors: Adam Foster, Árpi Vezér, Craig A Glastonbury, Páidí Creed, Sam Abujudeh, Aaron Sim

Abstract: Learning meaningful representations of data that can address challenges such as batch effect correction and counterfactual inference is a central problem in many domains including computational biology. Adopting a Conditional VAE framework, we show that marginal independence between the representation and a condition variable plays a key role in both of these challenges. We propose the Contrastive… ▽ More Learning meaningful representations of data that can address challenges such as batch effect correction and counterfactual inference is a central problem in many domains including computational biology. Adopting a Conditional VAE framework, we show that marginal independence between the representation and a condition variable plays a key role in both of these challenges. We propose the Contrastive Mixture of Posteriors (CoMP) method that uses a novel misalignment penalty defined in terms of mixtures of the variational posteriors to enforce this independence in latent space. We show that CoMP has attractive theoretical properties compared to previous approaches, and we prove counterfactual identifiability of CoMP under additional assumptions. We demonstrate state-of-the-art performance on a set of challenging tasks including aligning human tumour samples with cancer cell-lines, predicting transcriptome-level perturbation responses, and batch correction on single-cell RNA sequencing data. We also find parallels to fair representation learning and demonstrate that CoMP is competitive on a common task in the field. △ Less

Submitted 26 June, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

Comments: Published as a conference paper (long presentation) at ICML 2022

arXiv:1811.06498 [pdf, other]

Adjusting for Confounding in Unsupervised Latent Representations of Images

Authors: Craig A. Glastonbury, Michael Ferlaino, Christoffer Nellåker, Cecilia M. Lindgren

Abstract: Biological imaging data are often partially confounded or contain unwanted variability. Examples of such phenomena include variable lighting across microscopy image captures, stain intensity variation in histological slides, and batch effects for high throughput drug screening assays. Therefore, to develop "fair" models which generalise well to unseen examples, it is crucial to learn data represen… ▽ More Biological imaging data are often partially confounded or contain unwanted variability. Examples of such phenomena include variable lighting across microscopy image captures, stain intensity variation in histological slides, and batch effects for high throughput drug screening assays. Therefore, to develop "fair" models which generalise well to unseen examples, it is crucial to learn data representations that are insensitive to nuisance factors of variation. In this paper, we present a strategy based on adversarial training, capable of learning unsupervised representations invariant to confounders. As an empirical validation of our method, we use deep convolutional autoencoders to learn unbiased cellular representations from microscopy imaging. △ Less

Submitted 26 November, 2018; v1 submitted 15 November, 2018; originally announced November 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

arXiv:1804.03270 [pdf, other]

Towards Deep Cellular Phenoty** in Placental Histology

Authors: Michael Ferlaino, Craig A. Glastonbury, Carolina Motta-Mejia, Manu Vatish, Ingrid Granne, Stephen Kennedy, Cecilia M. Lindgren, Christoffer Nellåker

Abstract: The placenta is a complex organ, playing multiple roles during fetal development. Very little is known about the association between placental morphological abnormalities and fetal physiology. In this work, we present an open sourced, computationally tractable deep learning pipeline to analyse placenta histology at the level of the cell. By utilising two deep Convolutional Neural Network architect… ▽ More The placenta is a complex organ, playing multiple roles during fetal development. Very little is known about the association between placental morphological abnormalities and fetal physiology. In this work, we present an open sourced, computationally tractable deep learning pipeline to analyse placenta histology at the level of the cell. By utilising two deep Convolutional Neural Network architectures and transfer learning, we can robustly localise and classify placental cells within five classes with an accuracy of 89%. Furthermore, we learn deep embeddings encoding phenotypic knowledge that is capable of both stratifying five distinct cell populations and learn intraclass phenotypic variance. We envisage that the automation of this pipeline to population scale studies of placenta histology has the potential to improve our understanding of basic cellular placental biology and its variations, particularly its role in predicting adverse birth outcomes. △ Less

Submitted 25 May, 2018; v1 submitted 9 April, 2018; originally announced April 2018.

Comments: Updated MRC funding material. Corrected typo that suggested ensembling and Inception accuracy were the same (updated to reflect the fact the ensemble model is 1% better than previously reported)

arXiv:1703.08710 [pdf, other]

Count-ception: Counting by Fully Convolutional Redundant Counting

Authors: Joseph Paul Cohen, Genevieve Boucher, Craig A. Glastonbury, Henry Z. Lo, Yoshua Bengio

Abstract: Counting objects in digital images is a process that should be replaced by machines. This tedious task is time consuming and prone to errors due to fatigue of human annotators. The goal is to have a system that takes as input an image and returns a count of the objects inside and justification for the prediction in the form of object localization. We repose a problem, originally posed by Lempitsky… ▽ More Counting objects in digital images is a process that should be replaced by machines. This tedious task is time consuming and prone to errors due to fatigue of human annotators. The goal is to have a system that takes as input an image and returns a count of the objects inside and justification for the prediction in the form of object localization. We repose a problem, originally posed by Lempitsky and Zisserman, to instead predict a count map which contains redundant counts based on the receptive field of a smaller regression network. The regression network predicts a count of the objects that exist inside this frame. By processing the image in a fully convolutional way each pixel is going to be accounted for some number of times, the number of windows which include it, which is the size of each window, (i.e., 32x32 = 1024). To recover the true count we take the average over the redundant predictions. Our contribution is redundant counting instead of predicting a density map in order to average over errors. We also propose a novel deep neural network architecture adapted from the Inception family of networks called the Count-ception network. Together our approach results in a 20% relative improvement (2.9 to 2.3 MAE) over the state of the art method by Xie, Noble, and Zisserman in 2016. △ Less

Submitted 23 July, 2017; v1 submitted 25 March, 2017; originally announced March 2017.

Comments: Under Review

Showing 1–4 of 4 results for author: Glastonbury, C A