arXiv:2305.19779 [pdf, other]

Deep learning and MCMC with aggVAE for shifting administrative boundaries: mapping malaria prevalence in Kenya

Authors: Elizaveta Semenova, Swapnil Mishra, Samir Bhatt, Seth Flaxman, H Juliette T Unwin

Abstract: Model-based disease mapping remains a fundamental policy-informing tool in the fields of public health and disease surveillance. Hierarchical Bayesian models have emerged as the state-of-the-art approach for disease mapping since they are able to both capture structure in the data and robustly characterise uncertainty. When working with areal data, e.g. aggregates at the administrative unit level such as district or province, current models rely on the adjacency structure of areal units to account for spatial correlations and perform shrinkage. The goal of disease surveillance systems is to track disease outcomes over time. This task is especially challenging in crisis situations, which often lead to redrawn administrative boundaries, meaning that data collected before and after the crisis are no longer directly comparable. Moreover, the adjacency-based approach ignores the continuous nature of spatial processes and cannot solve the change-of-support problem, i.e. when estimates are required to be produced at different administrative levels or levels of aggregation. We present a novel, practical, and easy-to-implement solution to these problems, relying on a methodology that combines deep generative modelling and fully Bayesian inference: we build on the recently proposed PriorVAE method, which encodes spatial priors over small areas with variational autoencoders, by encoding aggregates over administrative units. We map malaria prevalence in Kenya, a country in which administrative boundaries changed in 2010.

Submitted 15 July, 2023; v1 submitted 31 May, 2023; originally announced May 2023.
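
The aggregation step behind aggVAE can be illustrated with a short Python sketch. It is not the authors' code: it assumes a squared-exponential GP on 12 points of a 1-D grid (standing in for a fine-scale spatial surface) and two hypothetical boundary assignments, `old_units` and `new_units`, standing in for units before and after a redrawing. One latent draw is averaged within each unit under both sets of boundaries, the kind of change-of-support operation the abstract describes; in aggVAE a VAE would then be trained on many such aggregated draws.

```python
import math
import random

def sq_exp_kernel(xs, lengthscale=0.3, variance=1.0):
    """Squared-exponential covariance matrix for 1-D locations xs."""
    return [[variance * math.exp(-0.5 * ((a - b) / lengthscale) ** 2) for b in xs] for a in xs]

def cholesky(k, jitter=1e-6):
    """Lower-triangular Cholesky factor of k + jitter * I (jitter aids numerical stability)."""
    n = len(k)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = k[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][i] = math.sqrt(s + jitter)
            else:
                L[i][j] = s / L[j][j]
    return L

def sample_gp(xs, rng, lengthscale=0.3):
    """One draw f ~ N(0, K) computed as f = L z with z ~ N(0, I)."""
    L = cholesky(sq_exp_kernel(xs, lengthscale))
    z = [rng.gauss(0.0, 1.0) for _ in xs]
    return [sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(len(xs))]

def aggregate(values, unit_of_point):
    """Average point-level values within each unit (unit_of_point[i] labels point i)."""
    sums, counts = {}, {}
    for v, u in zip(values, unit_of_point):
        sums[u] = sums.get(u, 0.0) + v
        counts[u] = counts.get(u, 0) + 1
    return {u: sums[u] / counts[u] for u in sums}

rng = random.Random(0)
xs = [i / 11 for i in range(12)]           # 12 fine-scale locations on [0, 1]
f = sample_gp(xs, rng)                      # one latent surface
old_units = [i // 4 for i in range(12)]     # hypothetical boundaries: 3 units of 4 points
new_units = [i // 3 for i in range(12)]     # hypothetical redrawn boundaries: 4 units of 3 points
print(aggregate(f, old_units))
print(aggregate(f, new_units))
```

Because both aggregates are computed from the same latent draw, estimates under either set of boundaries stay mutually consistent, which an adjacency-based model over one fixed set of units cannot offer.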

arXiv:2304.04307 [pdf, other]

PriorCVAE: scalable MCMC parameter inference with Bayesian deep generative modelling

Authors: Elizaveta Semenova, Prakhar Verma, Max Cairney-Leeming, Arno Solin, Samir Bhatt, Seth Flaxman

Abstract: Recent advances have shown that GP priors, or their finite realisations, can be encoded using deep generative models such as variational autoencoders (VAEs). These learned generators can serve as drop-in replacements for the original priors during MCMC inference. While this approach enables efficient inference, it loses information about the hyperparameters of the original models, and consequently makes inference over hyperparameters impossible and the learned priors indistinct. To overcome this limitation, we condition the VAE on stochastic process hyperparameters. This allows the joint encoding of hyperparameters with GP realizations and their subsequent estimation during inference. Further, we demonstrate that our proposed method, PriorCVAE, is agnostic to the nature of the models which it approximates, and can be used, for instance, to encode solutions of ODEs. It provides a practical tool for approximate inference and shows potential in real-life spatial and spatiotemporal applications.

Submitted 10 November, 2023; v1 submitted 9 April, 2023; originally announced April 2023.

arXiv:2211.12139 [pdf, other]

City-Wide Perceptions of Neighbourhood Quality using Street View Images

Authors: Emily Muller, Emily Gemmell, Ishmam Choudhury, Ricky Nathvani, Antje Barbara Metzler, James Bennett, Emily Denton, Seth Flaxman, Majid Ezzati

Abstract: The interactions of individuals with city neighbourhoods are determined, in part, by the perceived quality of urban environments. Perceived neighbourhood quality is a core component of urban vitality, influencing social cohesion, sense of community, safety, activity and mental health of residents. Large-scale assessment of perceptions of neighbourhood quality was pioneered by the Place Pulse projects. Researchers demonstrated the efficacy of crowd-sourcing perception ratings of image pairs across 56 cities and training a model to predict perceptions from street-view images. Variation across cities may limit Place Pulse's usefulness for assessing within-city perceptions. In this paper, we set forth a protocol for city-specific dataset collection for the perception 'On which street would you prefer to walk?' This paper describes our methodology, based in London, including collection of images and ratings, web development, model training and mapping. Assessment of within-city perceptions of neighbourhoods can identify inequities, inform planning priorities, and identify temporal dynamics. Code available: https://emilymuller1991.github.io/urban-perceptions/.

Submitted 24 November, 2022; v1 submitted 22 November, 2022; originally announced November 2022.

arXiv:2210.11844 [pdf, other]

Cox-Hawkes: doubly stochastic spatiotemporal Poisson processes

Authors: Xenia Miscouridou, Samir Bhatt, George Mohler, Seth Flaxman, Swapnil Mishra

Abstract: Hawkes processes are point process models that have been used to capture self-excitatory behavior in social interactions, neural activity, earthquakes and viral epidemics. They can model the occurrence of the times and locations of events. Here we develop a new class of spatiotemporal Hawkes processes that can capture both triggering and clustering behavior, and we provide an efficient method for performing inference. We use a log-Gaussian Cox process (LGCP) as a prior for the background rate of the Hawkes process, which gives arbitrary flexibility to capture a wide range of underlying background effects (for infectious diseases these are called endemic effects). The Hawkes process and LGCP are computationally expensive: the former has a likelihood with quadratic complexity in the number of observations, and the latter involves inversion of the precision matrix, which is cubic in the number of observations. Here we propose a novel approach to perform MCMC sampling for our Hawkes process with LGCP background, using pre-trained Gaussian Process generators which provide direct and cheap access to samples during inference. We show the efficacy and flexibility of our approach in experiments on simulated data and use our methods to uncover the trends in a dataset of reported crimes in the US.

Submitted 21 October, 2022; originally announced October 2022.

Comments: 8 Figures, 27 pages without references, 3 pages of references
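
The quadratic cost mentioned in the Cox-Hawkes abstract comes from the self-exciting sum in the Hawkes intensity. A minimal Python sketch, assuming a purely temporal process with a constant background rate `mu` (a simplification: the paper places an LGCP prior on a spatiotemporal background) and an exponential triggering kernel `alpha * beta * exp(-beta * dt)`. The intensity at each event sums over all earlier events, so the log-likelihood, sum of log lambda(t_i) minus the integral of lambda over [0, T], costs O(n^2).

```python
import math

def hawkes_intensity(t, events, mu, alpha, beta):
    """lambda(t) = mu + sum over events t_j < t of alpha * beta * exp(-beta * (t - t_j))."""
    return mu + sum(alpha * beta * math.exp(-beta * (t - tj)) for tj in events if tj < t)

def hawkes_loglik(events, T, mu, alpha, beta):
    """Exact log-likelihood of event times on [0, T].

    Evaluating the intensity at each event sums over all earlier events,
    so this costs O(n^2) for n events.
    """
    events = sorted(events)
    log_term = sum(math.log(hawkes_intensity(ti, events, mu, alpha, beta)) for ti in events)
    # Compensator: the integral of lambda over [0, T], in closed form for the exponential kernel.
    compensator = mu * T + alpha * sum(1.0 - math.exp(-beta * (T - tj)) for tj in events)
    return log_term - compensator

print(hawkes_loglik([0.5, 1.2, 1.3, 4.0], T=5.0, mu=0.4, alpha=0.5, beta=2.0))
```

With `alpha = 0` the process reduces to a homogeneous Poisson process and the log-likelihood to `n * log(mu) - mu * T`; keeping `alpha < 1` (each event triggers fewer than one offspring on average) keeps the process stable.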

arXiv:2210.07893 [pdf, other]

Numerically Stable Sparse Gaussian Processes via Minimum Separation using Cover Trees

Authors: Alexander Terenin, David R. Burt, Artem Artemev, Seth Flaxman, Mark van der Wilk, Carl Edward Rasmussen, Hong Ge

Abstract: Gaussian processes are frequently deployed as part of larger machine learning and decision-making systems, for instance in geospatial modeling, Bayesian optimization, or in latent Gaussian models. Within a system, the Gaussian process model needs to perform in a stable and reliable manner to ensure it interacts correctly with other parts of the system. In this work, we study the numerical stability of scalable sparse approximations based on inducing points. To do so, we first review numerical stability, and illustrate typical situations in which Gaussian process models can be unstable. Building on stability theory originally developed in the interpolation literature, we derive sufficient and in certain cases necessary conditions on the inducing points for the computations performed to be numerically stable. For low-dimensional tasks such as geospatial modeling, we propose an automated method for computing inducing points satisfying these conditions. This is done via a modification of the cover tree data structure, which is of independent interest. We additionally propose an alternative sparse approximation for regression with a Gaussian likelihood which trades off a small amount of performance to further improve stability. We provide illustrative examples showing the relationship between stability of calculations and predictive performance of inducing point methods on spatial tasks.

Submitted 16 January, 2024; v1 submitted 14 October, 2022; originally announced October 2022.

Journal ref: Journal of Machine Learning Research, 2024
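
The idea of minimum separation can be illustrated with a much simpler procedure than the paper's modified cover tree. The Python sketch below is a greedy stand-in, not the authors' algorithm: it scans the data once and keeps a point as an inducing point only if it lies at least `eps` away from every point kept so far. The result has the separation and covering properties that cover trees maintain: selected points are pairwise at least `eps` apart, and every input point lies within `eps` of some selected point. The grid of locations is hypothetical.

```python
import math

def greedy_min_separation(points, eps):
    """Greedily pick inducing points that are pairwise at least eps apart.

    A point is skipped only when some already-selected point lies closer than eps,
    so every input point ends up within eps of a selected point.
    This is O(n * m) for m selected points; it is not a cover tree.
    """
    selected = []
    for p in points:
        if all(math.dist(p, q) >= eps for q in selected):
            selected.append(p)
    return selected

# Hypothetical 2-D locations on an 11 x 11 grid over the unit square.
grid = [(i * 0.1, j * 0.1) for i in range(11) for j in range(11)]
inducing = greedy_min_separation(grid, eps=0.25)
print(len(inducing))
```

The linear scan over `selected` for every candidate is the cost a tree-based structure is meant to avoid when the number of points is large.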

arXiv:2209.09617 [pdf, other]

Seq2Seq Surrogates of Epidemic Models to Facilitate Bayesian Inference

Authors: Giovanni Charles, Timothy M. Wolock, Peter Winskill, Azra Ghani, Samir Bhatt, Seth Flaxman

Abstract: Epidemic models are powerful tools in understanding infectious disease. However, as they increase in size and complexity, they can quickly become computationally intractable. Recent progress in modelling methodology has shown that surrogate models can be used to emulate complex epidemic models with a high-dimensional parameter space. We show that deep sequence-to-sequence (seq2seq) models can serve as accurate surrogates for complex epidemic models with sequence-based model parameters, effectively replicating seasonal and long-term transmission dynamics. Once trained, our surrogate can predict scenarios several thousand times faster than the original model, making it ideal for policy exploration. We demonstrate that replacing a traditional epidemic model with a learned simulator facilitates robust Bayesian inference.

Submitted 10 March, 2023; v1 submitted 20 September, 2022; originally announced September 2022.

arXiv:2112.15571 [pdf, other]

PCACE: A Statistical Approach to Ranking Neurons for CNN Interpretability

Authors: Sílvia Casacuberta, Esra Suel, Seth Flaxman

Abstract: In this paper we introduce a new problem within the growing literature of interpretability for convolutional neural networks (CNNs). While previous work has focused on the question of how to visually interpret CNNs, we ask what it is that we care to interpret, that is, which layers and neurons are worth our attention? Due to the vast size of modern deep learning network architectures, automated, quantitative methods are needed to rank the relative importance of neurons so as to provide an answer to this question. We present a new statistical method for ranking the hidden neurons in any convolutional layer of a network. We define importance as the maximal correlation between the activation maps and the class score. We provide different ways in which this method can be used for visualization purposes with MNIST and ImageNet, and show a real-world application of our method to air pollution prediction with street-level images.

Submitted 31 December, 2021; originally announced December 2021.

Journal ref: Responsible AI and DeepSpatial workshops at the 27th SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2021)
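
A ranking of this kind can be approximated in a few lines. PCACE itself uses the maximal correlation between activation maps and class score; the Python sketch below swaps in the plain absolute Pearson correlation between each neuron's spatially averaged activation and the class score. Under the usual definition, maximal correlation optimises over transformations of both variables, and the spatial mean followed by the identity is one such transformation, so this score is a lower bound on it. The nested-list layout `activation_maps[n][i]` and all data are hypothetical.

```python
import statistics

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences (neither may be constant)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def rank_neurons(activation_maps, class_scores):
    """Rank neurons by |corr(mean activation, class score)| across images, highest first.

    activation_maps[n][i] is neuron n's 2-D activation map (a list of rows) on image i.
    """
    importance = {}
    for n, maps in enumerate(activation_maps):
        summaries = [statistics.fmean(v for row in m for v in row) for m in maps]
        importance[n] = abs(pearson(summaries, class_scores))
    return sorted(importance, key=importance.get, reverse=True)

# Hypothetical data: 3 neurons, 4 images, each activation map a tiny 2-D grid.
scores = [1.0, 2.0, 3.0, 4.0]
maps = [
    [[[1.0]], [[0.0]], [[0.0]], [[1.0]]],                      # neuron 0: unrelated to the score
    [[[2.0, 2.0]], [[4.0, 4.0]], [[6.0, 6.0]], [[8.0, 8.0]]],  # neuron 1: tracks the score
    [[[1.0]], [[3.0]], [[2.0]], [[4.0]]],                      # neuron 2: partly tracks it
]
print(rank_neurons(maps, scores))  # prints [1, 2, 0]
```

A nonlinear dependence between activation and score could be missed by this linear proxy, which is one reason to prefer a maximal-correlation estimate as the paper does.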

arXiv:2110.10422 [pdf, other]

doi 10.1098/rsif.2022.0094

PriorVAE: Encoding spatial priors with VAEs for small-area estimation

Authors: Elizaveta Semenova, Yidan Xu, Adam Howes, Theo Rashid, Samir Bhatt, Swapnil Mishra, Seth Flaxman

Abstract: Gaussian processes (GPs), implemented through multivariate Gaussian distributions for a finite collection of data, are the most popular approach in small-area spatial statistical modelling. In this context they are used to encode correlation structures over space and can generalise well in interpolation tasks. Despite their flexibility, off-the-shelf GPs present serious computational challenges which limit their scalability and practical usefulness in applied settings. Here, we propose a novel, deep generative modelling approach to tackle this challenge, termed PriorVAE: for a particular spatial setting, we approximate a class of GP priors through prior sampling and subsequent fitting of a variational autoencoder (VAE). Given a trained VAE, the resultant decoder allows spatial inference to become incredibly efficient due to the low-dimensional, independently distributed latent Gaussian space representation of the VAE. Once trained, inference using the VAE decoder replaces the GP within a Bayesian sampling framework. This approach provides a tractable and easy-to-implement means of approximately encoding spatial priors and facilitates efficient statistical inference. We demonstrate the utility of our two-stage VAE approach on Bayesian, small-area estimation tasks.

Submitted 16 May, 2022; v1 submitted 20 October, 2021; originally announced October 2021.

arXiv:2009.02264 [pdf, other]

Improving axial resolution in SIM using deep learning

Authors: Miguel Boland, Edward A. K. Cohen, Seth Flaxman, Mark A. A. Neil

Abstract: Structured Illumination Microscopy (SIM) is a widespread methodology to image live and fixed biological structures smaller than the diffraction limits of conventional optical microscopy. Using recent advances in image up-scaling through deep learning models, we demonstrate a method to reconstruct 3D SIM image stacks with twice the axial resolution attainable through conventional SIM reconstructions. We further evaluate our method for robustness to noise and generalisability to varying observed specimens, and discuss potential adaptations of the method to achieve further improvements in resolution.

Submitted 18 February, 2021; v1 submitted 4 September, 2020; originally announced September 2020.

ACM Class: I.4.5; I.2.10

arXiv:2007.06566 [pdf, other]

A unified machine learning approach to time series forecasting applied to demand at emergency departments

Authors: Michaela A. C. Vollmer, Ben Glampson, Thomas A. Mellan, Swapnil Mishra, Luca Mercuri, Ceire Costello, Robert Klaber, Graham Cooke, Seth Flaxman, Samir Bhatt

Abstract: There were 25.6 million attendances at Emergency Departments (EDs) in England in 2019, corresponding to an increase of 12 million attendances over the past ten years. The steadily rising demand at EDs creates a constant challenge to provide adequate quality of care while maintaining standards and productivity. Managing hospital demand effectively requires an adequate knowledge of the future rate of admission. Using 8 years of electronic admissions data from two major acute care hospitals in London, we develop a novel ensemble methodology that combines the outcomes of the best-performing time series and machine learning approaches in order to make highly accurate forecasts of demand 1, 3 and 7 days in the future. The two hospitals face average daily demands of 208 and 106 attendances respectively and experience considerable volatility around these means. However, our approach is able to predict attendances at these emergency departments one day in advance with a mean absolute error of ±14 and ±10 patients, corresponding to a mean absolute percentage error of 6.8% and 8.6% respectively. Our analysis compares machine learning algorithms to more traditional linear models. We find that linear models often outperform machine learning methods and that the quality of our predictions for any of the forecasting horizons of 1, 3 or 7 days is comparable as measured in MAE.
In addition to comparing and combining state-of-the-art forecasting methods to predict hospital demand, we consider two different hyperparameter tuning methods, enabling a faster deployment of our models without compromising performance. We believe our framework can readily be used to forecast a wide range of policy-relevant indicators.

Submitted 13 July, 2020; originally announced July 2020.
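
The error metrics quoted in the emergency-department abstract, and the idea of combining forecasts, can be sketched in Python. The equal-weight average in `ensemble_mean` is an assumption for illustration; the paper's ensemble combines its best-performing methods and may weight them differently. All numbers are hypothetical.

```python
def mae(actual, forecast):
    """Mean absolute error, in patients."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (assumes no zero actuals)."""
    return 100.0 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

def ensemble_mean(*forecasts):
    """Equal-weight average of several forecast series (hypothetical combination rule)."""
    return [sum(vals) / len(vals) for vals in zip(*forecasts)]

# Hypothetical daily attendances and two hypothetical model forecasts.
actual = [200, 210, 190, 220]
linear = [195, 215, 200, 210]
ml = [205, 200, 185, 230]
combined = ensemble_mean(linear, ml)
print(mae(actual, combined), mape(actual, combined))
```

In this made-up example each individual forecast has an MAE of 7.5 patients, while their average has an MAE of 1.25 because the two sets of errors partly cancel; that is a property of the toy numbers, not a result from the paper.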

arXiv:2006.05371 [pdf, other]

Bayesian Probabilistic Numerical Integration with Tree-Based Models

Authors: Harrison Zhu, Xing Liu, Ruya Kang, Zhichao Shen, Seth Flaxman, François-Xavier Briol

Abstract: Bayesian quadrature (BQ) is a method for solving numerical integration problems in a Bayesian manner, which allows users to quantify their uncertainty about the solution. The standard approach to BQ is based on a Gaussian process (GP) approximation of the integrand. As a result, BQ is inherently limited to cases where GP approximations can be done in an efficient manner, thus often prohibiting very high-dimensional or non-smooth target functions. This paper proposes to tackle this issue with a new Bayesian numerical integration algorithm based on Bayesian Additive Regression Trees (BART) priors, which we call BART-Int. BART priors are easy to tune and well-suited for discontinuous functions. We demonstrate that they also lend themselves naturally to a sequential design setting and that explicit convergence rates can be obtained in a variety of settings. The advantages and disadvantages of this new methodology are highlighted on a set of benchmark tests including the Genz functions, and on a Bayesian survey design problem.

Submitted 2 December, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

arXiv:2005.07927 [pdf, other]

BART-based inference for Poisson processes

Authors: Stamatina Lamprinakou, Mauricio Barahona, Seth Flaxman, Sarah Filippi, Axel Gandy, Emma McCoy

Abstract: The effectiveness of Bayesian Additive Regression Trees (BART) has been demonstrated in a variety of contexts including non-parametric regression and classification. A BART scheme for estimating the intensity of inhomogeneous Poisson processes is introduced. Poisson intensity estimation is a vital task in various applications including medical imaging, astrophysics and network traffic analysis. The new approach enables full posterior inference of the intensity in a non-parametric regression setting. The performance of the novel scheme is demonstrated through simulation studies on synthetic and real datasets up to five dimensions, and the new scheme is compared with alternative approaches.

Submitted 12 November, 2022; v1 submitted 16 May, 2020; originally announced May 2020.

Comments: Accepted version including Supplementary Material. To appear in Computational Statistics & Data Analysis (CSDA)
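
Simulation studies of Poisson intensity estimation need synthetic point patterns with a known intensity. One standard way to generate them is Lewis-Shedler thinning, sketched below in Python for one dimension. This is a data-generating tool, not the BART estimator itself, and the sinusoidal `intensity` is hypothetical. Candidates are drawn from a homogeneous process at the bounding rate `lam_max`, and each is kept with probability `intensity(t) / lam_max`.

```python
import math
import random

def simulate_thinning(intensity, lam_max, T, rng):
    """Lewis-Shedler thinning on [0, T]; requires intensity(t) <= lam_max everywhere.

    Inter-arrival gaps of the homogeneous candidate process are Exponential(lam_max);
    a candidate t survives when U * lam_max <= intensity(t) with U ~ Uniform[0, 1).
    """
    events, t = [], 0.0
    while True:
        t += rng.expovariate(lam_max)
        if t > T:
            return events
        if rng.random() * lam_max <= intensity(t):
            events.append(t)

def intensity(t):
    """Hypothetical intensity, bounded above by 10."""
    return 5.0 + 5.0 * math.sin(t)

rng = random.Random(1)
events = simulate_thinning(intensity, lam_max=10.0, T=20.0, rng=rng)
print(len(events))
```

The expected number of events on [0, 20] is the integral of the intensity, 100 + 5(1 - cos 20), roughly 103; a tighter bound `lam_max` wastes fewer rejected candidates.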

arXiv:2002.06873 [pdf, other]

$π$VAE: a stochastic process prior for Bayesian deep learning with MCMC

Authors: Swapnil Mishra, Seth Flaxman, Tresnia Berah, Harrison Zhu, Mikko Pakkanen, Samir Bhatt

Abstract: Stochastic processes provide a mathematically elegant way to model complex data. In theory, they provide flexible priors over function classes that can encode a wide range of interesting assumptions. In practice, however, efficient inference by optimisation or marginalisation is difficult, a problem further exacerbated with big data and high-dimensional input spaces. We propose a novel variational autoencoder (VAE) called the prior encoding variational autoencoder ($π$VAE). The $π$VAE is finitely exchangeable and Kolmogorov consistent, and thus is a continuous stochastic process. We use $π$VAE to learn low-dimensional embeddings of function classes. We show that our framework can accurately learn expressive function classes such as Gaussian processes, but also properties of functions to enable statistical inference (such as the integral of a log Gaussian process). For popular tasks, such as spatial interpolation, $π$VAE achieves state-of-the-art performance both in terms of accuracy and computational efficiency. Perhaps most usefully, we demonstrate that the low-dimensional, independently distributed latent space representation learnt provides an elegant and scalable means of performing Bayesian inference for stochastic processes within probabilistic programming languages such as Stan.

Submitted 13 September, 2022; v1 submitted 17 February, 2020; originally announced February 2020.

arXiv:1906.09230 [pdf, other]

Modeling and Forecasting Art Movements with CGANs

Authors: Edoardo Lisi, Mohammad Malekzadeh, Hamed Haddadi, F. Din-Houn Lau, Seth Flaxman

Abstract: Conditional Generative Adversarial Networks (CGANs) are a recent and popular method for generating samples from a probability distribution conditioned on latent information. The latent information often comes in the form of a discrete label from a small set. We propose a novel method for training CGANs which allows us to condition on a sequence of continuous latent distributions $f^{(1)}, \ldots, f^{(K)}$. This training allows CGANs to generate samples from a sequence of distributions. We apply our method to paintings from a sequence of artistic movements, where each movement is considered to be its own distribution. Exploiting the temporal aspect of the data, a vector autoregressive (VAR) model is fitted to the means of the latent distributions that we learn, and used for one-step-ahead forecasting to predict the latent distribution of a future art movement $f^{(K+1)}$. Realisations from this distribution can be used by the CGAN to generate "future" paintings. In experiments, this novel methodology generates accurate predictions of the evolution of art. The training set consists of a large dataset of past paintings. While there is no agreement on exactly what current art period we find ourselves in, we test on plausible candidate sets of present art, and show that the mean distance to our predictions is small.

Submitted 18 March, 2020; v1 submitted 21 June, 2019; originally announced June 2019.

Comments: 15 pages, 6 figures

Journal ref: Royal Society Open Science, 2020
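
The forecasting step in the CGAN abstract (a VAR model fitted to the means of the learned latent distributions, then used one step ahead) can be sketched with ordinary least squares. The Python sketch assumes a VAR(1) with no intercept on 2-D means, chosen only to keep the matrix algebra short; the abstract does not state the lag order or latent dimension. For x_{t+1} = A x_t the least-squares estimate is A = S1 S0^{-1}, with S1 the sum of x_{t+1} x_t^T and S0 the sum of x_t x_t^T.

```python
def fit_var1(series):
    """OLS fit of x_{t+1} = A x_t for a list of 2-D points (no intercept, no noise model)."""
    # Accumulate S1 = sum of x_{t+1} x_t^T and S0 = sum of x_t x_t^T.
    S1 = [[0.0, 0.0], [0.0, 0.0]]
    S0 = [[0.0, 0.0], [0.0, 0.0]]
    for prev, nxt in zip(series[:-1], series[1:]):
        for i in range(2):
            for j in range(2):
                S1[i][j] += nxt[i] * prev[j]
                S0[i][j] += prev[i] * prev[j]
    # Explicit 2 x 2 inverse of S0 (assumes the x_t span both dimensions).
    det = S0[0][0] * S0[1][1] - S0[0][1] * S0[1][0]
    inv = [[S0[1][1] / det, -S0[0][1] / det], [-S0[1][0] / det, S0[0][0] / det]]
    return [[sum(S1[i][k] * inv[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def forecast(A, x):
    """One-step-ahead prediction A x."""
    return [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]

# Hypothetical latent means for K = 6 movements, generated noise-free from a known A.
A_true = [[0.9, 0.1], [-0.2, 0.8]]
means = [[1.0, 0.5]]
for _ in range(5):
    means.append(forecast(A_true, means[-1]))
A_hat = fit_var1(means)
print(forecast(A_hat, means[-1]))   # predicted mean for movement K + 1
```

With noise-free data the fit recovers `A_true` up to rounding; with real latent means, which are noisy and few, the estimate would carry substantial uncertainty.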

arXiv:1901.09839 [pdf, other]

Interpreting Deep Neural Networks Through Variable Importance

Authors: Jonathan Ish-Horowicz, Dana Udwin, Seth Flaxman, Sarah Filippi, Lorin Crawford

Abstract: While the success of deep neural networks (DNNs) is well-established across a variety of domains, our ability to explain and interpret these methods is limited. Unlike previously proposed local methods which try to explain particular classification decisions, we focus on global interpretability and ask a universally applicable question: given a trained model, which features are the most important? In the context of neural networks, a feature is rarely important on its own, so our strategy is specifically designed to leverage partial covariance structures and incorporate variable dependence into feature ranking. Our methodological contributions in this paper are two-fold. First, we propose an effect size analogue for DNNs that is appropriate for applications with highly collinear predictors (ubiquitous in computer vision). Second, we extend the recently proposed "RelATive cEntrality" (RATE) measure (Crawford et al., 2019) to the Bayesian deep learning setting. RATE applies an information theoretic criterion to the posterior distribution of effect sizes to assess feature significance. We apply our framework to three broad application areas: computer vision, natural language processing, and social science.

Submitted 28 April, 2020; v1 submitted 28 January, 2019; originally announced January 2019.

arXiv:1805.10205 [pdf, other]

doi 10.1145/3219819.3219853

Multimodal Sentiment Analysis To Explore the Structure of Emotions

Authors: Anthony Hu, Seth Flaxman

Abstract: We propose a novel approach to multimodal sentiment analysis using deep neural networks combining visual analysis and natural language processing. Our goal is different from the standard sentiment analysis goal of predicting whether a sentence expresses positive or negative sentiment; instead, we aim to infer the latent emotional state of the user. Thus, we focus on predicting the emotion word tags attached by users to their Tumblr posts, treating these as "self-reported emotions." We demonstrate that our multimodal model combining both text and image features outperforms separate models based solely on either images or text. Our model's results are interpretable, automatically yielding sensible word lists associated with emotions. We explore the structure of emotions implied by our model, compare it to what has been posited in the psychology literature, and validate our model on a set of images that have been used in psychology studies. Finally, our work also provides a useful tool for the growing academic study of images (both photographs and memes) on social networks.

Submitted 25 May, 2018; originally announced May 2018.

Comments: Accepted as a conference paper at KDD 2018

arXiv:1805.08463 [pdf, other]

Variational Learning on Aggregate Outputs with Gaussian Processes

Authors: Ho Chung Leon Law, Dino Sejdinovic, Ewan Cameron, Tim CD Lucas, Seth Flaxman, Katherine Battle, Kenji Fukumizu

Abstract: While a typical supervised learning framework assumes that the inputs and the outputs are measured at the same levels of granularity, many applications, including global mapping of disease, only have access to outputs at a much coarser level than that of the inputs. Aggregation of outputs makes generalization to new inputs much more difficult. We consider an approach to this problem based on variational learning with a model of output aggregation and Gaussian processes, where aggregation leads to intractability of the standard evidence lower bounds. We propose new bounds and tractable approximations, leading to improved prediction accuracy and scalability to large datasets, while explicitly taking uncertainty into account. We develop a framework which extends to several types of likelihoods, including the Poisson model for aggregated count data. We apply our framework to a challenging and important problem, the fine-scale spatial modelling of malaria incidence, with over 1 million observations.

Submitted 22 May, 2018; originally announced May 2018.
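To make the aggregate-output setting in the abstract above concrete, the sketch below evaluates a generic Poisson likelihood in which each coarse count (for example, a district total) depends on a fine-scale latent log-rate surface only through a population-weighted sum over its pixels. It is a minimal illustration under stated assumptions, not the paper's method: the function name, the population exposure weights, and the argument layout are hypothetical, and the variational bounds the authors propose are not reproduced.

```python
import math


def aggregated_poisson_loglik(counts, groups, log_rates, populations):
    """Log-likelihood of coarse counts given a fine-scale latent surface.

    Hypothetical helper for illustration only.
    counts[a]: observed count for aggregate unit a (e.g. a district)
    groups[a]: indices of the fine-scale pixels that make up unit a
    log_rates[i]: latent log incidence rate f(x_i) at pixel i
    populations[i]: population at pixel i, assumed as an exposure weight
    """
    total = 0.0
    for a, y in enumerate(counts):
        # The unit's expected count is the population-weighted sum of pixel
        # rates, so the data only constrain this sum, not individual pixels.
        lam = sum(populations[i] * math.exp(log_rates[i]) for i in groups[a])
        # Poisson log-pmf: y * log(lam) - lam - log(y!)
        total += y * math.log(lam) - lam - math.lgamma(y + 1)
    return total
```

Because each observation sees only a sum of exponentiated latent values, the expectation of this log-likelihood under a Gaussian-process posterior has no closed form, which is the intractability the abstract refers to.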

arXiv:1705.04293 [pdf, other]

Bayesian Approaches to Distribution Regression

Authors: Ho Chung Leon Law, Danica J. Sutherland, Dino Sejdinovic, Seth Flaxman

Abstract: Distribution regression has recently attracted much interest as a generic solution to the problem of supervised learning where labels are available at the group level, rather than at the individual level. Current approaches, however, do not propagate the uncertainty in observations due to sampling variability in the groups. This effectively assumes that small and large groups are estimated equally well, and should have equal weight in the final regression. We account for this uncertainty with a Bayesian distribution regression formalism, improving the robustness and performance of the model when group sizes vary. We frame our models in a neural network style, allowing for simple MAP inference using backpropagation to learn the parameters, as well as MCMC-based inference which can fully propagate uncertainty. We demonstrate our approach on illustrative toy datasets, as well as on a challenging problem of predicting age from images.

Submitted 14 January, 2021; v1 submitted 11 May, 2017; originally announced May 2017.

Journal ref: Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics (AISTATS 2018), PMLR 84:1167-1176
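The group-size issue raised in the distribution-regression abstract above can be seen directly in an empirical kernel mean embedding, a common group-level representation in this literature: its estimate at each landmark point is an average over the group, so its standard error shrinks with group size. The sketch below assumes scalar samples, a squared-exponential kernel, and hypothetical function names; it only shows the sampling uncertainty that a Bayesian formalism would propagate, not the authors' model.

```python
import math


def rbf(x, z, lengthscale=1.0):
    """Squared-exponential kernel between two scalars."""
    return math.exp(-((x - z) ** 2) / (2.0 * lengthscale ** 2))


def mean_embedding(samples, landmarks, lengthscale=1.0):
    """Empirical kernel mean embedding of one group at landmark points,
    plus the standard error of each coordinate (hypothetical helper)."""
    n = len(samples)
    means, std_errs = [], []
    for z in landmarks:
        vals = [rbf(x, z, lengthscale) for x in samples]
        m = sum(vals) / n
        if n > 1:
            # Unbiased sample variance, then standard error of the mean.
            var = sum((v - m) ** 2 for v in vals) / (n - 1)
            std_errs.append(math.sqrt(var / n))
        else:
            # Variance cannot be estimated from one sample; flag it as unbounded.
            std_errs.append(float("inf"))
        means.append(m)
    return means, std_errs
```

For example, a group of 2 points and a group of 100 points drawn from the same values give similar embedding means, but the larger group's standard error is roughly ten times smaller; a model that ignores this treats both estimates as equally reliable.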

arXiv:1606.08813 [pdf, other]

doi 10.1609/aimag.v38i3.2741

European Union regulations on algorithmic decision-making and a "right to explanation"

Authors: Bryce Goodman, Seth Flaxman

Abstract: We summarize the potential impact that the European Union's new General Data Protection Regulation will have on the routine use of machine learning algorithms. Slated to take effect as law across the EU in 2018, it will restrict automated individual decision-making (that is, algorithms that make decisions based on user-level predictors) which "significantly affect" users. The law will also effectively create a "right to explanation," whereby a user can ask for an explanation of an algorithmic decision that was made about them. We argue that while this law will pose large challenges for industry, it highlights opportunities for computer scientists to take the lead in designing algorithms and evaluation frameworks which avoid discrimination and enable explanation.

Submitted 31 August, 2016; v1 submitted 28 June, 2016; originally announced June 2016.

Comments: presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

Journal ref: AI Magazine, Vol 38, No 3, 2017

arXiv:1605.07025 [pdf, other]

Collaborative Filtering with Side Information: a Gaussian Process Perspective

Authors: Hyunjik Kim, Xiaoyu Lu, Seth Flaxman, Yee Whye Teh

Abstract: We tackle the problem of collaborative filtering (CF) with side information, through the lens of Gaussian Process (GP) regression. Driven by the idea of using the kernel to explicitly model user-item similarities, we formulate the GP in a way that allows the incorporation of low-rank matrix factorisation, arriving at our model, the Tucker Gaussian Process (TGP). Consequently, TGP generalises classical Bayesian matrix factorisation models, and goes beyond them to give a natural and elegant method for incorporating side information, giving enhanced predictive performance for CF problems. Moreover, we show that it is a novel model for regression, especially well-suited to grid-structured data and problems where the dependence on covariates is close to being separable.

Submitted 8 June, 2017; v1 submitted 23 May, 2016; originally announced May 2016.
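The remark in the TGP abstract about grid-structured data and near-separable dependence can be illustrated with a generic product kernel over user and item side information: on a complete user-by-item grid, the covariance of all cells factorises as a Kronecker product of a user Gram matrix and an item Gram matrix, a structure commonly exploited for scalable GP inference. This is a minimal sketch assuming squared-exponential kernels on hypothetical side-information vectors; it is not the TGP model itself, which additionally incorporates low-rank matrix factorisation.

```python
import math
from itertools import product


def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between two side-information vectors."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * lengthscale ** 2))


def gram(feats):
    """Gram matrix of rbf over a list of feature vectors."""
    return [[rbf(a, b) for b in feats] for a in feats]


def separable_gram(user_feats, item_feats):
    """Covariance between all (user, item) cells of the rating grid under
    k((u, i), (v, j)) = k_user(u, v) * k_item(i, j).
    Cells are ordered user-major: (u0, i0), (u0, i1), ..."""
    cells = list(product(range(len(user_feats)), range(len(item_feats))))
    return [[rbf(user_feats[u], user_feats[v]) * rbf(item_feats[i], item_feats[j])
             for (v, j) in cells] for (u, i) in cells]


def kron(A, B):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(A), len(B)
    return [[A[r // m][c // m] * B[r % m][c % m] for c in range(n * m)]
            for r in range(n * m)]
```

With user-major ordering, `separable_gram(users, items)` equals `kron(gram(users), gram(items))` entry by entry, so an implementation can work with the two small factor matrices instead of the full grid covariance.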

Showing 1–20 of 20 results for author: Flaxman, S