-
Using transfer learning to study burned area dynamics: A case study of refugee settlements in West Nile, Northern Uganda
Authors:
Robert Huppertz,
Catherine Nakalembe,
Hannah Kerner,
Ramani Lachyan,
Maxime Rischard
Abstract:
With the global refugee crisis at a historic high, there is a growing need to assess the impact of refugee settlements on their hosting countries and surrounding environments. Because fires are an important land management practice in smallholder agriculture in sub-Saharan Africa, burned area (BA) mappings can help provide information about the impacts of land management practices on local environments. However, a lack of BA ground-truth data in much of sub-Saharan Africa limits the use of highly scalable deep learning (DL) techniques for such BA mappings. In this work, we propose a scalable transfer learning approach to study BA dynamics in areas with little to no ground-truth data such as the West Nile region in Northern Uganda. We train a deep learning model on BA ground-truth data in Portugal and propose the application of that model on refugee-hosting districts in West Nile between 2015 and 2020. By comparing the district-level BA dynamic with the wider West Nile region, we aim to add understanding of the land management impacts of refugee settlements on their surrounding environments.
Submitted 29 July, 2021;
originally announced July 2021.
-
Aggregated Gaussian Processes with Multiresolution Earth Observation Covariates
Authors:
Harrison Zhu,
Adam Howes,
Owen van Eer,
Maxime Rischard,
Yingzhen Li,
Dino Sejdinovic,
Seth Flaxman
Abstract:
For many survey-based spatial modelling problems, responses are observed as spatially aggregated over survey regions due to limited resources. Covariates, from weather models and satellite imageries, can be observed at many different spatial resolutions, making the pre-processing of covariates a key challenge for any spatial modelling task. We propose a Gaussian process regression model to flexibly handle multiresolution covariates by employing an additive kernel that can efficiently aggregate features across resolutions. Compared to existing approaches that rely on resolution matching, our approach better maintains distributional information across resolutions, leading to better performance and interpretability. Our model yields stronger predictive performance and interpretability on both simulated and crop yield datasets.
Submitted 1 April, 2022; v1 submitted 4 May, 2021;
originally announced May 2021.
-
Predicting Landsat Reflectance with Deep Generative Fusion
Authors:
Shahine Bouabid,
Maxim Chernetskiy,
Maxime Rischard,
Jevgenij Gamper
Abstract:
Public satellite missions are commonly bound to a trade-off between spatial and temporal resolution as no single sensor provides fine-grained acquisitions with frequent coverage. This hinders their potential to assist vegetation monitoring or humanitarian actions, which require detecting rapid and detailed terrestrial surface changes. In this work, we probe the potential of deep generative models to produce high-resolution optical imagery by fusing products with different spatial and temporal characteristics. We introduce a dataset of co-registered Moderate Resolution Imaging Spectroradiometer (MODIS) and Landsat surface reflectance time series and demonstrate the ability of our generative model to blend coarse daily reflectance information into low-paced finer acquisitions. We benchmark our proposed model against state-of-the-art reflectance fusion algorithms.
Submitted 9 November, 2020;
originally announced November 2020.
-
GaussianProcesses.jl: A Nonparametric Bayes package for the Julia Language
Authors:
Jamie Fairbrother,
Christopher Nemeth,
Maxime Rischard,
Johanni Brea,
Thomas Pinder
Abstract:
Gaussian processes are a class of flexible nonparametric Bayesian tools that are widely used across the sciences, and in industry, to model complex data sources. Key to applying Gaussian process models is the availability of well-developed open source software, which is available in many programming languages. In this paper, we present a tutorial of the GaussianProcesses.jl package that has been developed for the Julia programming language. GaussianProcesses.jl utilises the inherent computational benefits of the Julia language, including multiple dispatch and just-in-time compilation, to produce a fast, flexible and user-friendly Gaussian processes package. The package provides many mean and kernel functions with supporting inference tools to fit exact Gaussian process models, as well as a range of alternative likelihood functions to handle non-Gaussian data (e.g. binary classification models) and sparse approximations for scalable Gaussian processes. The package makes efficient use of existing Julia packages to provide users with a range of optimization and plotting tools.
Submitted 30 June, 2019; v1 submitted 21 December, 2018;
originally announced December 2018.
-
Unbiased estimation of log normalizing constants with applications to Bayesian cross-validation
Authors:
Maxime Rischard,
Pierre E. Jacob,
Natesh Pillai
Abstract:
Posterior distributions often feature intractable normalizing constants, called marginal likelihoods or evidence, that are useful for model comparison via Bayes factors. This has motivated a number of methods for estimating ratios of normalizing constants in statistics. In computational physics the logarithm of these ratios corresponds to free energy differences. Combining unbiased Markov chain Monte Carlo estimators with path sampling, also called thermodynamic integration, we propose new unbiased estimators of the logarithm of ratios of normalizing constants. As a by-product, we propose unbiased estimators of the Bayesian cross-validation criterion. The proposed estimators are consistent, asymptotically Normal and can easily benefit from parallel processing devices. Various examples are considered for illustration.
Submitted 2 October, 2018;
originally announced October 2018.
-
A Bayesian Nonparametric Approach to Geographic Regression Discontinuity Designs: Do School Districts Affect NYC House Prices?
Authors:
Maxime Rischard,
Zach Branson,
Luke Miratrix,
Luke Bornn
Abstract:
Most research on regression discontinuity designs (RDDs) has focused on univariate cases, where only those units with a "forcing" variable on one side of a threshold value receive a treatment. Geographical regression discontinuity designs (GeoRDDs) extend the RDD to multivariate settings with spatial forcing variables. We propose a framework for analysing GeoRDDs, which we implement using Gaussian process regression. This yields a Bayesian posterior distribution of the treatment effect at every point along the border. We address nuances of having a functional estimand defined on a border with potentially intricate topology, particularly when defining and estimating causal estimands of the local average treatment effect (LATE). The Bayesian estimate of the LATE can also be used as a test statistic in a hypothesis test with good frequentist properties, which we validate using simulations and placebo tests. We demonstrate our methodology with a dataset of property sales in New York City, to assess whether there is a discontinuity in housing prices at the border between two school districts. We find a statistically significant difference in price across the border between the districts with $p$=0.002, and estimate a 20% higher price on average for a house on the more desirable side.
Submitted 11 July, 2018;
originally announced July 2018.
-
Bias correction in daily maximum and minimum temperature measurements through Gaussian process modeling
Authors:
Maxime Rischard,
Natesh Pillai,
Karen A. McKinnon
Abstract:
The Global Historical Climatology Network-Daily database contains, among other variables, daily maximum and minimum temperatures from weather stations around the globe. It has long been known that climatological summary statistics based on daily temperature minima and maxima will not be accurate if the bias due to the time at which the observations were collected is not accounted for. Despite some previous work, to our knowledge, there does not exist a satisfactory solution to this important problem. In this paper, we carefully detail the problem and develop a novel approach to address it. Our idea is to impute the hourly temperatures at the location of the measurements by borrowing information from the nearby stations that record hourly temperatures, which then can be used to create accurate summaries of temperature extremes. The key difficulty is that these imputations of the temperature curves must satisfy the constraint of falling between the observed daily minima and maxima, and attaining those values at least once in a twenty-four hour period. We develop a spatiotemporal Gaussian process model for imputing the hourly measurements from the nearby stations, and then develop a novel and easy to implement Markov Chain Monte Carlo technique to sample from the posterior distribution satisfying the above constraints. We validate our imputation model using hourly temperature data from four meteorological stations in Iowa, of which one is hidden and the data replaced with daily minima and maxima, and show that the imputed temperatures recover the hidden temperatures well. We also demonstrate that our model can exploit information contained in the data to infer the time of daily measurements.
Submitted 29 May, 2018; v1 submitted 25 May, 2018;
originally announced May 2018.
-
A Nonparametric Bayesian Methodology for Regression Discontinuity Designs
Authors:
Zach Branson,
Maxime Rischard,
Luke Bornn,
Luke Miratrix
Abstract:
One of the most popular methodologies for estimating the average treatment effect at the threshold in a regression discontinuity design is local linear regression (LLR), which places larger weight on units closer to the threshold. We propose a Gaussian process regression methodology that acts as a Bayesian analog to LLR for regression discontinuity designs. Our methodology provides a flexible fit for treatment and control responses by placing a general prior on the mean response functions. Furthermore, unlike LLR, our methodology can incorporate uncertainty in how units are weighted when estimating the treatment effect. We prove our method is consistent in estimating the average treatment effect at the threshold. Furthermore, we find via simulation that our method exhibits promising coverage, interval length, and mean squared error properties compared to standard LLR and state-of-the-art LLR methodologies. Finally, we explore the performance of our method on a real-world example by studying the impact of being a first-round draft pick on the performance and playing time of basketball players in the National Basketball Association.
Submitted 30 September, 2018; v1 submitted 16 April, 2017;
originally announced April 2017.
-
On Machine-Learned Classification of Variable Stars with Sparse and Noisy Time-Series Data
Authors:
Joseph W. Richards,
Dan L. Starr,
Nathaniel R. Butler,
Joshua S. Bloom,
John M. Brewer,
Arien Crellin-Quick,
Justin Higgins,
Rachel Kennedy,
Maxime Rischard
Abstract:
With the coming data deluge from synoptic surveys, there is a growing need for frameworks that can quickly and automatically produce calibrated classification probabilities for newly-observed variables based on a small number of time-series measurements. In this paper, we introduce a methodology for variable-star classification, drawing from modern machine-learning techniques. We describe how to homogenize the information gleaned from light curves by selection and computation of real-numbered metrics ("features"), detail methods to robustly estimate periodic light-curve features, introduce tree-ensemble methods for accurate variable star classification, and show how to rigorously evaluate the classification results using cross validation. On a 25-class data set of 1542 well-studied variable stars, we achieve a 22.8% overall classification error using the random forest classifier; this represents a 24% improvement over the best previous classifier on these data. This methodology is effective for identifying samples of specific science classes: for pulsational variables used in Milky Way tomography we obtain a discovery efficiency of 98.2% and for eclipsing systems we find an efficiency of 99.1%, both at 95% purity. We show that the random forest (RF) classifier is superior to other machine-learned methods in terms of accuracy, speed, and relative immunity to features with no useful class information; the RF classifier can also be used to estimate the importance of each feature in classification. Additionally, we present the first astronomical use of hierarchical classification methods to incorporate a known class taxonomy in the classifier, which further reduces the catastrophic error rate to 7.8%. Excluding low-amplitude sources, our overall error rate improves to 14%, with a catastrophic error rate of 3.5%.
Submitted 10 January, 2011;
originally announced January 2011.
-
Towards a Real-time Transient Classification Engine
Authors:
J. S. Bloom,
D. L. Starr,
N. R. Butler,
P. Nugent,
M. Rischard,
D. Eads,
D. Poznanski
Abstract:
Temporal sampling does more than add another axis to the vector of observables. Instead, under the recognition that how objects change (and move) in time speaks directly to the physics underlying astronomical phenomena, next-generation wide-field synoptic surveys are poised to revolutionize our understanding of just about anything that goes bump in the night (which is just about everything at some level). Still, even the most ambitious surveys will require targeted spectroscopic follow-up to fill in the physical details of newly discovered transients. We are now building a new system intended to ingest and classify transient phenomena in near real-time from high-throughput imaging data streams. Described herein, the Transient Classification Project at Berkeley will be making use of classification techniques operating on ``features'' extracted from time series and contextual (static) information. We also highlight the need for a community adoption of a standard representation of astronomical time series data (i.e., ``VOTimeseries'').
Submitted 15 February, 2008;
originally announced February 2008.