-
Bayesian Inference for Evidence Accumulation Models with Regressors
Authors:
Viet Hung Dao,
David Gunawan,
Robert Kohn,
Minh-Ngoc Tran,
Guy E. Hawkins,
Scott D. Brown
Abstract:
Evidence accumulation models (EAMs) are an important class of cognitive models used to analyze both response time and response choice data recorded from decision-making tasks. Developments in estimation procedures have helped EAMs become important both in basic scientific applications and solution-focussed applied work. Hierarchical Bayesian estimation frameworks for the linear ballistic accumulator model (LBA) and the diffusion decision model (DDM) have been widely used, but still suffer from some key limitations, particularly for large sample sizes, for models with many parameters, and when linking decision-relevant covariates to model parameters. We extend previous work with methods for estimating the LBA and DDM in hierarchical Bayesian frameworks that include random effects which are correlated between people, and include regression-model links between decision-relevant covariates and model parameters. Our methods work equally well in cases where the covariates are measured once per person (e.g., personality traits or psychological tests) or once per decision (e.g., neural or physiological data). We provide methods for exact Bayesian inference, using particle-based MCMC, and also approximate methods based on variational Bayesian (VB) inference. The VB methods are sufficiently fast and efficient that they can address large-scale estimation problems, such as those involving very large data sets. We evaluate the performance of these methods in applications to data from three existing experiments. Detailed algorithmic implementations and code are freely available for all methods.
Submitted 31 May, 2023; v1 submitted 20 February, 2023;
originally announced February 2023.
-
Analysis of sloppiness in model simulations: unveiling parameter uncertainty when mathematical models are fitted to data
Authors:
Gloria M. Monsalve-Bravo,
Brodie A. J. Lawson,
Christopher Drovandi,
Kevin Burrage,
Kevin S. Brown,
Christopher M. Baker,
Sarah A. Vollert,
Kerrie Mengersen,
Eve McDonald-Madden,
Matthew P. Adams
Abstract:
This work introduces a comprehensive approach to assess the sensitivity of model outputs to changes in parameter values, constrained by the combination of prior beliefs and data. This novel approach identifies stiff parameter combinations strongly affecting the quality of the model-data fit while simultaneously revealing which of these key parameter combinations are informed primarily by the data or are also substantively influenced by the priors. We focus on the very common context in complex systems where the amount and quality of data are low compared to the number of model parameters to be collectively estimated, and showcase the benefits of this technique for applications in biochemistry, ecology, and cardiac electrophysiology. We also show how stiff parameter combinations, once identified, uncover controlling mechanisms underlying the system being modeled and inform which of the model parameters need to be prioritized in future experiments for improved parameter inference from collective model-data fitting.
Submitted 21 September, 2022; v1 submitted 28 March, 2022;
originally announced March 2022.
-
Joint Estimation of Extreme Spatially Aggregated Precipitation at Different Scales through Mixture Modelling
Authors:
Jordan Richards,
Jonathan A. Tawn,
Simon Brown
Abstract:
Although most models for rainfall extremes focus on point-wise values, it is aggregated precipitation over areas up to river catchment scale that is of the most interest. To capture the joint behaviour of precipitation aggregates evaluated at different spatial scales, parsimonious and effective models must be built with knowledge of the underlying spatial process. Precipitation is driven by a mixture of processes acting at different scales and intensities, e.g., convective and frontal, with extremes of aggregates for typical catchment sizes arising from extremes of only one of these processes, rather than a combination of them. High-intensity convective events cause extreme spatial aggregates at small scales but the contribution of lower-intensity large-scale fronts is likely to increase as the area aggregated increases. Thus, to capture small to large scale spatial aggregates within a single approach requires a model that can accurately capture the extremal properties of both convective and frontal events. Previous extreme value methods have ignored this mixture structure; we propose a spatial extreme value model which is a mixture of two components with different marginal and dependence models that are able to capture the extremal behaviour of convective and frontal rainfall and more faithfully reproduces spatial aggregates for a wide range of scales. Modelling extremes of the frontal component raises new challenges due to it exhibiting strong long-range extremal spatial dependence. Our modelling approach is applied to fine-scale, high-dimensional, gridded precipitation data. We show that accounting for the mixture structure improves the joint inference on extremes of spatial aggregates over regions of different sizes.
Submitted 2 January, 2023; v1 submitted 16 November, 2021;
originally announced November 2021.
-
Weak Convergence of Non-neutral Genealogies to Kingman's Coalescent
Authors:
Suzie Brown,
Paul A. Jenkins,
Adam M. Johansen,
Jere Koskela
Abstract:
Interacting particle systems undergoing repeated mutation and selection steps model genetic evolution, and also describe a broad class of sequential Monte Carlo methods. The genealogical tree embedded into the system is important in both applications. Under neutrality, when fitnesses of particles are independent from those of their parents, rescaled genealogies are known to converge to Kingman's coalescent. Recent work has established convergence under non-neutrality, but only for finite-dimensional distributions. We prove weak convergence of non-neutral genealogies on the space of càdlàg paths under standard assumptions, enabling analysis of the whole genealogical tree.
Submitted 19 April, 2023; v1 submitted 11 October, 2021;
originally announced October 2021.
-
Modelling Extremes of Spatial Aggregates of Precipitation using Conditional Methods
Authors:
Jordan Richards,
Jonathan A. Tawn,
Simon Brown
Abstract:
Inference on the extremal behaviour of spatial aggregates of precipitation is important for quantifying river flood risk. There are two classes of previous approach, with one failing to ensure self-consistency in inference across different regions of aggregation and the other imposing highly restrictive assumptions. To overcome these issues, we propose a model for high-resolution precipitation data, from which we can simulate realistic fields and explore the behaviour of spatial aggregates. Recent developments have seen spatial extensions of the Heffernan and Tawn (2004) model for conditional multivariate extremes, which can handle a wide range of dependence structures. Our contribution is twofold: extensions and improvements of this approach and its model inference for high-dimensional data; and a novel framework for deriving aggregates addressing edge effects and sub-regions without rain. We apply our modelling approach to gridded East-Anglia, UK precipitation data. Return-level curves for spatial aggregates over different regions of various sizes are estimated and shown to fit very well to the data.
Submitted 21 June, 2022; v1 submitted 22 February, 2021;
originally announced February 2021.
-
Efficient Selection Between Hierarchical Cognitive Models: Cross-validation With Variational Bayes
Authors:
Viet-Hung Dao,
David Gunawan,
Minh-Ngoc Tran,
Robert Kohn,
Guy E. Hawkins,
Scott D. Brown
Abstract:
Model comparison is the cornerstone of theoretical progress in psychological research. Common practice overwhelmingly relies on tools that evaluate competing models by balancing in-sample descriptive adequacy against model flexibility, with modern approaches advocating the use of marginal likelihood for hierarchical cognitive models. Cross-validation is another popular approach but its implementation has remained out of reach for cognitive models evaluated in a Bayesian hierarchical framework, with the major hurdle being prohibitive computational cost. To address this issue, we develop novel algorithms that make variational Bayes (VB) inference for hierarchical models feasible and computationally efficient for complex cognitive models of substantive theoretical interest. It is well known that VB produces good estimates of the first moments of the parameters, which in turn give good predictive density estimates. We thus develop a novel VB algorithm with Bayesian prediction as a tool to perform model comparison by cross-validation, which we refer to as CVVB. In particular, the CVVB can be used as a model screening device that quickly identifies bad models. We demonstrate the utility of CVVB by revisiting a classic question in decision making research: what latent components of processing drive the ubiquitous speed-accuracy tradeoff? We demonstrate that CVVB strongly agrees with model comparison via marginal likelihood yet achieves the outcome in much less time. Our approach brings cross-validation within reach of theoretically important psychological models, and makes it feasible to compare much larger families of hierarchically specified cognitive models than has previously been possible.
Submitted 8 October, 2021; v1 submitted 12 February, 2021;
originally announced February 2021.
-
Bayesian Robust Optimization for Imitation Learning
Authors:
Daniel S. Brown,
Scott Niekum,
Marek Petrik
Abstract:
One of the main challenges in imitation learning is determining what action an agent should take when outside the state distribution of the demonstrations. Inverse reinforcement learning (IRL) can enable generalization to new states by learning a parameterized reward function, but these approaches still face uncertainty over the true reward function and corresponding optimal policy. Existing safe imitation learning approaches based on IRL deal with this uncertainty using a maxmin framework that optimizes a policy under the assumption of an adversarial reward function, whereas risk-neutral IRL approaches either optimize a policy for the mean or MAP reward function. While completely ignoring risk can lead to overly aggressive and unsafe policies, optimizing in a fully adversarial sense is also problematic as it can lead to overly conservative policies that perform poorly in practice. To provide a bridge between these two extremes, we propose Bayesian Robust Optimization for Imitation Learning (BROIL). BROIL leverages Bayesian reward function inference and a user specific risk tolerance to efficiently optimize a robust policy that balances expected return and conditional value at risk. Our empirical results show that BROIL provides a natural way to interpolate between return-maximizing and risk-minimizing behaviors and outperforms existing risk-sensitive and risk-neutral inverse reinforcement learning algorithms. Code is available at https://github.com/dsbrown1331/broil.
Submitted 29 February, 2024; v1 submitted 23 July, 2020;
originally announced July 2020.
-
Simple conditions for convergence of sequential Monte Carlo genealogies with applications
Authors:
Suzie Brown,
Paul A. Jenkins,
Adam M. Johansen,
Jere Koskela
Abstract:
We present simple conditions under which the limiting genealogical process associated with a class of interacting particle systems with non-neutral selection mechanisms, as the number of particles grows, is a time-rescaled Kingman coalescent. Sequential Monte Carlo algorithms are popular methods for approximating integrals in problems such as non-linear filtering and smoothing which employ this type of particle system. Their performance depends strongly on the properties of the induced genealogical process. We verify the conditions of our main result for standard sequential Monte Carlo algorithms with a broad class of low-variance resampling schemes, as well as for conditional sequential Monte Carlo with multinomial resampling.
Submitted 7 December, 2020; v1 submitted 30 June, 2020;
originally announced July 2020.
-
Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences
Authors:
Daniel S. Brown,
Russell Coleman,
Ravi Srinivasan,
Scott Niekum
Abstract:
Bayesian reward learning from demonstrations enables rigorous safety and uncertainty analysis when performing imitation learning. However, Bayesian reward learning methods are typically computationally intractable for complex control problems. We propose Bayesian Reward Extrapolation (Bayesian REX), a highly efficient Bayesian reward learning algorithm that scales to high-dimensional imitation learning problems by pre-training a low-dimensional feature encoding via self-supervised tasks and then leveraging preferences over demonstrations to perform fast Bayesian inference. Bayesian REX can learn to play Atari games from demonstrations, without access to the game score and can generate 100,000 samples from the posterior over reward functions in only 5 minutes on a personal laptop. Bayesian REX also results in imitation learning performance that is competitive with or better than state-of-the-art methods that only learn point estimates of the reward function. Finally, Bayesian REX enables efficient high-confidence policy evaluation without having access to samples of the reward function. These high-confidence performance bounds can be used to rank the performance and risk of a variety of evaluation policies and provide a way to detect reward hacking behaviors.
Submitted 17 December, 2020; v1 submitted 20 February, 2020;
originally announced February 2020.
-
Deep Bayesian Reward Learning from Preferences
Authors:
Daniel S. Brown,
Scott Niekum
Abstract:
Bayesian inverse reinforcement learning (IRL) methods are ideal for safe imitation learning, as they allow a learning agent to reason about reward uncertainty and the safety of a learned policy. However, Bayesian IRL is computationally intractable for high-dimensional problems because each sample from the posterior requires solving an entire Markov Decision Process (MDP). While there exist non-Bayesian deep IRL methods, these methods typically infer point estimates of reward functions, precluding rigorous safety and uncertainty analysis. We propose Bayesian Reward Extrapolation (B-REX), a highly efficient, preference-based Bayesian reward learning algorithm that scales to high-dimensional, visual control tasks. Our approach uses successor feature representations and preferences over demonstrations to efficiently generate samples from the posterior distribution over the demonstrator's reward function without requiring an MDP solver. Using samples from the posterior, we demonstrate how to calculate high-confidence bounds on policy performance in the imitation learning setting, in which the ground-truth reward function is unknown. We evaluate our proposed approach on the task of learning to play Atari games via imitation learning from pixel inputs, with no access to the game score. We demonstrate that B-REX learns imitation policies that are competitive with a state-of-the-art deep imitation learning method that only learns a point estimate of the reward function. Furthermore, we demonstrate that samples from the posterior generated via B-REX can be used to compute high-confidence performance bounds for a variety of evaluation policies. We show that high-confidence performance bounds are useful for accurately ranking different evaluation policies when the reward function is unknown. We also demonstrate that high-confidence performance bounds may be useful for detecting reward hacking.
Submitted 9 December, 2019;
originally announced December 2019.
-
Identifying relationships between cognitive processes across tasks, contexts, and time
Authors:
Laura Wall,
David Gunawan,
Scott D. Brown,
Minh-Ngoc Tran,
Robert Kohn,
Guy E. Hawkins
Abstract:
It is commonly assumed that a specific testing occasion (task, design, procedure, etc.) provides insights that generalise beyond that occasion. This assumption is infrequently carefully tested in data. We develop a statistically principled method to directly estimate the correlation between latent components of cognitive processing across tasks, contexts, and time. This method simultaneously estimates individual-participant parameters of a cognitive model at each testing occasion, group-level parameters representing across-participant parameter averages and variances, and across-task correlations. The approach provides a natural way to "borrow" strength across testing occasions, which can increase the precision of parameter estimates across all testing occasions. Two example applications demonstrate that the method is practical in standard designs. The examples, and a simulation study, also provide evidence about the reliability and validity of parameter estimates from the linear ballistic accumulator model. We conclude by highlighting the potential of the parameter-correlation method to provide an "assumption-light" tool for estimating the relatedness of cognitive processes across tasks, contexts, and time.
Submitted 26 March, 2020; v1 submitted 16 October, 2019;
originally announced October 2019.
-
Better-than-Demonstrator Imitation Learning via Automatically-Ranked Demonstrations
Authors:
Daniel S. Brown,
Wonjoon Goo,
Scott Niekum
Abstract:
The performance of imitation learning is typically upper-bounded by the performance of the demonstrator. While recent empirical results demonstrate that ranked demonstrations allow for better-than-demonstrator performance, preferences over demonstrations may be difficult to obtain, and little is known theoretically about when such methods can be expected to successfully extrapolate beyond the performance of the demonstrator. To address these issues, we first contribute a sufficient condition for better-than-demonstrator imitation learning and provide theoretical results showing why preferences over demonstrations can better reduce reward function ambiguity when performing inverse reinforcement learning. Building on this theory, we introduce Disturbance-based Reward Extrapolation (D-REX), a ranking-based imitation learning method that injects noise into a policy learned through behavioral cloning to automatically generate ranked demonstrations. These ranked demonstrations are used to efficiently learn a reward function that can then be optimized using reinforcement learning. We empirically validate our approach on simulated robot and Atari imitation learning benchmarks and show that D-REX outperforms standard imitation learning approaches and can significantly surpass the performance of the demonstrator. D-REX is the first imitation learning approach to achieve significant extrapolation beyond the demonstrator's performance without additional side-information or supervision, such as rewards or human preferences. By generating rankings automatically, we show that preference-based inverse reinforcement learning can be applied in traditional imitation learning settings where only unlabeled demonstrations are available.
Submitted 14 October, 2019; v1 submitted 9 July, 2019;
originally announced July 2019.
-
Time-evolving psychological processes over repeated decisions
Authors:
David Gunawan,
Guy E. Hawkins,
Robert Kohn,
Minh-Ngoc Tran,
Scott D. Brown
Abstract:
Many psychological experiments have subjects repeat a task to gain the statistical precision required to test quantitative theories of psychological performance. In such experiments, time-on-task can have sizable effects on performance, changing the psychological processes under investigation. Most research has either ignored these changes, treating the underlying process as static, or sacrificed some psychological content of the models for statistical simplicity. We use particle Markov chain Monte-Carlo methods to study psychologically plausible time-varying changes in model parameters. Using data from three highly-cited experiments we find strong evidence in favor of a hidden Markov switching process as an explanation of time-varying effects. This embodies the psychological assumption of "regime switching", with subjects alternating between different cognitive states representing different modes of decision-making. The switching model explains key long- and short-term dynamic effects in the data. The central idea of our approach can be applied quite generally to quantitative psychological theories, beyond the models and data sets that we investigate.
Submitted 3 November, 2021; v1 submitted 26 June, 2019;
originally announced June 2019.
-
Robustly estimating the marginal likelihood for cognitive models via importance sampling
Authors:
Minh-Ngoc Tran,
Marcel Scharth,
David Gunawan,
Robert Kohn,
Scott D. Brown,
Guy E. Hawkins
Abstract:
Recent advances in Markov chain Monte Carlo (MCMC) extend the scope of Bayesian inference to models for which the likelihood function is intractable. Although these developments allow us to estimate model parameters, other basic problems such as estimating the marginal likelihood, a fundamental tool in Bayesian model selection, remain challenging. This is an important scientific limitation because testing psychological hypotheses with hierarchical models has proven difficult with current model selection methods. We propose an efficient method for estimating the marginal likelihood for models where the likelihood is intractable, but can be estimated unbiasedly. It is based on first running a sampling method such as MCMC to obtain samples for the model parameters, and then using these samples to construct the proposal density in an importance sampling (IS) framework with an unbiased estimate of the likelihood. Our method has several attractive properties: it generates an unbiased estimate of the marginal likelihood, it is robust to the quality and target of the sampling method used to form the IS proposals, and it is computationally cheap to estimate the variance of the marginal likelihood estimator. We also obtain the convergence properties of the method and provide guidelines on maximizing computational efficiency. The method is illustrated in two challenging cases involving hierarchical models: identifying the form of individual differences in an applied choice scenario, and evaluating the best parameterization of a cognitive model in a speeded decision making context. Freely available code to implement the methods is provided. Extensions to posterior moment estimation and parallelization are also discussed.
Submitted 11 December, 2019; v1 submitted 14 June, 2019;
originally announced June 2019.
-
Modelling the spatial extent and severity of extreme European windstorms
Authors:
Paul Sharkey,
Jonathan A. Tawn,
Simon J. Brown
Abstract:
Windstorms are a primary natural hazard affecting Europe that are commonly linked to substantial property and infrastructural damage and are responsible for the largest spatially aggregated financial losses. Such extreme winds are typically generated by extratropical cyclone systems originating in the North Atlantic and passing over Europe. Previous statistical studies tend to model extreme winds at a given set of sites, corresponding to inference in a Eulerian framework. Such inference cannot incorporate knowledge of the life cycle and progression of extratropical cyclones across the region and is forced to make restrictive assumptions about the extremal dependence structure. We take an entirely different approach which overcomes these limitations by working in a Lagrangian framework. Specifically, we model the development of windstorms over time, preserving the physical characteristics linking the windstorm and the cyclone track, the path of local vorticity maxima, and make a key finding that the spatial extent of an extratropical windstorm becomes more localised as its magnitude increases, irrespective of the location of the storm track. Our model allows simulation of synthetic windstorm events to derive the joint distributional features over any set of sites, giving physically consistent extrapolations to rarer events. From such simulations improved estimates of this hazard can be achieved both in terms of intensity and area affected.
Submitted 7 June, 2019;
originally announced June 2019.
-
A stochastic model for the lifecycle and track of extreme extratropical cyclones in the North Atlantic
Authors:
Paul Sharkey,
Jonathan A. Tawn,
Simon J. Brown
Abstract:
Extratropical cyclones are large-scale weather systems which are often the source of extreme weather events in Northern Europe, often leading to mass infrastructural damage and casualties. Such systems create a local vorticity maximum which tracks across the Atlantic Ocean and from which can be determined a climatology for the region. While there have been considerable advances in developing algorithms for extracting the track and evolution of cyclones from reanalysis datasets, the data record is relatively short. This justifies the need for a statistical model to represent the more extreme characteristics of these weather systems, specifically their intensity and the spatial variability in their tracks. This paper presents a novel simulation-based approach to modelling the lifecycle of extratropical cyclones in terms of both their tracks and vorticity, incorporating various aspects of cyclone evolution and movement. By drawing on methods from extreme value analysis, we can simulate more extreme storms than those observed, representing a useful tool for practitioners concerned with risk assessment with regard to these weather systems.
Submitted 21 May, 2019;
originally announced May 2019.
-
Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
Authors:
Daniel S. Brown,
Wonjoon Goo,
Prabhat Nagarajan,
Scott Niekum
Abstract:
A critical flaw of existing inverse reinforcement learning (IRL) methods is their inability to significantly outperform the demonstrator. This is because IRL typically seeks a reward function that makes the demonstrator appear near-optimal, rather than inferring the underlying intentions of the demonstrator that may have been poorly executed in practice. In this paper, we introduce a novel reward-learning-from-observation algorithm, Trajectory-ranked Reward EXtrapolation (T-REX), that extrapolates beyond a set of (approximately) ranked demonstrations in order to infer high-quality reward functions from a set of potentially poor demonstrations. When combined with deep reinforcement learning, T-REX outperforms state-of-the-art imitation learning and IRL methods on multiple Atari and MuJoCo benchmark tasks and achieves performance that is often more than twice the performance of the best demonstration. We also demonstrate that T-REX is robust to ranking noise and can accurately extrapolate intention by simply watching a learner noisily improve at a task over time.
Submitted 8 July, 2019; v1 submitted 12 April, 2019;
originally announced April 2019.
-
Risk-Aware Active Inverse Reinforcement Learning
Authors:
Daniel S. Brown,
Yuchen Cui,
Scott Niekum
Abstract:
Active learning from demonstration allows a robot to query a human for specific types of input to achieve efficient learning. Existing work has explored a variety of active query strategies; however, to our knowledge, none of these strategies directly minimize the performance risk of the policy the robot is learning. Utilizing recent advances in performance bounds for inverse reinforcement learning, we propose a risk-aware active inverse reinforcement learning algorithm that focuses active queries on areas of the state space with the potential for large generalization error. We show that risk-aware active learning outperforms standard active IRL approaches on gridworld, simulated driving, and table setting tasks, while also providing a performance-based stopping criterion that allows a robot to know when it has received enough demonstrations to safely perform a task.
Submitted 3 June, 2019; v1 submitted 8 January, 2019;
originally announced January 2019.
-
New Estimation Approaches for the Hierarchical Linear Ballistic Accumulator Model
Authors:
David Gunawan,
Guy E. Hawkins,
Minh-Ngoc Tran,
Robert Kohn,
Scott Brown
Abstract:
The Linear Ballistic Accumulator (Brown & Heathcote, 2008) model is used as a measurement tool to answer questions about applied psychology. The analyses based on this model depend upon the model selected and its estimated parameters. Modern approaches use hierarchical Bayesian models and Markov chain Monte-Carlo (MCMC) methods to estimate the posterior distribution of the parameters. Although there are several approaches available for model selection, they are all based on the posterior samples produced via MCMC, which means that the model selection inference inherits the properties of the MCMC sampler. To improve on current approaches to LBA inference we propose two methods that are based on recent advances in particle MCMC methodology; they are qualitatively different from existing approaches as well as from each other. The first approach is particle Metropolis-within-Gibbs; the second approach is density tempered sequential Monte Carlo. Both new approaches provide very efficient sampling and can be applied to estimate the marginal likelihood, which provides Bayes factors for model selection. The first approach is usually faster. The second approach provides a direct estimate of the marginal likelihood, uses the first approach in its Markov move step and is very efficient to parallelize on high performance computers. The new methods are illustrated by applying them to simulated and real data, and through pseudo code. The code implementing the methods is freely available.
Submitted 2 March, 2020; v1 submitted 26 June, 2018;
originally announced June 2018.
-
Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications
Authors:
Daniel S. Brown,
Scott Niekum
Abstract:
Inverse reinforcement learning (IRL) infers a reward function from demonstrations, allowing for policy improvement and generalization. However, despite much recent interest in IRL, little work has been done to understand the minimum set of demonstrations needed to teach a specific sequential decision-making task. We formalize the problem of finding maximally informative demonstrations for IRL as a machine teaching problem where the goal is to find the minimum number of demonstrations needed to specify the reward equivalence class of the demonstrator. We extend previous work on algorithmic teaching for sequential decision-making tasks by showing a reduction to the set cover problem which enables an efficient approximation algorithm for determining the set of maximally-informative demonstrations. We apply our proposed machine teaching algorithm to two novel applications: providing a lower bound on the number of queries needed to learn a policy using active IRL and developing a novel IRL algorithm that can learn more efficiently from informative demonstrations than a standard IRL approach.
Submitted 16 August, 2019; v1 submitted 19 May, 2018;
originally announced May 2018.
-
Uncharted Forest a Technique for Exploratory Data Analysis
Authors:
Casey Kneale,
Steven D. Brown
Abstract:
Exploratory data analysis is crucial for developing and understanding classification models from high-dimensional datasets. We explore the utility of a new unsupervised tree ensemble called uncharted forest for visualizing class associations, sample-sample associations, class heterogeneity, and uninformative classes for provenance studies. The uncharted forest algorithm can be used to partition data using random selections of variables and metrics based on statistical spread. After each tree is grown, a tally of the samples that arrive at every terminal node is maintained. Those tallies are stored in single sample association matrix and a likelihood measure for each sample being partitioned with one another can be made. That matrix may be readily viewed as a heat map, and the probabilities can be quantified via new metrics that account for class or cluster membership. We display the advantages and limitations of using this technique by applying it to two classification datasets and three provenance study datasets. Two of the metrics presented in this paper are also compared with widely used metrics from two algorithms that have variance-based clustering mechanisms.
Submitted 30 June, 2018; v1 submitted 11 February, 2018;
originally announced February 2018.
-
Band Target Entropy Minimization and Target Partial Least Squares for Spectral Recovery and Calibration
Authors:
Casey Kneale,
Steven D. Brown
Abstract:
The resolution and calibration of pure spectra of minority components in measurements of chemical mixtures without prior knowledge of the mixture is a challenging problem. In this work, a combination of band target entropy minimization (BTEM) and target partial least squares (T-PLS) was used to obtain estimates for single pure component spectra and to calibrate those estimates in a true, one-at-a-time fashion. This approach allows for minor components to be targeted and their relative amounts estimated in the presence of other varying components in spectral data. The use of T-PLS estimation is an improvement to the BTEM method because it overcomes the need to identify all of the pure components prior to estimation. Estimated amounts from this combination were found to be similar to those obtained from a standard method, multivariate curve resolution-alternating least squares (MCR-ALS), on a simple, three component mixture dataset. Studies from two experimental datasets demonstrate where the combination of BTEM and T-PLS could model the pure component spectra and obtain concentration profiles of minor components but MCR-ALS could not.
Submitted 27 March, 2018; v1 submitted 11 February, 2018;
originally announced February 2018.
-
Small Moving Window Calibration Models for Soft Sensing Processes with Limited History
Authors:
Casey Kneale,
Steven D. Brown
Abstract:
Five simple soft sensor methodologies with two update conditions were compared on two experimentally-obtained datasets and one simulated dataset. The soft sensors investigated were moving window partial least squares regression (and a recursive variant), moving window random forest regression, the mean moving window of $y$, and a novel random forest partial least squares regression ensemble (RF-PLS), all of which can be used with small sample sizes so that they can be rapidly placed online. It was found that, on two of the datasets studied, small window sizes led to the lowest prediction errors for all of the moving window methods studied. On the majority of datasets studied, the RF-PLS calibration method offered the lowest one-step-ahead prediction errors compared to those of the other methods, and it demonstrated greater predictive stability at larger time delays than moving window PLS alone. It was found that both the random forest and RF-PLS methods most adequately modeled the datasets that did not feature purely monotonic increases in property values, but that both methods performed more poorly than moving window PLS models on one dataset with purely monotonic property values. Other data dependent findings are presented and discussed.
Submitted 13 March, 2018; v1 submitted 31 October, 2017;
originally announced October 2017.
-
Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning
Authors:
Daniel S. Brown,
Scott Niekum
Abstract:
In the field of reinforcement learning there has been recent progress towards safety and high-confidence bounds on policy performance. However, to our knowledge, no practical methods exist for determining high-confidence policy performance bounds in the inverse reinforcement learning setting---where the true reward function is unknown and only samples of expert behavior are given. We propose a sampling method based on Bayesian inverse reinforcement learning that uses demonstrations to determine practical high-confidence upper bounds on the $α$-worst-case difference in expected return between any evaluation policy and the optimal policy under the expert's unknown reward function. We evaluate our proposed bound on both a standard grid navigation task and a simulated driving task and achieve tighter and more accurate bounds than a feature count-based baseline. We also give examples of how our proposed bound can be utilized to perform risk-aware policy selection and risk-aware policy improvement. Because our proposed bound requires several orders of magnitude fewer demonstrations than existing high-confidence bounds, it is the first practical method that allows agents that learn from demonstration to express confidence in the quality of their learned policy.
Submitted 22 June, 2018; v1 submitted 3 July, 2017;
originally announced July 2017.
-
A PageRank Model for Player Performance Assessment in Basketball, Soccer and Hockey
Authors:
Shael Brown
Abstract:
In the sports of soccer, hockey and basketball the most commonly used statistics for player performance assessment are divided into two categories: offensive statistics and defensive statistics. However, qualitative assessments of playmaking (for example making "smart" passes) are difficult to quantify. It would be advantageous to have available a single statistic that can emphasize the flow of a game, rewarding those players who initiate and contribute to successful plays more. In this paper we will examine a model based on Google's PageRank. Other papers have explored ranking teams, coaches, and captains but here we construct ratings and rankings for individual members on both teams that emphasizes initiating and partaking in successful plays and forcing defensive turnovers. For a soccer/hockey/basketball game, our model assigns a node for each of the n players who play in the game and a "goal node". Arcs between player nodes indicate sport-specific situations (including passes, turnovers, scoring, fouls, out-of-bounds, play-stoppages, turnovers, missed shots, defensive plays etc.), tailored for each sport. As well, some additional arcs are added in to ensure that the associated matrix is primitive and hence there is a unique PageRank vector. The PageRank vector of the associated matrix is used to rank the players of the game. To illustrate the model, data was taken from nine NBA games played between 2014-2016. Many of the top-ranked players (in the model) in a given game had some of the most impressive traditional stat-lines. However, from the model there were surprises where some players who had impressive stat-lines had lower ranks, and others who had less impressive stat-lines had higher ranks. Overall, the model's ranking and ratings reflect more the flow of the game compared to traditional sports statistics.
Submitted 31 March, 2017;
originally announced April 2017.
-
Predictive Hierarchical Clustering: Learning clusters of CPT codes for improving surgical outcomes
Authors:
Elizabeth C. Lorenzi,
Stephanie L. Brown,
Zhifei Sun,
Katherine Heller
Abstract:
We develop a novel algorithm, Predictive Hierarchical Clustering (PHC), for agglomerative hierarchical clustering of current procedural terminology (CPT) codes. Our predictive hierarchical clustering aims to cluster subgroups, not individual observations, found within our data, such that the clusters discovered result in optimal performance of a classification model. Therefore, merges are chosen based on a Bayesian hypothesis test, which chooses pairings of the subgroups that result in the best model fit, as measured by held out predictive likelihoods. We place a Dirichlet prior on the probability of merging clusters, allowing us to adjust the size and sparsity of clusters. The motivation is to predict patient-specific surgical outcomes using data from ACS NSQIP (American College of Surgeons National Surgical Quality Improvement Program). An important predictor of surgical outcomes is the actual surgical procedure performed as described by a CPT code. We use PHC to cluster CPT codes, represented as subgroups, together in a way that enables us to better predict patient-specific outcomes compared to currently used clusters based on clinical judgment.
Submitted 1 August, 2017; v1 submitted 24 April, 2016;
originally announced April 2016.