Showing 1–50 of 50 results for author: Scott, J

Searching in archive stat.
  1. arXiv:2307.00190  [pdf]

    stat.AP

    Estimands in Real-World Evidence Studies

    Authors: Jie Chen, Daniel Scharfstein, Hongwei Wang, Binbing Yu, Yang Song, Weili He, John Scott, Xiwu Lin, Hana Lee

    Abstract: A Real-World Evidence (RWE) Scientific Working Group (SWG) of the American Statistical Association Biopharmaceutical Section (ASA BIOP) has been reviewing statistical considerations for the generation of RWE to support regulatory decision-making. As part of the effort, the working group is addressing estimands in RWE studies. Constructing the right estimand -- the target of estimation -- which ref…

    Submitted 30 June, 2023; originally announced July 2023.

  2. arXiv:2209.12316  [pdf, other]

    cs.LG stat.AP stat.ML

    Weather2vec: Representation Learning for Causal Inference with Non-Local Confounding in Air Pollution and Climate Studies

    Authors: Mauricio Tec, James Scott, Corwin Zigler

    Abstract: Estimating the causal effects of a spatially-varying intervention on a spatially-varying outcome may be subject to non-local confounding (NLC), a phenomenon that can bias estimates when the treatments and outcomes of a given unit are dictated in part by the covariates of other nearby units. In particular, NLC is a challenge for evaluating the effects of environmental policies and climate events on…

    Submitted 11 December, 2022; v1 submitted 25 September, 2022; originally announced September 2022.

    Journal ref: AAAI 2023

  3. arXiv:2110.12461  [pdf, other]

    stat.CO stat.ME

    Epidemia: An R Package for Semi-Mechanistic Bayesian Modelling of Infectious Diseases using Point Processes

    Authors: James A. Scott, Axel Gandy, Swapnil Mishra, Samir Bhatt, Seth Flaxman, H. Juliette T. Unwin, Jonathan Ish-Horowicz

    Abstract: This article introduces epidemia, an R package for Bayesian, regression-oriented modeling of infectious diseases. The implemented models define a likelihood for all observed data while also explicitly modeling transmission dynamics: an approach often termed as semi-mechanistic. Infections are propagated over time using renewal equations. This approach is inspired by self-exciting, continuous-time…

    Submitted 24 October, 2021; originally announced October 2021.

  4. arXiv:2109.09104  [pdf, other]

    stat.ME

    Approximate Conditional Sampling for Pattern Detection in Weighted Networks

    Authors: James A. Scott, Axel Gandy

    Abstract: Assessing the statistical significance of network patterns is crucial for understanding whether such patterns indicate the presence of interesting network phenomena, or whether they simply result from less interesting processes, such as nodal-heterogeneity. Typically, significance is computed with reference to a null model. While there has been extensive research into such null models for unweight…

    Submitted 19 September, 2021; originally announced September 2021.

  5. arXiv:2012.00394  [pdf, other]

    stat.AP stat.ME

    Semi-Mechanistic Bayesian Modeling of COVID-19 with Renewal Processes

    Authors: Samir Bhatt, Neil Ferguson, Seth Flaxman, Axel Gandy, Swapnil Mishra, James A. Scott

    Abstract: We propose a general Bayesian approach to modeling epidemics such as COVID-19. The approach grew out of specific analyses conducted during the pandemic, in particular an analysis concerning the effects of non-pharmaceutical interventions (NPIs) in reducing COVID-19 transmission in 11 European countries. The model parameterizes the time varying reproduction number $R_t$ through a regression framewo…

    Submitted 29 December, 2020; v1 submitted 1 December, 2020; originally announced December 2020.

  6. arXiv:2002.10438  [pdf, other]

    cs.LG stat.ML

    xAI-GAN: Enhancing Generative Adversarial Networks via Explainable AI Systems

    Authors: Vineel Nagisetty, Laura Graves, Joseph Scott, Vijay Ganesh

    Abstract: Generative Adversarial Networks (GANs) are a revolutionary class of Deep Neural Networks (DNNs) that have been successfully used to generate realistic images, music, text, and other data. However, GAN training presents many challenges, notably it can be very resource-intensive. A potential weakness in GANs is that it requires a lot of data for successful training and data collection can be an expe…

    Submitted 29 March, 2022; v1 submitted 24 February, 2020; originally announced February 2020.

    Comments: 7 pages (+ 2 page for reference)

  7. arXiv:2001.06465  [pdf, ps, other]

    stat.ME stat.CO

    Unit Testing for MCMC and other Monte Carlo Methods

    Authors: Axel Gandy, James Scott

    Abstract: We propose approaches for testing implementations of Markov Chain Monte Carlo methods as well as of general Monte Carlo methods. Based on statistical hypothesis tests, these approaches can be used in a unit testing framework to, for example, check if individual steps in a Gibbs sampler or a reversible jump MCMC have the desired invariant distribution. Two exact tests for assessing whether a given…

    Submitted 19 September, 2021; v1 submitted 17 January, 2020; originally announced January 2020.

  8. arXiv:1912.06946  [pdf, other]

    stat.AP

    Monotone function estimation in the presence of extreme data coarsening: Analysis of preeclampsia and birth weight in urban Uganda

    Authors: Jennifer E. Starling, Catherine E. Aiken, Jared S. Murray, Annettee Nakimuli, James G. Scott

    Abstract: This paper proposes a Bayesian hierarchical model to characterize the relationship between birth weight and maternal pre-eclampsia across gestation at a large maternity hospital in urban Uganda. Key scientific questions we investigate include: 1) how pre-eclampsia compares to other maternal-fetal covariates as a predictor of birth weight; and 2) whether the impact of pre-eclampsia on birthweight v…

    Submitted 14 December, 2019; originally announced December 2019.

  9. arXiv:1911.08106  [pdf, other]

    stat.AP

    How Likely are Ride-share Drivers to Earn a Living Wage? Large-scale Spatio-temporal Density Smoothing with the Graph-fused Elastic Net

    Authors: Mauricio Tec, Natalia Zuniga-Garcia, Randy B. Machemehl, James G. Scott

    Abstract: Ride-sourcing or transportation network companies (TNCs) provide on-demand transportation service for compensation, connecting drivers of personal vehicles with passengers through smartphone applications. In this study, we consider the problem of estimating a spatiotemporally varying probability distribution for the productivity of a TNC driver, using data on more than 1.2 million TNC trips in Aus…

    Submitted 9 July, 2021; v1 submitted 19 November, 2019; originally announced November 2019.

  10. arXiv:1911.07553  [pdf, other]

    stat.ME

    A projection approach for multiple monotone regression

    Authors: Lizhen Lin, Brian St. Thomas, Walter W. Piegorsch, James Scott, Carlos Carvalho

    Abstract: Shape-constrained inference has wide applicability in bioassay, medicine, economics, risk assessment, and many other fields. Although there has been a large amount of work on monotone-constrained univariate curve estimation, multivariate shape-constrained problems are much more challenging, and fewer advances have been made in this direction. With a focus on monotone regression with multiple predi…

    Submitted 18 November, 2019; originally announced November 2019.

  11. arXiv:1905.09405  [pdf, other]

    stat.AP

    Targeted Smooth Bayesian Causal Forests: An analysis of heterogeneous treatment effects for simultaneous versus interval medical abortion regimens over gestation

    Authors: Jennifer E. Starling, Jared S. Murray, Patricia A. Lohr, Abigail R. A. Aiken, Carlos M. Carvalho, James G. Scott

    Abstract: We introduce Targeted Smooth Bayesian Causal Forests (tsBCF), a nonparametric Bayesian approach for estimating heterogeneous treatment effects which vary smoothly over a single covariate in the observational data setting. The tsBCF method induces smoothness by parameterizing terminal tree nodes with smooth functions, and allows for separate regularization of treatment effects versus prognostic eff…

    Submitted 23 February, 2020; v1 submitted 22 May, 2019; originally announced May 2019.

  12. arXiv:1812.04567  [pdf, other]

    stat.AP

    A flat persistence diagram for improved visualization of persistent homology

    Authors: Raoul R. Wadhwa, Andrew Dhawan, Drew F. K. Williamson, Jacob G. Scott

    Abstract: Visualization in the emerging field of topological data analysis has progressed from persistence barcodes and persistence diagrams to display of two-parameter persistent homology. Although persistence barcodes and diagrams have permitted insight into the geometry underlying complex datasets, visualization of even single-parameter persistent homology has significant room for improvement. Here, we p…

    Submitted 5 January, 2019; v1 submitted 11 December, 2018; originally announced December 2018.

    Comments: 4 pages, 2 figures

  13. Optimal post-selection inference for sparse signals: a nonparametric empirical-Bayes approach

    Authors: Spencer Woody, Oscar Hernan Madrid Padilla, James G. Scott

    Abstract: Many recently developed Bayesian methods have focused on sparse signal detection. However, much less work has been done addressing the natural follow-up question: how to make valid inferences for the magnitude of those signals after selection. Ordinary Bayesian credible intervals suffer from selection bias, owing to the fact that the target of inference is chosen adaptively. Existing Bayesian appr…

    Submitted 13 November, 2020; v1 submitted 25 October, 2018; originally announced October 2018.

  14. arXiv:1809.10329  [pdf, other]

    stat.AP

    Evaluation of Ride-Sourcing Search Frictions and Driver Productivity: A Spatial Denoising Approach

    Authors: Natalia Zuniga-Garcia, Mauricio Tec, James G. Scott, Natalia Ruiz-Juri, Randy B. Machemehl

    Abstract: This paper considers the problem of measuring spatial and temporal variation in driver productivity on ride-sourcing trips. This variation is especially important from a driver's perspective: if a platform's drivers experience systematic disparities in earnings because of variation in their riders' destinations, they may perceive the pricing model as inequitable. This perception can exacerbate sea…

    Submitted 11 October, 2019; v1 submitted 26 September, 2018; originally announced September 2018.

    Comments: 34 pages

  15. arXiv:1809.06758  [pdf, other]

    stat.ME stat.CO

    State-Dependent Kernel Selection for Conditional Sampling of Graphs

    Authors: James Scott, Axel Gandy

    Abstract: This paper introduces new efficient algorithms for two problems: sampling conditional on vertex degrees in unweighted graphs, and sampling conditional on vertex strengths in weighted graphs. The algorithms can sample conditional on the presence or absence of an arbitrary number of edges. The resulting conditional distributions provide the basis for exact tests. Existing samplers based on MCMC or s…

    Submitted 18 September, 2018; originally announced September 2018.

    Comments: Package implementing the samplers can be found at https://github.com/jscott6/cgsampr

  16. arXiv:1805.07656  [pdf, other]

    stat.ME

    BART with Targeted Smoothing: An analysis of patient-specific stillbirth risk

    Authors: Jennifer E. Starling, Jared S. Murray, Carlos M. Carvalho, Radek K. Bukowski, James G. Scott

    Abstract: This article introduces BART with Targeted Smoothing, or tsBART, a new Bayesian tree-based model for nonparametric regression. The goal of tsBART is to introduce smoothness over a single target covariate t, while not necessarily requiring smoothness over other covariates x. TsBART is based on the Bayesian Additive Regression Trees (BART) model, an ensemble of regression trees. TsBART extends BART…

    Submitted 3 June, 2019; v1 submitted 19 May, 2018; originally announced May 2018.

  17. arXiv:1804.00327  [pdf, other]

    stat.AP q-bio.PE

    Socioeconomic bias in influenza surveillance

    Authors: Samuel V. Scarpino, James G. Scott, Rosalind M. Eggo, Bruce Clements, Nedialko B. Dimitrov, Lauren Ancel Meyers

    Abstract: Individuals in low socioeconomic brackets are considered at-risk for developing influenza-related complications and often exhibit higher than average influenza-related hospitalization rates. This disparity has been attributed to various factors, including restricted access to preventative and therapeutic health care, limited sick leave, and household structure. Adequate influenza surveillance in t…

    Submitted 1 April, 2018; originally announced April 2018.

  18. arXiv:1708.01947  [pdf, other]

    stat.ML

    Interpretable Low-Dimensional Regression via Data-Adaptive Smoothing

    Authors: Wesley Tansey, Jesse Thomason, James G. Scott

    Abstract: We consider the problem of estimating a regression function in the common situation where the number of features is small, where interpretability of the model is a high priority, and where simple linear or additive models fail to provide adequate performance. To address this problem, we present Maximum Variance Total Variation denoising (MVTV), an approach that is conceptually related both to CART…

    Submitted 6 August, 2017; originally announced August 2017.

    Comments: 4 pages, 1 figure presented at 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia

  19. arXiv:1702.07405  [pdf, other]

    stat.ML

    GapTV: Accurate and Interpretable Low-Dimensional Regression and Classification

    Authors: Wesley Tansey, James G. Scott

    Abstract: We consider the problem of estimating a regression function in the common situation where the number of features is small, where interpretability of the model is a high priority, and where simple linear or additive models fail to provide adequate performance. To address this problem, we present GapTV, an approach that is conceptually related both to CART and to the more recent CRISP algorithm, a s…

    Submitted 23 February, 2017; originally announced February 2017.

  20. arXiv:1702.07398  [pdf, other]

    stat.ML

    Deep Nonparametric Estimation of Discrete Conditional Distributions via Smoothed Dyadic Partitioning

    Authors: Wesley Tansey, Karl Pichotta, James G. Scott

    Abstract: We present an approach to deep estimation of discrete conditional probability distributions. Such models have several applications, including generative modeling of audio, image, and video data. Our approach combines two main techniques: dyadic partitioning and graph-based smoothing of the discrete space. By recursively decomposing each dimension into a series of binary splits and smoothing over t…

    Submitted 28 February, 2017; v1 submitted 23 February, 2017; originally announced February 2017.

  21. arXiv:1612.07867  [pdf, other]

    stat.ME

    Sequential nonparametric tests for a change in distribution: an application to detecting radiological anomalies

    Authors: Oscar Hernan Madrid Padilla, Alex Athey, Alex Reinhart, James G. Scott

    Abstract: We propose a sequential nonparametric test for detecting a change in distribution, based on windowed Kolmogorov--Smirnov statistics. The approach is simple, robust, highly computationally efficient, easy to calibrate, and requires no parametric assumptions about the underlying null and alternative distributions. We show that both the false-alarm rate and the power of our procedure are amenable to…

    Submitted 22 December, 2016; originally announced December 2016.

  22. arXiv:1612.00388  [pdf, other]

    stat.ML cs.LG stat.AP

    Diet2Vec: Multi-scale analysis of massive dietary data

    Authors: Wesley Tansey, Edward W. Lowe Jr., James G. Scott

    Abstract: Smart phone apps that enable users to easily track their diets have become widespread in the last decade. This has created an opportunity to discover new insights into obesity and weight loss by analyzing the eating habits of the users of such apps. In this paper, we present diet2vec: an approach to modeling latent structure in a massive database of electronic diet journals. Through an iterative c…

    Submitted 1 December, 2016; originally announced December 2016.

    Comments: Accepted to the NIPS 2016 Workshop on Machine Learning for Health

  23. arXiv:1606.02321  [pdf, other]

    stat.ML

    Better Conditional Density Estimation for Neural Networks

    Authors: Wesley Tansey, Karl Pichotta, James G. Scott

    Abstract: The vast majority of the neural network literature focuses on predicting point values for a given set of response variables, conditioned on a feature vector. In many cases we need to model the full joint conditional distribution over the response variables rather than simply making point predictions. In this paper, we present two novel approaches to such conditional density estimation (CDE): Multi…

    Submitted 7 June, 2016; originally announced June 2016.

    Comments: 12 pages, 3 figures, code available soon

  24. arXiv:1511.06750  [pdf, other]

    stat.ME

    A deconvolution path for mixtures

    Authors: Oscar Hernan Madrid Padilla, Nicholas G. Polson, James G. Scott

    Abstract: We propose a class of estimators for deconvolution in mixture models based on a simple two-step "bin-and-smooth" procedure applied to histogram counts. The method is both statistically and computationally efficient: by exploiting recent advances in convex optimization, we are able to provide a full deconvolution path that shows the estimate for the mixing distribution across a range of plausible d…

    Submitted 25 May, 2017; v1 submitted 20 November, 2015; originally announced November 2015.

    Journal ref: Electronic Journal of Statistics Volume 12, Number 1 (2018), 1717-1751

  25. arXiv:1509.04348  [pdf, other]

    stat.ME

    Nonparametric density estimation by histogram trend filtering

    Authors: Oscar Hernan Madrid Padilla, James G. Scott

    Abstract: We propose a novel approach for density estimation called histogram trend filtering. Our estimator arises from looking at a surrogate Poisson model for counts of observations in a partition of the support of the data. We begin by showing consistency for a variational estimator for this density estimation problem. We then study a discrete estimator that can be efficiently found via convex optimizatio…

    Submitted 6 February, 2016; v1 submitted 14 September, 2015; originally announced September 2015.

  26. arXiv:1507.07271  [pdf, other]

    stat.ME physics.data-an stat.AP

    Multiscale spatial density smoothing: an application to large-scale radiological survey and anomaly detection

    Authors: Wesley Tansey, Alex Athey, Alex Reinhart, James G. Scott

    Abstract: We consider the problem of estimating a spatially varying density function, motivated by problems that arise in large-scale radiological survey and anomaly detection. In this context, the density functions to be estimated are the background gamma-ray energy spectra at sites spread across a large geographical area, such as nuclear production and waste-storage sites, military bases, medical faciliti…

    Submitted 16 September, 2016; v1 submitted 26 July, 2015; originally announced July 2015.

    Comments: 36 pages, 10 figures

    Journal ref: Journal of the American Statistical Association, vol. 112 no. 519 (2017), pp. 1047-1063

  27. arXiv:1505.06475  [pdf, other]

    stat.ML stat.CO

    A Fast and Flexible Algorithm for the Graph-Fused Lasso

    Authors: Wesley Tansey, James G. Scott

    Abstract: We propose a new algorithm for solving the graph-fused lasso (GFL), a method for parameter estimation that operates under the assumption that the signal tends to be locally constant over a predefined graph structure. Our key insight is to decompose the graph into a set of trails which can then each be solved efficiently using techniques for the ordinary (1D) fused lasso. We leverage these trails i…

    Submitted 1 June, 2015; v1 submitted 24 May, 2015; originally announced May 2015.

    Comments: 16 pages, 6 figures

  28. arXiv:1502.06930  [pdf, ps, other]

    stat.ME stat.CO stat.ML

    Tensor decomposition with generalized lasso penalties

    Authors: Oscar Hernan Madrid Padilla, James G. Scott

    Abstract: We present an approach for penalized tensor decomposition (PTD) that estimates smoothly varying latent factors in multi-way data. This generalizes existing work on sparse tensor decomposition and penalized matrix decompositions, in a manner parallel to the generalized lasso for regression and smoothing problems. Our approach presents many nontrivial challenges at the intersection of modeling and c…

    Submitted 12 May, 2016; v1 submitted 24 February, 2015; originally announced February 2015.

  29. arXiv:1502.03175  [pdf, other]

    stat.ML cs.LG stat.ME

    Proximal Algorithms in Statistics and Machine Learning

    Authors: Nicholas G. Polson, James G. Scott, Brandon T. Willard

    Abstract: In this paper we develop proximal methods for statistical learning. Proximal point algorithms are useful in statistics and machine learning for obtaining optimization solutions for composite functions. Our approach exploits closed-form solutions of proximal operators and envelope representations based on the Moreau, Forward-Backward, Douglas-Rachford and Half-Quadratic envelopes. Envelope represen…

    Submitted 30 May, 2015; v1 submitted 10 February, 2015; originally announced February 2015.

  30. arXiv:1411.6144  [pdf, other]

    stat.ME stat.AP stat.CO

    False discovery rate smoothing

    Authors: Wesley Tansey, Oluwasanmi Koyejo, Russell A. Poldrack, James G. Scott

    Abstract: We present false discovery rate smoothing, an empirical-Bayes method for exploiting spatial structure in large multiple-testing problems. FDR smoothing automatically finds spatially localized regions of significant test statistics. It then relaxes the threshold of statistical significance within these regions, and tightens it elsewhere, in a manner that controls the overall false-discovery rate at…

    Submitted 14 November, 2016; v1 submitted 22 November, 2014; originally announced November 2014.

    Comments: Added misspecification analysis, added pathological scenario discussions, additional comparisons, new graph fused lasso algorithm

  31. arXiv:1409.3601  [pdf, other]

    stat.CO math.ST

    Vertical-likelihood Monte Carlo

    Authors: Nicholas G. Polson, James G. Scott

    Abstract: In this review, we address the use of Monte Carlo methods for approximating definite integrals of the form $Z = \int L(x) d P(x)$, where $L$ is a target function (often a likelihood) and $P$ a finite measure. We present vertical-likelihood Monte Carlo, which is an approach for designing the importance function $g(x)$ used in importance sampling. Our approach exploits a duality between two random v…

    Submitted 23 June, 2015; v1 submitted 11 September, 2014; originally announced September 2014.

  32. arXiv:1406.0177  [pdf, other]

    stat.ME

    Mixtures, envelopes, and hierarchical duality

    Authors: Nicholas G. Polson, James G. Scott

    Abstract: We develop a connection between mixture and envelope representations of objective functions that arise frequently in statistics. We refer to this connection using the term "hierarchical duality." Our results suggest an interesting and previously under-exploited relationship between marginalization and profiling, or equivalently between the Fenchel--Moreau theorem for convex functions and the Berns…

    Submitted 22 February, 2015; v1 submitted 1 June, 2014; originally announced June 2014.

  33. arXiv:1405.0506  [pdf, other]

    stat.CO

    Sampling Polya-Gamma random variates: alternate and approximate techniques

    Authors: Jesse Windle, Nicholas G. Polson, James G. Scott

    Abstract: Efficiently sampling from the Pólya-Gamma distribution, ${PG}(b,z)$, is an essential element of Pólya-Gamma data augmentation. Polson et al. (2013) show how to efficiently sample from the ${PG}(1,z)$ distribution. We build two new samplers that offer improved performance when sampling from the ${PG}(b,z)$ distribution and $b$ is not unity.

    Submitted 2 May, 2014; originally announced May 2014.

  34. arXiv:1404.3331  [pdf, other]

    stat.ME stat.ML

    Priors for Random Count Matrices Derived from a Family of Negative Binomial Processes

    Authors: Mingyuan Zhou, Oscar Hernan Madrid Padilla, James G. Scott

    Abstract: We define a family of probability distributions for random count matrices with a potentially unbounded number of rows and columns. The three distributions we consider are derived from the gamma-Poisson, gamma-negative binomial, and beta-negative binomial processes. Because the models lead to closed-form Gibbs sampling update equations, they are natural candidates for nonparametric Bayesian priors…

    Submitted 13 July, 2015; v1 submitted 12 April, 2014; originally announced April 2014.

    Comments: To appear in Journal of the American Statistical Association (Theory and Methods). 31 pages + 11 page supplement, 5 figures

  35. arXiv:1308.0774  [pdf, other]

    stat.CO

    Efficient Data Augmentation in Dynamic Models for Binary and Count Data

    Authors: Jesse Windle, Carlos M. Carvalho, James G. Scott, Liang Sun

    Abstract: Dynamic linear models with Gaussian observations and Gaussian states lead to closed-form formulas for posterior simulation. However, these closed-form formulas break down when the response or state evolution ceases to be Gaussian. Dynamic, generalized linear models exemplify a class of models for which this is the case, and include, amongst other models, dynamic binomial logistic regression and dy…

    Submitted 19 September, 2013; v1 submitted 3 August, 2013; originally announced August 2013.

    Comments: 22 pages, 1 figure, 1 table

  36. arXiv:1307.3495  [pdf, other]

    stat.ME stat.AP

    False discovery rate regression: an application to neural synchrony detection in primary visual cortex

    Authors: James G. Scott, Ryan C. Kelly, Matthew A. Smith, Pengcheng Zhou, Robert E. Kass

    Abstract: Many approaches for multiple testing begin with the assumption that all tests in a given study should be combined into a global false-discovery-rate analysis. But this may be inappropriate for many of today's large-scale screening problems, where auxiliary information about each test is often available, and where a combined analysis can lead to poorly calibrated error rates within different subset…

    Submitted 8 June, 2014; v1 submitted 12 July, 2013; originally announced July 2013.

  37. arXiv:1306.0040  [pdf, other]

    stat.CO math.ST stat.ML

    Expectation-maximization for logistic regression

    Authors: James G. Scott, Liang Sun

    Abstract: We present a family of expectation-maximization (EM) algorithms for binary and negative-binomial logistic regression, drawing a sharp connection with the variational-Bayes algorithm of Jaakkola and Jordan (2000). Indeed, our results allow a version of this variational-Bayes approach to be re-interpreted as a true EM algorithm. We study several interesting features of the algorithm, and of this pre…

    Submitted 31 May, 2013; originally announced June 2013.

  38. arXiv:1304.3378  [pdf, other]

    stat.ME math.ST

    Nonparametric Bayesian testing for monotonicity

    Authors: James G. Scott, Thomas S. Shively, Stephen G. Walker

    Abstract: This paper studies the problem of testing whether a function is monotone from a nonparametric Bayesian perspective. Two new families of tests are constructed. The first uses constrained smoothing splines, together with a hierarchical stochastic-process prior that explicitly controls the prior probability of monotonicity. The second uses regression splines, together with two proposals for the prior…

    Submitted 1 June, 2014; v1 submitted 11 April, 2013; originally announced April 2013.

  39. arXiv:1205.0310  [pdf, other]

    stat.ME stat.CO stat.ML

    Bayesian inference for logistic models using Polya-Gamma latent variables

    Authors: Nicholas G. Polson, James G. Scott, Jesse Windle

    Abstract: We propose a new data-augmentation strategy for fully Bayesian inference in models with binomial likelihoods. The approach appeals to a new class of Polya-Gamma distributions, which are constructed in detail. A variety of examples are presented to show the versatility of the method, including logistic regression, negative binomial regression, nonlinear mixed-effects models, and spatial models for…

    Submitted 22 July, 2013; v1 submitted 1 May, 2012; originally announced May 2012.

  40. arXiv:1111.0617  [pdf, other]

    stat.AP

    The partition problem: case studies in Bayesian screening for time-varying model structure

    Authors: Zesong Liu, Jesse Windle, James G. Scott

    Abstract: This paper presents two case studies of data sets where the main inferential goal is to characterize time-varying patterns in model structure. Both of these examples are seen to be general cases of the so-called "partition problem," where auxiliary information (in this case, time) defines a partition over sample space, and where different models hold for each element of the partition. In the first…

    Submitted 2 November, 2011; originally announced November 2011.

  41. arXiv:1110.5789  [pdf, other]

    q-fin.ST stat.AP

    An empirical test for Eurozone contagion using an asset-pricing model with heavy-tailed stochastic volatility

    Authors: Nicholas G. Polson, James G. Scott

    Abstract: This paper proposes an empirical test of financial contagion in European equity markets during the tumultuous period of 2008-2011. Our analysis shows that traditional GARCH and Gaussian stochastic-volatility models are unable to explain two key stylized features of global markets during presumptive contagion periods: shocks to aggregate market volatility can be sudden and explosive, and they are a…

    Submitted 26 March, 2012; v1 submitted 26 October, 2011; originally announced October 2011.

  42. arXiv:1109.4180  [pdf, other]

    stat.ME math.ST stat.CO

    Default Bayesian analysis for multi-way tables: a data-augmentation approach

    Authors: Nicholas G. Polson, James G. Scott

    Abstract: This paper proposes a strategy for regularized estimation in multi-way contingency tables, which are common in meta-analyses and multi-center clinical trials. Our approach is based on data augmentation, and appeals heavily to a novel class of Polya-Gamma distributions. Our main contributions are to build up the relevant distributional theory and to demonstrate three useful features of this data-au…

    Submitted 19 September, 2011; originally announced September 2011.

  43. arXiv:1109.2279  [pdf, other]

    stat.ME stat.CO stat.ML

    The Bayesian Bridge

    Authors: Nicholas G. Polson, James G. Scott, Jesse Windle

    Abstract: We propose the Bayesian bridge estimator for regularized regression and classification. Two key mixture representations for the Bayesian bridge model are developed: (1) a scale mixture of normals with respect to an alpha-stable random variable; and (2) a mixture of Bartlett--Fejer kernels (or triangle densities) with respect to a two-component mixture of gamma random variables. Both lead to MCMC m…

    Submitted 27 October, 2012; v1 submitted 11 September, 2011; originally announced September 2011.

    Comments: Supplemental files are available from the second author's website

  44. arXiv:1104.4937  [pdf, other]

    stat.ME

    On the half-Cauchy prior for a global scale parameter

    Authors: Nicholas G. Polson, James G. Scott

    Abstract: This paper argues that the half-Cauchy distribution should replace the inverse-Gamma distribution as a default prior for a top-level scale parameter in Bayesian hierarchical models, at least for cases where a proper prior is necessary. Our arguments involve a blend of Bayesian and frequentist reasoning, and are intended to complement the original case made by Gelman (2006) in support of the folded…

    Submitted 24 September, 2011; v1 submitted 26 April, 2011; originally announced April 2011.

  45. arXiv:1103.5407  [pdf, other]

    stat.ME stat.CO

    Data augmentation for non-Gaussian regression models using variance-mean mixtures

    Authors: Nicholas G. Polson, James G. Scott

    Abstract: We use the theory of normal variance-mean mixtures to derive a data-augmentation scheme for a class of common regularization problems. This generalizes existing theory on normal variance mixtures for priors in regression and classification. It also allows variants of the expectation-maximization algorithm to be brought to bear on a wider range of models than previously appreciated. We demonstrate…

    Submitted 22 September, 2012; v1 submitted 28 March, 2011; originally announced March 2011.

    Comments: Added a discussion of quasi-Newton acceleration

  46. arXiv:1010.5265  [pdf, other]

    stat.CO stat.ME

    Parameter expansion in local-shrinkage models

    Authors: James G. Scott

    Abstract: This paper considers the problem of using MCMC to fit sparse Bayesian models based on normal scale-mixture priors. Examples of this framework include the Bayesian LASSO and the horseshoe prior. We study the usefulness of parameter expansion (PX) for improving convergence in such models, which is notoriously slow when the global variance component is near zero. Our conclusion is that parameter expa…

    Submitted 25 October, 2010; originally announced October 2010.

  47. arXiv:1010.5223  [pdf, ps, other]

    stat.ME stat.AP

    Good, great, or lucky? Screening for firms with sustained superior performance using heavy-tailed priors

    Authors: Nicholas G. Polson, James G. Scott

    Abstract: This paper examines historical patterns of ROA (return on assets) for a cohort of 53,038 publicly traded firms across 93 countries, measured over the past 45 years. Our goal is to screen for firms whose ROA trajectories suggest that they have systematically outperformed their peer groups over time. Such a project faces at least three statistical difficulties: adjustment for relevant covariates, ma…

    Submitted 22 March, 2012; v1 submitted 25 October, 2010; originally announced October 2010.

    Comments: Published at http://dx.doi.org/10.1214/11-AOAS512 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS512

    Journal ref: Annals of Applied Statistics 2012, Vol. 6, No. 1, 161-185

  48. arXiv:1010.3390  [pdf, other]

    stat.ME math.ST

    Local shrinkage rules, Levy processes, and regularized regression

    Authors: Nicholas G. Polson, James G. Scott

    Abstract: We use Levy processes to generate joint prior distributions, and therefore penalty functions, for a location parameter as p grows large. This generalizes the class of local-global shrinkage rules based on scale mixtures of normals, illuminates new connections among disparate methods, and leads to new results for computing posterior means and modes under a wide class of priors. We extend this frame…

    Submitted 23 April, 2011; v1 submitted 16 October, 2010; originally announced October 2010.

  49. Nonparametric Bayesian multiple testing for longitudinal performance stratification

    Authors: James G. Scott

    Abstract: This paper describes a framework for flexible multiple hypothesis testing of autoregressive time series. The modeling approach is Bayesian, though a blend of frequentist and Bayesian reasoning is used to evaluate procedures. Nonparametric characterizations of both the null and alternative hypotheses will be shown to be the key robustification step necessary to ensure reasonable Type-I error perfor…

    Submitted 29 September, 2010; originally announced September 2010.

    Comments: Published at http://dx.doi.org/10.1214/09-AOAS252 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS252

    Journal ref: Annals of Applied Statistics 2009, Vol. 3, No. 4, 1655-1674

  50. arXiv:0911.1768  [pdf, other]

    stat.ME stat.AP

    Benchmarking Historical Corporate Performance

    Authors: James G. Scott

    Abstract: This paper uses Bayesian tree models for statistical benchmarking in data sets with awkward marginals and complicated dependence structures. The method is applied to a very large database on corporate performance over the last four decades. The results of this study provide a formal basis for making cross-peer-group comparisons among companies in very different industries and operating environment…

    Submitted 25 October, 2010; v1 submitted 9 November, 2009; originally announced November 2009.