
Empirical Macroeconomics and DSGE Modeling in Statistical Perspective

Authors: Daniel J. McDonald, Cosma Rohilla Shalizi

Abstract: Dynamic stochastic general equilibrium (DSGE) models have been a ubiquitous, and controversial, part of macroeconomics for decades. In this paper, we approach DSGEs purely as statistical models. We do this by applying two common model validation checks to the canonical Smets and Wouters 2007 DSGE: (1) we simulate the model and see how well it can be estimated from its own simulation output, and (2) we see how well it can seem to fit nonsense data. We find that (1) even with centuries' worth of data, the model remains poorly estimated, and (2) when we swap series at random, so that (e.g.) what the model gets as the inflation rate is really hours worked, what it gets as hours worked is really investment, etc., the fit is often only slightly impaired, and in a large percentage of cases actually improves (even out of sample). Taken together, these findings cast serious doubt on the meaningfulness of parameter estimates for this DSGE, and on whether this specification represents anything structural about the economy. Constructively, our approaches can be used for model validation by anyone working with macroeconomic time series.

Submitted 31 October, 2022; v1 submitted 28 October, 2022; originally announced October 2022.

Comments: 36 pages, 21 figures, 7 tables
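The series-swapping check in (2) starts by drawing a random relabeling of the observed series. A minimal sketch, assuming generic placeholder series names (hypothetical, not the paper's data layout); estimating the DSGE on the relabeled data is model-specific and not shown.

```python
import random

# Hypothetical labels standing in for the observed macroeconomic series.
SERIES = ["output", "consumption", "investment", "wages", "hours", "inflation", "interest_rate"]

def random_relabeling(names, rng):
    """Map each slot the model expects to the series actually fed into it.

    The mapping is a uniformly random permutation, redrawn until at least one
    series moves, so the model is never handed the original data unchanged.
    """
    while True:
        shuffled = names[:]
        rng.shuffle(shuffled)
        if shuffled != names:
            return dict(zip(names, shuffled))

rng = random.Random(2022)
mapping = random_relabeling(SERIES, rng)
for slot, actual in mapping.items():
    print(f"model treats {actual!r} as {slot!r}")
```

Repeating this over many random permutations, refitting each time, and comparing in-sample and out-of-sample fit with the correctly labeled data gives a distribution of "nonsense" fits to compare against.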

arXiv:2205.13698 [pdf, other]

Characterizing the robustness of Bayesian adaptive experimental designs to active learning bias

Authors: Sabina J. Sloman, Daniel M. Oppenheimer, Stephen B. Broomell, Cosma Rohilla Shalizi

Abstract: Bayesian adaptive experimental design is a form of active learning, which chooses samples to maximize the information they give about uncertain parameters. Prior work has shown that other forms of active learning can suffer from active learning bias, where unrepresentative sampling leads to inconsistent parameter estimates. We show that active learning bias can also afflict Bayesian adaptive experimental design, depending on model misspecification. We analyze the case of estimating a linear model, and show that worse misspecification implies more severe active learning bias. At the same time, model classes incorporating more "noise" (i.e., specifying higher inherent variance in observations) suffer less from active learning bias. Finally, we demonstrate empirically that insights from the linear model can predict the presence and degree of active learning bias in nonlinear contexts, namely in a (simulated) preference learning experiment.

Submitted 28 November, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

arXiv:2203.09077 [pdf, other]

Evaluating Posterior Distributions by Selectively Breeding Prior Samples

Authors: Cosma Rohilla Shalizi

Abstract: Using Markov chain Monte Carlo to sample from posterior distributions was the key innovation which made Bayesian data analysis practical. Notoriously, however, MCMC is hard to tune, hard to diagnose, and hard to parallelize. This pedagogical note explores variants on a universal non-Markov-chain Monte Carlo scheme for sampling from posterior distributions. The basic idea is to draw parameter values from the prior distributions, evaluate the likelihood of each draw, and then copy that draw a number of times proportional to its likelihood. The distribution after copying is an approximation to the posterior which becomes exact as the number of initial samples goes to infinity; the convergence of the approximation is easily analyzed, and is uniform over Glivenko-Cantelli classes. While not entirely practical, the schemes are straightforward to implement (a few lines of R), easily parallelized, and require no rejection, burn-in, convergence diagnostics, or tuning of any control settings. I provide references to the prior art which deals with some of the practical obstacles, at some cost in computational and analytical simplicity.

Submitted 17 March, 2022; originally announced March 2022.

Comments: 16 pages, 2 figures, code included in text
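The prior-draw, weight, and copy scheme described above can be sketched in a few lines of Python (the note itself uses R; this is not its code). Here "copying in proportion to likelihood" is done by multinomial resampling, one of several possible variants. The conjugate normal model is an illustrative assumption chosen so that the exact posterior is known.

```python
import math
import random

def breed_prior_samples(prior_draw, log_lik, n_prior, n_out, rng):
    """Approximate posterior draws by likelihood-proportional copying of prior draws.

    prior_draw(rng) returns one parameter value from the prior; log_lik(theta)
    returns the log-likelihood of the data at theta.
    """
    draws = [prior_draw(rng) for _ in range(n_prior)]
    log_w = [log_lik(theta) for theta in draws]
    top = max(log_w)
    # Subtract the maximum before exponentiating to avoid underflow;
    # the weights stay proportional to the likelihood.
    weights = [math.exp(lw - top) for lw in log_w]
    # Multinomial "copying": each draw is reproduced in proportion to its weight.
    return rng.choices(draws, weights=weights, k=n_out)

# Illustrative model (an assumption): theta ~ N(0, 1), x_i ~ N(theta, 1),
# so the exact posterior is N(sum(x) / (n + 1), 1 / (n + 1)).
data = [0.8, 1.2, 1.0, 0.6, 1.4]

def log_lik(theta):
    return -0.5 * sum((x - theta) ** 2 for x in data)

rng = random.Random(1)
post = breed_prior_samples(lambda r: r.gauss(0.0, 1.0), log_lik, 100_000, 20_000, rng)
post_mean = sum(post) / len(post)
post_var = sum((t - post_mean) ** 2 for t in post) / len(post)
exact_mean = sum(data) / (len(data) + 1)
exact_var = 1.0 / (len(data) + 1)
print(round(post_mean, 2), round(exact_mean, 2))
```

Each prior draw is independent, so the likelihood evaluations can be split across workers with no communication. This is the "easily parallelized" property the abstract mentions.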

arXiv:2111.09220 [pdf, other]

A Note on Simulation-Based Inference by Matching Random Features

Authors: Cosma Rohilla Shalizi

Abstract: We can, and should, do statistical inference on simulation models by adjusting the parameters in the simulation so that the values of randomly chosen functions of the simulation output match the values of those same functions calculated on the data. Results from the "state-space reconstruction" or "geometry from a time series" literature in nonlinear dynamics indicate that just $2d+1$ such functions will typically suffice to identify a model with a $d$-dimensional parameter space. Results from the "random features" literature in machine learning suggest that using random functions of the data can be an efficient replacement for using optimal functions. In this preliminary, proof-of-concept note, I sketch some of the key results, and present numerical evidence about the new method's properties. A separate, forthcoming manuscript will elaborate on theoretical and numerical details.

Submitted 17 November, 2021; originally announced November 2021.

Comments: 41 pages, 14 figures
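A toy version of the matching idea: draw $2d+1$ random functions, then choose the simulation parameter whose simulated feature averages best match the observed ones. Several choices here are illustrative assumptions rather than the note's prescriptions: the feature family $x \mapsto \tanh(wx+b)$, the one-parameter location simulator, the reuse of fixed noise draws (common random numbers), and grid search.

```python
import math
import random

def draw_random_features(k, rng):
    # Each feature is x -> tanh(w*x + b) with random w > 0 and offset b
    # (an illustrative family, not the note's prescribed choice).
    return [(rng.uniform(0.5, 2.0), rng.uniform(-1.0, 1.0)) for _ in range(k)]

def feature_means(xs, feats):
    return [sum(math.tanh(w * x + b) for x in xs) / len(xs) for w, b in feats]

def simulate(theta, noise):
    # Hypothetical simulator: a location model driven by fixed noise draws,
    # so the matching objective is smooth in theta.
    return [theta + e for e in noise]

def fit_by_random_features(data, feats, noise, grid):
    target = feature_means(data, feats)
    def discrepancy(theta):
        sim = feature_means(simulate(theta, noise), feats)
        return sum((s - t) ** 2 for s, t in zip(sim, target))
    return min(grid, key=discrepancy)

rng = random.Random(0)
true_theta = 1.5
data = [true_theta + rng.gauss(0.0, 1.0) for _ in range(2000)]
d = 1                                        # dimension of the parameter space
feats = draw_random_features(2 * d + 1, rng)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
grid = [i / 50 for i in range(-150, 151)]    # theta in [-3, 3], step 0.02
theta_hat = fit_by_random_features(data, feats, noise, grid)
print(theta_hat)
```

With a realistic simulator and a multi-dimensional parameter, the grid search would be replaced by a numerical optimizer over the same discrepancy.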

arXiv:1912.03387 [pdf, other]

Conditional Mutual Information Estimation for Mixed Discrete and Continuous Variables with Nearest Neighbors

Authors: Octavio César Mesner, Cosma Rohilla Shalizi

Abstract: Fields like public health, public policy, and social science often want to quantify the degree of dependence between variables whose relationships take on unknown functional forms. Typically, in fact, researchers in these fields are attempting to evaluate causal theories, and so want to quantify dependence after conditioning on other variables that might explain, mediate or confound causal relations. One reason conditional mutual information is not more widely used for these tasks is the lack of estimators which can handle combinations of continuous and discrete random variables, common in applications. This paper develops a new method for estimating mutual and conditional mutual information for data samples containing a mix of discrete and continuous variables. We prove that this estimator is consistent and show, via simulation, that it is more accurate than similar estimators.

Submitted 6 December, 2019; originally announced December 2019.

arXiv:1711.00813 [pdf, ps, other]

doi 10.1214/21-EJS1896

Bootstrapping Exchangeable Random Graphs

Authors: Alden Green, Cosma Rohilla Shalizi

Abstract: We introduce two new bootstraps for exchangeable random graphs. One, the "empirical graphon bootstrap", is based purely on resampling, while the other, the "histogram bootstrap", is a model-based "sieve" bootstrap. We show that both of them accurately approximate the sampling distributions of motif densities, i.e., of the normalized counts of the number of times fixed subgraphs appear in the network. These densities characterize the distribution of (infinite) exchangeable networks. Our bootstraps therefore give a valid quantification of uncertainty in inferences about fundamental network statistics, and so of parameters identifiable from them.

Submitted 3 January, 2022; v1 submitted 2 November, 2017; originally announced November 2017.

Journal ref: Electronic Journal of Statistics, vol. 16 (2022), pp. 1058--1095
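The abstract does not spell out the resampling scheme. The sketch below assumes one natural reading of "resampling from the empirical graphon": draw nodes with replacement and keep their observed adjacency pattern. It is not the paper's exact procedure, and the histogram (sieve) bootstrap is not shown. The motif here is the simplest one, a single edge.

```python
import random
import statistics
from itertools import combinations

def erdos_renyi(n, p, rng):
    adj = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i][j] = adj[j][i] = 1
    return adj

def edge_density(adj):
    # Density of the single-edge motif: observed edges / possible edges.
    n = len(adj)
    return sum(adj[i][j] for i, j in combinations(range(n), 2)) / (n * (n - 1) / 2)

def node_resampling_bootstrap(adj, stat, n_boot, rng):
    n = len(adj)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]      # draw nodes with replacement
        boot = [[adj[a][b] for b in idx] for a in idx]  # copies of one node stay unlinked (adj[a][a] == 0)
        reps.append(stat(boot))
    return reps

rng = random.Random(7)
g = erdos_renyi(60, 0.3, rng)
observed = edge_density(g)
reps = node_resampling_bootstrap(g, edge_density, 200, rng)
print(round(observed, 3), round(statistics.mean(reps), 3), round(statistics.stdev(reps), 4))
```

The spread of `reps` serves as a bootstrap estimate of the sampling variability of the observed edge density. Triangle or other motif densities plug in through the same `stat` argument.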

arXiv:1607.06565 [pdf, other]

doi 10.1080/01621459.2021.1953506

Estimating Causal Peer Influence in Homophilous Social Networks by Inferring Latent Locations

Authors: Edward McFowland III, Cosma Rohilla Shalizi

Abstract: Social influence cannot be identified from purely observational data on social networks, because such influence is generically confounded with latent homophily, i.e., with a node's network partners being informative about the node's attributes and therefore its behavior. If the network grows according to either a latent community (stochastic block) model, or a continuous latent space model, then latent homophilous attributes can be consistently estimated from the global pattern of social ties. We show that, for common versions of those two network models, these estimates are so informative that controlling for estimated attributes allows for asymptotically unbiased and consistent estimation of social-influence effects in linear models. In particular, the bias shrinks at a rate which directly reflects how much information the network provides about the latent attributes. These are the first results on the consistent non-experimental estimation of social-influence effects in the presence of latent homophily, and we discuss the prospects for generalizing them.

Submitted 17 June, 2021; v1 submitted 22 July, 2016; originally announced July 2016.

Comments: 35 pages, 4 figures

Journal ref: Journal of the American Statistical Association (2022)

arXiv:1506.02686 [pdf, other]

The LICORS Cabinet: Nonparametric Algorithms for Spatio-temporal Prediction

Authors: George D. Montanez, Cosma Rohilla Shalizi

Abstract: Spatio-temporal data is intrinsically high dimensional, so unsupervised modeling is only feasible if we can exploit structure in the process. When the dynamics are local in both space and time, this structure can be exploited by splitting the global field into many lower-dimensional "light cones". We review light cone decompositions for predictive state reconstruction, introducing three simple light cone algorithms. These methods allow for tractable inference of spatio-temporal data, such as full-frame video. The algorithms make few assumptions on the underlying process yet have good predictive performance and can provide distributions over spatio-temporal data, enabling sophisticated probabilistic inference.

Submitted 14 September, 2016; v1 submitted 8 June, 2015; originally announced June 2015.
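The light-cone decomposition used by the LICORS family starts by cutting the field into past and future cones around each space-time point. A minimal sketch for a 1-D spatial field stored as `field[t][x]`. Several details are assumptions: the past cone includes the present point, the propagation speed `speed` and the `horizon` are free parameters, and cells falling off the grid are skipped. The clustering of past cones into predictive states is not shown.

```python
def past_light_cone(field, t, x, horizon, speed=1):
    """Values in the past light cone of (t, x): the point itself plus, for each
    lag k = 1..horizon, the sites within speed * k of x at time t - k."""
    cone = []
    for k in range(horizon + 1):
        s = t - k
        if s < 0:
            break
        row = field[s]
        for y in range(x - speed * k, x + speed * k + 1):
            if 0 <= y < len(row):          # skip cells outside the grid
                cone.append(row[y])
    return cone

def future_light_cone(field, t, x, horizon, speed=1):
    """Values in the future light cone of (t, x), lags k = 1..horizon."""
    cone = []
    for k in range(1, horizon + 1):
        s = t + k
        if s >= len(field):
            break
        row = field[s]
        for y in range(x - speed * k, x + speed * k + 1):
            if 0 <= y < len(row):
                cone.append(row[y])
    return cone

# Toy field whose value encodes its own (t, x) position as 10*t + x.
field = [[10 * t + x for x in range(7)] for t in range(5)]
print(past_light_cone(field, 3, 3, 2))    # [33, 22, 23, 24, 11, 12, 13, 14, 15]
```

Each past cone is a fixed-length vector (away from the boundaries), so the high-dimensional field becomes many low-dimensional samples of (past cone, future cone) pairs.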

arXiv:1401.6595 [pdf, ps, other]

doi 10.1214/15-AOAS837

Regularized brain reading with shrinkage and smoothing

Authors: Leila Wehbe, Aaditya Ramdas, Rebecca C. Steorts, Cosma Rohilla Shalizi

Abstract: Functional neuroimaging measures how the brain responds to complex stimuli. However, sample sizes are modest, noise is substantial, and stimuli are high dimensional. Hence, direct estimates are inherently imprecise and call for regularization. We compare a suite of approaches which regularize via shrinkage: ridge regression, the elastic net (a generalization of ridge regression and the lasso), and a hierarchical Bayesian model based on small area estimation (SAE). We contrast regularization with spatial smoothing and combinations of smoothing and shrinkage. All methods are tested on functional magnetic resonance imaging (fMRI) data from multiple subjects participating in two different experiments related to reading, for both predicting neural response to stimuli and decoding stimuli from responses. Interestingly, when the regularization parameters are chosen by cross-validation independently for every voxel, low/high regularization is chosen in voxels where the classification accuracy is high/low, indicating that the regularization intensity is a good tool for identification of relevant voxels for the cognitive task. Surprisingly, all the regularization methods work about equally well, suggesting that beating basic smoothing and shrinkage will take not only clever methods, but also careful modeling.

Submitted 4 February, 2016; v1 submitted 25 January, 2014; originally announced January 2014.

Comments: Published at http://dx.doi.org/10.1214/15-AOAS837 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS837

Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 4, 1997-2022

arXiv:1309.4859 [pdf, ps, other]

Predictive PAC Learning and Process Decompositions

Authors: Cosma Rohilla Shalizi, Aryeh Kontorovich

Abstract: We informally call a stochastic process learnable if it admits a generalization error approaching zero in probability for any concept class with finite VC-dimension (IID processes are the simplest example). A mixture of learnable processes need not be learnable itself, and certainly its generalization error need not decay at the same rate. In this paper, we argue that it is natural in predictive PAC to condition not on the past observations but on the mixture component of the sample path. This definition not only matches what a realistic learner might demand, but also allows us to sidestep several otherwise grave problems in learning from dependent data. In particular, we give a novel PAC generalization bound for mixtures of learnable processes with a generalization error that is not worse than that of each mixture component. We also provide a characterization of mixtures of absolutely regular ($β$-mixing) processes, of independent probability-theoretic interest.

Submitted 19 September, 2013; originally announced September 2013.

Comments: 9 pages, accepted in NIPS 2013

Journal ref: Advances in Neural Information Processing Systems 26 [NIPS 2013], pp.1619--1627

arXiv:1212.0463 [pdf, other]

Nonparametric risk bounds for time-series forecasting

Authors: Daniel J. McDonald, Cosma Rohilla Shalizi, Mark Schervish

Abstract: We derive generalization error bounds for traditional time-series forecasting models. Our results hold for many standard forecasting tools including autoregressive models, moving average models, and, more generally, linear state-space models. These non-asymptotic bounds need only weak assumptions on the data-generating process, yet allow forecasters to select among competing models and to guarantee, with high probability, that their chosen model will perform well. We motivate our techniques with, and apply them to, standard economic and financial forecasting tools: a GARCH model for predicting equity volatility and a dynamic stochastic general equilibrium (DSGE) model, the standard tool in macroeconomic forecasting. We demonstrate in particular how our techniques can aid forecasters and policy makers in choosing models which behave well under uncertainty and mis-specification.

Submitted 10 September, 2016; v1 submitted 3 December, 2012; originally announced December 2012.

Comments: 34 pages, 3 figures

MSC Class: 62M20 (Primary) 91B84; 62G99 (Secondary)

Journal ref: Journal of Machine Learning Research. (2017). Vol 18. p. 1-40

arXiv:1211.3760 [pdf, other]

Mixed LICORS: A Nonparametric Algorithm for Predictive State Reconstruction

Authors: Georg M. Goerg, Cosma Rohilla Shalizi

Abstract: We introduce 'mixed LICORS', an algorithm for learning nonlinear, high-dimensional dynamics from spatio-temporal data, suitable for both prediction and simulation. Mixed LICORS extends the recent LICORS algorithm (Goerg and Shalizi, 2012) from hard clustering of predictive distributions to a non-parametric, EM-like soft clustering. This retains the asymptotic predictive optimality of LICORS, but, as we show in simulations, greatly improves out-of-sample forecasts with limited data. The new method is implemented in the publicly-available R package "LICORS" (http://cran.r-project.org/web/packages/LICORS/).

Submitted 2 May, 2013; v1 submitted 15 November, 2012; originally announced November 2012.

Comments: 11 pages; AISTATS 2013

Journal ref: AISTATS 2013, pp. 289--297

arXiv:1207.3994 [pdf, other]

doi 10.1088/1742-5468/2014/05/P05007

Model Selection for Degree-corrected Block Models

Authors: Xiaoran Yan, Cosma Rohilla Shalizi, Jacob E. Jensen, Florent Krzakala, Cristopher Moore, Lenka Zdeborova, Pan Zhang, Yaojia Zhu

Abstract: The proliferation of models for networks raises challenging problems of model selection: the data are sparse and globally dependent, and models are typically high-dimensional and have large numbers of latent variables. Together, these issues mean that the usual model-selection criteria do not work properly for networks. We illustrate these challenges, and show one way to resolve them, by considering the key network-analysis problem of dividing a graph into communities or blocks of nodes with homogeneous patterns of links to the rest of the network. The standard tool for doing this is the stochastic block model, under which the probability of a link between two nodes is a function solely of the blocks to which they belong. This imposes a homogeneous degree distribution within each block; this can be unrealistic, so degree-corrected block models add a parameter for each node, modulating its over-all degree. The choice between ordinary and degree-corrected block models matters because they make very different inferences about communities. We present the first principled and tractable approach to model selection between standard and degree-corrected block models, based on new large-graph asymptotics for the distribution of log-likelihood ratios under the stochastic block model, finding substantial departures from classical results for sparse graphs. We also develop linear-time approximations for log-likelihoods under both the stochastic block model and the degree-corrected model, using belief propagation. Applications to simulated and real networks show excellent agreement with our approximations. Our results thus both solve the practical problem of deciding on degree correction, and point to a general approach to model selection in network analysis.

Submitted 30 May, 2013; v1 submitted 17 July, 2012; originally announced July 2012.

Journal ref: J. Stat. Mech. (2014) P05007

arXiv:1206.2398 [pdf, other]

LICORS: Light Cone Reconstruction of States for Non-parametric Forecasting of Spatio-Temporal Systems

Authors: Georg M. Goerg, Cosma Rohilla Shalizi

Abstract: We present a new, non-parametric forecasting method for data where continuous values are observed discretely in space and time. Our method, "light-cone reconstruction of states" (LICORS), uses physical principles to identify predictive states which are local properties of the system, both in space and time. LICORS discovers the number of predictive states and their predictive distributions automatically, and consistently, under mild assumptions on the data source. We provide an algorithm to implement our method, along with a cross-validation scheme to pick control settings. Simulations show that CV-tuned LICORS outperforms standard methods in forecasting challenging spatio-temporal dynamics. Our work provides applied researchers with a new, highly automatic method to analyze and forecast spatio-temporal data.

Submitted 3 August, 2012; v1 submitted 11 June, 2012; originally announced June 2012.

Comments: Main text: 30 pages; supplementary material: 12 pages; 5+2 figures

arXiv:1111.3404 [pdf, ps, other]

Estimated VC dimension for risk bounds

Authors: Daniel J. McDonald, Cosma Rohilla Shalizi, Mark Schervish

Abstract: Vapnik-Chervonenkis (VC) dimension is a fundamental measure of the generalization capacity of learning algorithms. However, apart from a few special cases, it is hard or impossible to calculate analytically. Vapnik et al. [10] proposed a technique for estimating the VC dimension empirically. While their approach behaves well in simulations, it could not be used to bound the generalization risk of classifiers, because there were no bounds for the estimation error of the VC dimension itself. We rectify this omission, providing high probability concentration results for the proposed estimator and deriving corresponding generalization bounds.

Submitted 14 November, 2011; originally announced November 2011.

Comments: 11 pages

arXiv:1106.0730 [pdf, ps, other]

Rademacher complexity of stationary sequences

Authors: Daniel J. McDonald, Cosma Rohilla Shalizi

Abstract: We show how to control the generalization error of time series models wherein past values of the outcome are used to predict future values. The results are based on a generalization of standard i.i.d. concentration inequalities to dependent data without the mixing assumptions common in the time series setting. Our proof and the result are simpler than previous analyses with dependent data or stochastic adversaries which use sequential Rademacher complexities rather than the expected Rademacher complexity for i.i.d. processes. We also derive empirical Rademacher results without mixing assumptions resulting in fully calculable upper bounds.

Submitted 22 May, 2017; v1 submitted 3 June, 2011; originally announced June 2011.

Comments: 15 pages, 1 figure

arXiv:1103.0949 [pdf, other]

Adapting to Non-stationarity with Growing Expert Ensembles

Authors: Cosma Rohilla Shalizi, Abigail Z. Jacobs, Kristina Lisa Klinkner, Aaron Clauset

Abstract: When dealing with time series with complex non-stationarities, low retrospective regret on individual realizations is a more appropriate goal than low prospective risk in expectation. Online learning algorithms provide powerful guarantees of this form, and have often been proposed for use with non-stationary processes because of their ability to switch between different forecasters or "experts". However, existing methods assume that the set of experts whose forecasts are to be combined are all given at the start, which is not plausible when dealing with a genuinely historical or evolutionary system. We show how to modify the "fixed shares" algorithm for tracking the best expert to cope with a steadily growing set of experts, obtained by fitting new models to new data as it becomes available, and obtain regret bounds for the growing ensemble.

Submitted 28 June, 2011; v1 submitted 4 March, 2011; originally announced March 2011.

Comments: 9 pages, 1 figure; CMU Statistics Technical Report. v2: Added empirical example, revised discussion of related work
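The abstract does not state the paper's modification. The sketch below combines the standard fixed-share update (exponential weights on squared loss, then redistributing a fraction `alpha` of the mass uniformly) with a hypothetical entry rule for newly arriving experts. It illustrates the mechanics of a growing pool only and carries none of the paper's regret guarantees.

```python
import math

def fixed_share_growing(expert_preds, ys, eta=2.0, alpha=0.01):
    """Fixed-share forecast combination over an expert pool that can grow.

    expert_preds[t] lists the forecasts of the experts available at step t;
    new experts are appended to the end of the list when they appear.
    Returns the combined forecasts and the final weights.
    """
    weights, combined = [], []
    for preds, y in zip(expert_preds, ys):
        # Hypothetical entry rule: a new expert gets weight 1/(pool size), then renormalize.
        while len(weights) < len(preds):
            weights.append(1.0 / (len(weights) + 1))
        total = sum(weights)
        weights = [w / total for w in weights]
        combined.append(sum(w * p for w, p in zip(weights, preds)))
        # Exponential-weights update with squared loss.
        weights = [w * math.exp(-eta * (p - y) ** 2) for w, p in zip(weights, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Fixed-share step: move a fraction alpha of the mass uniformly across experts.
        n = len(weights)
        weights = [(1 - alpha) * w + alpha / n for w in weights]
    return combined, weights

# Toy run: the target is always 1; an expert forecasting 0 is present from the start,
# and an expert forecasting 1 joins at step 10.
T = 50
preds = [[0.0] if t < 10 else [0.0, 1.0] for t in range(T)]
ys = [1.0] * T
combined, final_w = fixed_share_growing(preds, ys)
print(round(combined[-1], 3), [round(w, 3) for w in final_w])
```

The fixed-share step keeps every expert's weight at least `alpha / n`. This lets the combination switch back quickly to an expert that starts doing well again.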

arXiv:1103.0942 [pdf, other]

Generalization error bounds for stationary autoregressive models

Authors: Daniel J. McDonald, Cosma Rohilla Shalizi, Mark Schervish

Abstract: We derive generalization error bounds for stationary univariate autoregressive (AR) models. We show that imposing stationarity is enough to control the Gaussian complexity without further regularization. This lets us use structural risk minimization for model selection. We demonstrate our methods by predicting interest rate movements.

Submitted 3 June, 2011; v1 submitted 4 March, 2011; originally announced March 2011.

Comments: 10 pages, 3 figures. CMU Statistics Technical Report
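Imposing stationarity requires a way to tell whether a coefficient vector lies in the stationary region. One standard check, sketched here and not taken from the report, is the step-down (reverse Durbin-Levinson) recursion. It maps AR coefficients to partial autocorrelations, and the model is stationary exactly when every partial autocorrelation has absolute value below 1.

```python
def is_stationary_ar(phi):
    """Check whether phi = (phi_1, ..., phi_p) gives a stationary (causal) AR(p)
    model x_t = phi_1 x_{t-1} + ... + phi_p x_{t-p} + e_t.

    Step-down recursion: at each order p, kappa = phi_{p,p} is the partial
    autocorrelation, and the order-(p-1) coefficients are recovered from
    phi_{p-1,j} = (phi_{p,j} + kappa * phi_{p,p-j}) / (1 - kappa^2).
    """
    coeffs = [float(c) for c in phi]
    while coeffs:
        kappa = coeffs[-1]
        if abs(kappa) >= 1.0:
            return False
        p = len(coeffs)
        coeffs = [(coeffs[j] + kappa * coeffs[p - 2 - j]) / (1.0 - kappa * kappa)
                  for j in range(p - 1)]
    return True

print(is_stationary_ar([0.5]), is_stationary_ar([1.0]))            # True False
print(is_stationary_ar([0.5, 0.3]), is_stationary_ar([0.5, 0.6]))  # True False
```

Because the partial autocorrelations range freely over $(-1, 1)^p$, the same map can also be run forwards to parametrize only stationary models during fitting.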

arXiv:1103.0941 [pdf, ps, other]

Estimating $β$-mixing coefficients

Authors: Daniel J. McDonald, Cosma Rohilla Shalizi, Mark Schervish

Abstract: The literature on statistical learning for time series assumes the asymptotic independence or "mixing" of the data-generating process. These mixing assumptions are never tested, nor are there methods for estimating mixing rates from data. We give an estimator for the $β$-mixing rate based on a single stationary sample path and show it is $L_1$-risk consistent.

Submitted 4 March, 2011; originally announced March 2011.

Comments: 9 pages, accepted by AIStats. CMU Statistics Technical Report

Journal ref: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2011), pp. 516--524
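As a much simpler illustration than the paper's estimator, which this sketch does not reproduce, the code computes a plug-in $β$-coefficient for a discrete-valued series. It is the total-variation distance between the empirical joint law of $(X_t, X_{t+a})$ and the product of its marginals. For a stationary Markov chain this pairwise quantity equals $β(a)$; for general processes it does not. The two-state chain is an assumption chosen because its exact value, $|2s-1|^a/2$ for stay probability $s$, is known.

```python
import random
from collections import Counter

def two_state_chain(n, stay, rng):
    # Symmetric two-state Markov chain started from its stationary (uniform) law.
    xs = [rng.randrange(2)]
    for _ in range(n - 1):
        xs.append(xs[-1] if rng.random() < stay else 1 - xs[-1])
    return xs

def plug_in_beta(xs, lag):
    """Total-variation distance between the empirical joint law of
    (X_t, X_{t+lag}) and the product of its empirical marginals."""
    pairs = list(zip(xs[:-lag], xs[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    states = set(left) | set(right)
    return 0.5 * sum(abs(joint[(a, b)] / n - left[a] * right[b] / n ** 2)
                     for a in states for b in states)

rng = random.Random(3)
xs = two_state_chain(100_000, stay=0.9, rng=rng)
for lag in (1, 5):
    # Estimate next to the exact value |2*0.9 - 1|**lag / 2.
    print(lag, round(plug_in_beta(xs, lag), 3), round(0.8 ** lag / 2, 3))
```

The estimate decays with the lag, as a mixing coefficient should. With continuous data, this discrete plug-in would need to be replaced by density estimates over blocks of observations.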

arXiv:1102.4101 [pdf, ps, other]

Scaling and Hierarchy in Urban Economies

Authors: Cosma Rohilla Shalizi

Abstract: In several recent publications, Bettencourt, West and collaborators claim that properties of cities such as gross economic production, personal income, numbers of patents filed, number of crimes committed, etc., show super-linear power-law scaling with total population, while measures of resource use show sub-linear power-law scaling. Re-analysis of the gross economic production and personal income for cities in the United States, however, shows that the data cannot distinguish between power laws and other functional forms, including logarithmic growth, and that size predicts relatively little of the variation between cities. The striking appearance of scaling in previous work is largely an artifact of using extensive quantities (city-wide totals) rather than intensive ones (per-capita rates). The remaining dependence of productivity on city size is explained by concentration of specialist service industries, with high value-added per worker, in larger cities, in accordance with the long-standing economic notion of the "hierarchy of central places".

Submitted 7 April, 2011; v1 submitted 20 February, 2011; originally announced February 2011.

Comments: v1: 15 pages, 9 figures, combines main text and supporting information into one document. Submitted to PNAS. v2: Text re-arranged to comply with journal policies; added analysis with logistic (asymptotically constant) scaling relations; minor corrections

arXiv:1004.4704 [pdf, other]

doi 10.1177/0049124111404820

Homophily and Contagion Are Generically Confounded in Observational Social Network Studies

Authors: Cosma Rohilla Shalizi, Andrew C. Thomas

Abstract: We consider processes on social networks that can potentially involve three factors: homophily, or the formation of social ties due to matching individual traits; social contagion, also known as social influence; and the causal effect of an individual's covariates on their behavior or other measurable responses. We show that, generically, all of these are confounded with each other. Distinguishing them from one another requires strong assumptions on the parametrization of the social process or on the adequacy of the covariates used (or both). In particular we demonstrate, with simple examples, that asymmetries in regression coefficients cannot identify causal effects, and that very simple models of imitation (a form of social contagion) can produce substantial correlations between an individual's enduring traits and their choices, even when there is no intrinsic affinity between them. We also suggest some possible constructive responses to these results.
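The claim that imitation alone can tie enduring traits to choices can be illustrated with a toy simulation. Everything in it is an assumption made for illustration (two trait groups, ties only within a group, a voter-model copying rule, the function name `imitation_gap`); it is not the example used in the paper. Choices start out unrelated to the trait, yet imitation drives each group toward a single choice, so within a run the trait predicts the choice, even though the direction of the association is random across runs.

```python
import random


def imitation_gap(group_size=20, steps=2000, rng=None):
    """Toy voter-model imitation inside two trait groups.

    Assumptions (hypothetical, not the paper's model): ties form only
    within a trait group (extreme homophily), initial choices are fair
    coin flips unrelated to the trait, and at each step one random member
    copies the choice of another random member of the same group.
    Returns |adoption rate in group 0 - adoption rate in group 1|
    before and after imitation.
    """
    rng = rng or random.Random()
    groups = [[rng.random() < 0.5 for _ in range(group_size)] for _ in range(2)]

    def gap():
        return abs(sum(groups[0]) - sum(groups[1])) / group_size

    before = gap()
    for choices in groups:
        for _ in range(steps):
            i, j = rng.sample(range(group_size), 2)
            choices[i] = choices[j]  # member i imitates member j
    return before, gap()


rng = random.Random(2)
results = [imitation_gap(rng=rng) for _ in range(200)]
mean_before = sum(b for b, _ in results) / len(results)
mean_after = sum(a for _, a in results) / len(results)
print(f"trait-choice gap: before={mean_before:.2f}, after={mean_after:.2f}")
```

The absolute gap is used because the sign of the trait-choice association differs from run to run; its size grows under imitation even though nothing links the trait to the choice.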

Submitted 29 November, 2010; v1 submitted 27 April, 2010; originally announced April 2010.

Comments: 27 pages, 9 figures. V2: Revised in response to referees. V3: Ditto

Journal ref: Sociological Methods and Research, vol. 40 (2011), pp. 211--239

arXiv:1004.3476 [pdf, ps, other]

doi 10.1198/jasa.2009.tm08326

Approximate Methods for State-Space Models

Authors: Shinsuke Koyama, Lucia Castellanos Pérez-Bolde, Cosma Rohilla Shalizi, Robert E. Kass

Abstract: State-space models provide an important body of techniques for analyzing time-series, but their use requires estimating unobserved states. The optimal estimate of the state is its conditional expectation given the observation histories, and computing this expectation is hard when there are nonlinearities. Existing filtering methods, including sequential Monte Carlo, tend to be either inaccurate or slow. In this paper, we study a nonlinear filter for nonlinear/non-Gaussian state-space models, which uses Laplace's method, an asymptotic series expansion, to approximate the state's conditional mean and variance, together with a Gaussian conditional distribution. This Laplace-Gaussian filter (LGF) gives fast, recursive, deterministic state estimates, with an error which is set by the stochastic characteristics of the model and is, we show, stable over time. We illustrate the estimation ability of the LGF by applying it to the problem of neural decoding and compare it to sequential Monte Carlo both in simulations and with real data. We find that the LGF can deliver superior results in a small fraction of the computing time.
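Laplace's method replaces the exact conditional mean and variance with quantities computed at the mode of the log-posterior. The Python sketch below keeps only that leading term (mean approximated by the posterior mode, variance by the inverse negative curvature at the mode) for an assumed linear-Gaussian state with Poisson spike-count observations, a common neural-decoding setup. The model, its parameters, and the name `lgf_step` are hypothetical and not taken from the paper.

```python
import math


def lgf_step(mean, var, y, a=1.0, q=0.1, b=0.0, c=1.0, newton_iters=50):
    """One predict/update step of a leading-order Laplace-Gaussian filter.

    Hypothetical model (assumption, for illustration only):
        x_t = a * x_{t-1} + N(0, q)          (latent state)
        y_t ~ Poisson(exp(b + c * x_t))      (spike count)
    """
    # Predict: Gaussian prior for x_t from the previous Gaussian approximation.
    m_pred = a * mean
    v_pred = a * a * var + q

    # Update: Newton's method for the mode of the log-posterior
    #   l(x) = -(x - m_pred)^2 / (2 v_pred) + y (b + c x) - exp(b + c x)
    x = m_pred
    for _ in range(newton_iters):
        rate = math.exp(b + c * x)
        grad = -(x - m_pred) / v_pred + c * (y - rate)
        hess = -1.0 / v_pred - c * c * rate  # always negative: l is concave
        step = grad / hess
        x -= step
        if abs(step) < 1e-12:
            break

    # Gaussian approximation: variance = inverse negative curvature at the mode.
    rate = math.exp(b + c * x)
    v_post = 1.0 / (1.0 / v_pred + c * c * rate)
    return x, v_post


# Usage: filter a short sequence of hypothetical spike counts.
mean, var = 0.0, 1.0
for count in [0, 1, 3, 2, 0]:
    mean, var = lgf_step(mean, var, count)
    print(f"y={count}: mean={mean:.3f}, var={var:.3f}")
```

Each step costs a handful of Newton iterations and no random sampling, which is the source of the speed and determinism the abstract contrasts with sequential Monte Carlo.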

Submitted 20 April, 2010; originally announced April 2010.

Comments: 31 pages, 4 figures. Different pagination from journal version due to incompatible style files but same content; the supplemental file for the journal appears here as appendices B--E.

Journal ref: Journal of the American Statistical Association, volume 105, 2010, pp. 170--180

arXiv:1001.0036 [pdf, other]

doi 10.1162/neco.2009.12-07-678

The Computational Structure of Spike Trains

Authors: Robert Haslinger, Kristina Lisa Klinkner, Cosma Rohilla Shalizi

Abstract: Neurons perform computations, and convey the results of those computations through the statistical structure of their output spike trains. Here we present a practical method, grounded in the information-theoretic analysis of prediction, for inferring a minimal representation of that structure and for characterizing its complexity. Starting from spike trains, our approach finds their causal state models (CSMs), the minimal hidden Markov models or stochastic automata capable of generating statistically identical time series. We then use these CSMs to objectively quantify both the generalizable structure and the idiosyncratic randomness of the spike train. Specifically, we show that the expected algorithmic information content (the information needed to describe the spike train exactly) can be split into three parts describing (1) the time-invariant structure (complexity) of the minimal spike-generating process, which describes the spike train statistically; (2) the randomness (internal entropy rate) of the minimal spike-generating process; and (3) a residual pure noise term not described by the minimal spike-generating process. We use CSMs to approximate each of these quantities. The CSMs are inferred nonparametrically from the data, making only mild regularity assumptions, via the causal state splitting reconstruction algorithm. The methods presented here complement more traditional spike train analyses by describing not only spiking probability and spike train entropy, but also the complexity of a spike train's structure. We demonstrate our approach using both simulated spike trains and experimental data recorded in rat barrel cortex during vibrissa stimulation.
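A causal state is a class of histories that all give the same conditional distribution for the future, and a causal state model is built from these classes. The Python sketch below shows that idea on a toy binary spike train: it estimates the next-bin spike probability after each length-$L$ history and merges histories whose estimates agree. The refractory toy neuron, the fixed tolerance, and the names `predictive_probs` and `group_histories` are assumptions. The greedy merge stands in for, and is much cruder than, the causal state splitting reconstruction algorithm, and one-step prediction suffices here only because the toy process depends on the last bin alone.

```python
import random
from collections import defaultdict


def predictive_probs(spikes, L):
    """P(spike in next bin | previous L bins) for each observed history."""
    counts = defaultdict(lambda: [0, 0])  # history -> [n_total, n_spike]
    for t in range(L, len(spikes)):
        hist = tuple(spikes[t - L:t])
        counts[hist][0] += 1
        counts[hist][1] += spikes[t]
    return {h: n1 / n for h, (n, n1) in counts.items()}


def group_histories(probs, tol=0.05):
    """Greedily merge histories whose predictive probabilities are within tol.

    Illustrative simplification (assumption): CSSR itself uses statistical
    tests and state splitting, not this fixed-tolerance greedy merge.
    """
    groups = []  # list of (representative probability, [histories])
    for h, p in sorted(probs.items()):
        for rep, members in groups:
            if abs(rep - p) <= tol:
                members.append(h)
                break
        else:
            groups.append((p, [h]))
    return groups


# Hypothetical refractory neuron: never fires right after a spike,
# otherwise fires with probability 0.3 in each time bin.
rng = random.Random(3)
spikes = [0]
for _ in range(20000):
    spikes.append(0 if spikes[-1] == 1 else int(rng.random() < 0.3))

for rep, members in group_histories(predictive_probs(spikes, L=2)):
    print(f"P(spike) ~ {rep:.2f}: histories {members}")
```

For this toy neuron the histories collapse into two groups, "spiked in the last bin" and "did not", matching its two-state generating process.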

Submitted 30 December, 2009; originally announced January 2010.

Comments: Somewhat different format from journal version but same content

Journal ref: Neural Computation, vol. 22 (2010), pp. 121--157

arXiv:0706.1062 [pdf, ps, other]

doi 10.1137/070710111

Power-law distributions in empirical data

Authors: Aaron Clauset, Cosma Rohilla Shalizi, M. E. J. Newman

Abstract: Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution -- the part of the distribution representing large but rare events -- and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov-Smirnov statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data while in others the power law is ruled out.
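For continuous data above a lower bound $x_{\min}$, the standard maximum-likelihood estimate of the exponent is $\hat{\alpha} = 1 + n \left[\sum_{i=1}^{n} \ln(x_i / x_{\min})\right]^{-1}$, and the Kolmogorov-Smirnov statistic is the largest distance between the empirical and fitted CDFs. The Python sketch below (`fit_power_law` is a hypothetical name) takes $x_{\min}$ as given and stops at the KS distance, omitting the goodness-of-fit p-value and the likelihood-ratio comparisons the abstract mentions; the authors' own code is linked from this entry.

```python
import numpy as np


def fit_power_law(x, xmin):
    """Continuous power-law MLE for the tail x >= xmin, plus the KS distance.

    alpha_hat = 1 + n / sum(log(x_i / xmin)) over the tail; the fitted CDF
    is P(X <= x) = 1 - (x / xmin)^(1 - alpha) for x >= xmin.
    """
    tail = np.sort(np.asarray(x, dtype=float))
    tail = tail[tail >= xmin]
    n = tail.size
    alpha = 1.0 + n / np.sum(np.log(tail / xmin))
    model_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)
    # KS statistic: largest gap between fitted and empirical CDFs, checking
    # the empirical CDF just after (i/n) and just before ((i-1)/n) each point.
    ecdf_hi = np.arange(1, n + 1) / n
    ecdf_lo = np.arange(0, n) / n
    ks = max(np.max(ecdf_hi - model_cdf), np.max(model_cdf - ecdf_lo))
    return alpha, ks


# Synthetic check: inverse-transform samples from a power law with alpha = 2.5.
rng = np.random.default_rng(4)
true_alpha, xmin = 2.5, 1.0
u = rng.uniform(size=10000)
samples = xmin * (1.0 - u) ** (-1.0 / (true_alpha - 1.0))
alpha_hat, ks = fit_power_law(samples, xmin)
print(f"alpha_hat={alpha_hat:.3f}, KS distance={ks:.4f}")
```

The KS distance here is only the test statistic; turning it into a p-value requires comparing it against its distribution under the fitted model, for example by simulation.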

Submitted 2 February, 2009; v1 submitted 7 June, 2007; originally announced June 2007.

Comments: 43 pages, 11 figures, 7 tables, 4 appendices; code available at http://www.santafe.edu/~aaronc/powerlaws/

Journal ref: SIAM Review 51, 661-703 (2009)

Showing 1–24 of 24 results for author: Shalizi, C R