Showing 1–50 of 54 results for author: Shalizi, C R

  1. arXiv:2210.16224  [pdf, other]

    stat.AP

    Empirical Macroeconomics and DSGE Modeling in Statistical Perspective

    Authors: Daniel J. McDonald, Cosma Rohilla Shalizi

    Abstract: Dynamic stochastic general equilibrium (DSGE) models have been a ubiquitous, and controversial, part of macroeconomics for decades. In this paper, we approach DSGEs purely as statistical models. We do this by applying two common model validation checks to the canonical Smets and Wouters 2007 DSGE: (1) we simulate the model and see how well it can be estimated from its own simulation output, and (2…

    Submitted 31 October, 2022; v1 submitted 28 October, 2022; originally announced October 2022.

    Comments: 36 pages, 21 figures, 7 tables

  2. arXiv:2205.13698  [pdf, other]

    stat.ME stat.ML

    Characterizing the robustness of Bayesian adaptive experimental designs to active learning bias

    Authors: Sabina J. Sloman, Daniel M. Oppenheimer, Stephen B. Broomell, Cosma Rohilla Shalizi

    Abstract: Bayesian adaptive experimental design is a form of active learning, which chooses samples to maximize the information they give about uncertain parameters. Prior work has shown that other forms of active learning can suffer from active learning bias, where unrepresentative sampling leads to inconsistent parameter estimates. We show that active learning bias can also afflict Bayesian adaptive exper…

    Submitted 28 November, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

  3. arXiv:2203.09085  [pdf, ps, other]

    math.PR physics.data-an

    A Simple Non-Stationary Mean Ergodic Theorem, with Bonus Weak Law of Large Numbers

    Authors: Cosma Rohilla Shalizi

    Abstract: This brief pedagogical note re-proves a simple theorem on the convergence, in $L_2$ and in probability, of time averages of non-stationary time series to the mean of expectation values. The basic condition is that the sum of covariances grows sub-quadratically with the length of the time series. I make no claim to originality; the result is a widely, but unevenly, spread bit of folklore among users…

    Submitted 19 March, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

    Comments: v2: Fixed notation to replace statements like $A_n \rightarrow m_n$ with ones like $A_n - m_n \rightarrow 0$; small wording changes and typo corrections in Remark 3

  4. arXiv:2203.09077  [pdf, other]

    stat.CO

    Evaluating Posterior Distributions by Selectively Breeding Prior Samples

    Authors: Cosma Rohilla Shalizi

    Abstract: Using Markov chain Monte Carlo to sample from posterior distributions was the key innovation which made Bayesian data analysis practical. Notoriously, however, MCMC is hard to tune, hard to diagnose, and hard to parallelize. This pedagogical note explores variants on a universal {\em non}-Markov-chain Monte Carlo scheme for sampling from posterior distributions. The basic idea is to draw parameter…

    Submitted 17 March, 2022; originally announced March 2022.

    Comments: 16 pages, 2 figures, code included in text

  5. arXiv:2111.09220  [pdf, other]

    stat.ME nlin.CD physics.data-an

    A Note on Simulation-Based Inference by Matching Random Features

    Authors: Cosma Rohilla Shalizi

    Abstract: We can, and should, do statistical inference on simulation models by adjusting the parameters in the simulation so that the values of {\em randomly chosen} functions of the simulation output match the values of those same functions calculated on the data. Results from the "state-space reconstruction" or "geometry from a time series" literature in nonlinear dynamics indicate that just $2d+1$ such…

    Submitted 17 November, 2021; originally announced November 2021.

    Comments: 41 pages, 14 figures

  6. arXiv:1912.03387  [pdf, other]

    math.ST stat.ME

    Conditional Mutual Information Estimation for Mixed Discrete and Continuous Variables with Nearest Neighbors

    Authors: Octavio César Mesner, Cosma Rohilla Shalizi

    Abstract: Fields like public health, public policy, and social science often want to quantify the degree of dependence between variables whose relationships take on unknown functional forms. Typically, in fact, researchers in these fields are attempting to evaluate causal theories, and so want to quantify dependence after conditioning on other variables that might explain, mediate or confound causal relatio…

    Submitted 6 December, 2019; originally announced December 2019.

  7. arXiv:1711.02834  [pdf, other]

    math.ST

    Bootstrapping Generalization Error Bounds for Time Series

    Authors: Robert Lunde, Cosma Rohilla Shalizi

    Abstract: We consider the problem of finding confidence intervals for the risk of forecasting the future of a stationary, ergodic stochastic process, using a model estimated from the past of the process. We show that a bootstrap procedure provides valid confidence intervals for the risk, when the data source is sufficiently mixing, and the loss function and the estimator are suitably smooth. Autoregressive…

    Submitted 29 November, 2017; v1 submitted 8 November, 2017; originally announced November 2017.

  8. arXiv:1711.02123  [pdf, ps, other]

    math.ST cs.SI physics.soc-ph

    Consistency of Maximum Likelihood for Continuous-Space Network Models I

    Authors: Cosma Rohilla Shalizi, Dena Marie Asta

    Abstract: A very popular class of models for networks posits that each node is represented by a point in a continuous latent space, and that the probability of an edge between nodes is a decreasing function of the distance between them in this latent space. We study the embedding problem for these models, of recovering the latent positions from the observed graph. Assuming certain natural symmetry and smoot…

    Submitted 29 June, 2022; v1 submitted 6 November, 2017; originally announced November 2017.

    Comments: 17 pages

  9. Bootstrapping Exchangeable Random Graphs

    Authors: Alden Green, Cosma Rohilla Shalizi

    Abstract: We introduce two new bootstraps for exchangeable random graphs. One, the "empirical graphon bootstrap", is based purely on resampling, while the other, the "histogram bootstrap", is a model-based "sieve" bootstrap. We show that both of them accurately approximate the sampling distributions of motif densities, i.e., of the normalized counts of the number of times fixed subgraphs appear in the netwo…

    Submitted 3 January, 2022; v1 submitted 2 November, 2017; originally announced November 2017.

    Journal ref: Electronic Journal of Statistics, vol. 16 (2022), pp. 1058--1095

  10. arXiv:1709.09702  [pdf, other]

    math.ST

    Projective, Sparse, and Learnable Latent Position Network Models

    Authors: Neil A. Spencer, Cosma Rohilla Shalizi

    Abstract: When modeling network data using a latent position model, it is typical to assume that the nodes' positions are independently and identically distributed. However, this assumption implies the average node degree grows linearly with the number of nodes, which is inappropriate when the graph is thought to be sparse. We propose an alternative assumption -- that the latent positions are generated acco…

    Submitted 8 September, 2023; v1 submitted 27 September, 2017; originally announced September 2017.

    Comments: 70 pages, 2 figures

  11. arXiv:1607.06565  [pdf, other]

    stat.ME cs.SI physics.soc-ph

    Estimating Causal Peer Influence in Homophilous Social Networks by Inferring Latent Locations

    Authors: Edward McFowland III, Cosma Rohilla Shalizi

    Abstract: Social influence cannot be identified from purely observational data on social networks, because such influence is generically confounded with latent homophily, i.e., with a node's network partners being informative about the node's attributes and therefore its behavior. If the network grows according to either a latent community (stochastic block) model, or a continuous latent space model, then l…

    Submitted 17 June, 2021; v1 submitted 22 July, 2016; originally announced July 2016.

    Comments: 35 pages, 4 figures

    Journal ref: Journal of the American Statistical Association (2022)

  12. arXiv:1506.02686  [pdf, other]

    stat.ML cs.LG

    The LICORS Cabinet: Nonparametric Algorithms for Spatio-temporal Prediction

    Authors: George D. Montanez, Cosma Rohilla Shalizi

    Abstract: Spatio-temporal data is intrinsically high dimensional, so unsupervised modeling is only feasible if we can exploit structure in the process. When the dynamics are local in both space and time, this structure can be exploited by splitting the global field into many lower-dimensional "light cones". We review light cone decompositions for predictive state reconstruction, introducing three simple lig…

    Submitted 14 September, 2016; v1 submitted 8 June, 2015; originally announced June 2015.

  13. arXiv:1411.1350  [pdf, other]

    math.ST

    Geometric Network Comparison

    Authors: Dena Asta, Cosma Rohilla Shalizi

    Abstract: Network analysis has a crucial need for tools to compare networks and assess the significance of differences between networks. We propose a principled statistical approach to network comparison that approximates networks as probability distributions on negatively curved manifolds. We outline the theory, as well as implement the approach on simulated networks.

    Submitted 5 November, 2014; originally announced November 2014.

  14. Regularized brain reading with shrinkage and smoothing

    Authors: Leila Wehbe, Aaditya Ramdas, Rebecca C. Steorts, Cosma Rohilla Shalizi

    Abstract: Functional neuroimaging measures how the brain responds to complex stimuli. However, sample sizes are modest, noise is substantial, and stimuli are high dimensional. Hence, direct estimates are inherently imprecise and call for regularization. We compare a suite of approaches which regularize via shrinkage: ridge regression, the elastic net (a generalization of ridge regression and the lasso), and…

    Submitted 4 February, 2016; v1 submitted 25 January, 2014; originally announced January 2014.

    Comments: Published at http://dx.doi.org/10.1214/15-AOAS837 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS837

    Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 4, 1997-2022

  15. arXiv:1309.4859  [pdf, ps, other]

    stat.ML

    Predictive PAC Learning and Process Decompositions

    Authors: Cosma Rohilla Shalizi, Aryeh Kontorovich

    Abstract: We informally call a stochastic process learnable if it admits a generalization error approaching zero in probability for any concept class with finite VC-dimension (IID processes are the simplest example). A mixture of learnable processes need not be learnable itself, and certainly its generalization error need not decay at the same rate. In this paper, we argue that it is natural in predictive P…

    Submitted 19 September, 2013; originally announced September 2013.

    Comments: 9 pages, accepted in NIPS 2013

    Journal ref: Advances in Neural Information Processing Systems 26 [NIPS 2013], pp.1619--1627

  16. arXiv:1212.0463  [pdf, other]

    math.ST cs.LG stat.ML

    Nonparametric risk bounds for time-series forecasting

    Authors: Daniel J. McDonald, Cosma Rohilla Shalizi, Mark Schervish

    Abstract: We derive generalization error bounds for traditional time-series forecasting models. Our results hold for many standard forecasting tools including autoregressive models, moving average models, and, more generally, linear state-space models. These non-asymptotic bounds need only weak assumptions on the data-generating process, yet allow forecasters to select among competing models and to guarante…

    Submitted 10 September, 2016; v1 submitted 3 December, 2012; originally announced December 2012.

    Comments: 34 pages, 3 figures

    MSC Class: 62M20 (Primary) 91B84; 62G99 (Secondary)

    Journal ref: Journal of Machine Learning Research. (2017). Vol 18. p. 1-40

  17. arXiv:1211.3760  [pdf, other]

    stat.ME stat.ML

    Mixed LICORS: A Nonparametric Algorithm for Predictive State Reconstruction

    Authors: Georg M. Goerg, Cosma Rohilla Shalizi

    Abstract: We introduce 'mixed LICORS', an algorithm for learning nonlinear, high-dimensional dynamics from spatio-temporal data, suitable for both prediction and simulation. Mixed LICORS extends the recent LICORS algorithm (Goerg and Shalizi, 2012) from hard clustering of predictive distributions to a non-parametric, EM-like soft clustering. This retains the asymptotic predictive optimality of LICORS, but,…

    Submitted 2 May, 2013; v1 submitted 15 November, 2012; originally announced November 2012.

    Comments: 11 pages; AISTATS 2013

    Journal ref: AISTATS 2013, pp. 289--297

  18. arXiv:1207.3994  [pdf, other]

    cs.SI cond-mat.stat-mech math.ST physics.soc-ph stat.ML

    Model Selection for Degree-corrected Block Models

    Authors: Xiaoran Yan, Cosma Rohilla Shalizi, Jacob E. Jensen, Florent Krzakala, Cristopher Moore, Lenka Zdeborova, Pan Zhang, Yaojia Zhu

    Abstract: The proliferation of models for networks raises challenging problems of model selection: the data are sparse and globally dependent, and models are typically high-dimensional and have large numbers of latent variables. Together, these issues mean that the usual model-selection criteria do not work properly for networks. We illustrate these challenges, and show one way to resolve them, by consideri…

    Submitted 30 May, 2013; v1 submitted 17 July, 2012; originally announced July 2012.

    Journal ref: J. Stat. Mech. (2014) P05007

  19. arXiv:1206.2398  [pdf, other]

    stat.ME nlin.CG physics.data-an

    LICORS: Light Cone Reconstruction of States for Non-parametric Forecasting of Spatio-Temporal Systems

    Authors: Georg M. Goerg, Cosma Rohilla Shalizi

    Abstract: We present a new, non-parametric forecasting method for data where continuous values are observed discretely in space and time. Our method, "light-cone reconstruction of states" (LICORS), uses physical principles to identify predictive states which are local properties of the system, both in space and time. LICORS discovers the number of predictive states and their predictive distributions automat…

    Submitted 3 August, 2012; v1 submitted 11 June, 2012; originally announced June 2012.

    Comments: Main text: 30 pages; supplementary material: 12 pages; 5+2 figures

  20. arXiv:1111.3404  [pdf, ps, other]

    stat.ML

    Estimated VC dimension for risk bounds

    Authors: Daniel J. McDonald, Cosma Rohilla Shalizi, Mark Schervish

    Abstract: Vapnik-Chervonenkis (VC) dimension is a fundamental measure of the generalization capacity of learning algorithms. However, apart from a few special cases, it is hard or impossible to calculate analytically. Vapnik et al. [10] proposed a technique for estimating the VC dimension empirically. While their approach behaves well in simulations, it could not be used to bound the generalization risk of…

    Submitted 14 November, 2011; originally announced November 2011.

    Comments: 11 pages

  21. Consistency under sampling of exponential random graph models

    Authors: Cosma Rohilla Shalizi, Alessandro Rinaldo

    Abstract: The growing availability of network data and of scientific interest in distributed systems has led to the rapid development of statistical models of network structure. Typically, however, these are models for the entire network, while the data consists only of a sampled sub-network. Parameters for the whole network, which is what is of interest, are estimated by applying the model to the sub-netwo…

    Submitted 9 May, 2013; v1 submitted 13 November, 2011; originally announced November 2011.

    Comments: Published at http://dx.doi.org/10.1214/12-AOS1044 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS1044

    Journal ref: Annals of Statistics 2013, Vol. 41, No. 2, 508-535

  22. Estimating beta-mixing coefficients via histograms

    Authors: Daniel J. McDonald, Cosma Rohilla Shalizi, Mark Schervish

    Abstract: The literature on statistical learning for time series often assumes asymptotic independence or "mixing" of the data-generating process. These mixing assumptions are never tested, nor are there methods for estimating mixing coefficients from data. Additionally, for many common classes of processes (Markov processes, ARMA processes, etc.) general functional forms for various mixing rates are known,…

    Submitted 8 February, 2016; v1 submitted 27 September, 2011; originally announced September 2011.

    Comments: 30 pages, 8 figures. Longer version of arXiv:1103.0941 [stat.ML]

    Journal ref: Electron. J. Statist. 9 (2015), no. 2, 2855--2883

  23. arXiv:1106.0730  [pdf, ps, other]

    stat.ML cs.LG

    Rademacher complexity of stationary sequences

    Authors: Daniel J. McDonald, Cosma Rohilla Shalizi

    Abstract: We show how to control the generalization error of time series models wherein past values of the outcome are used to predict future values. The results are based on a generalization of standard i.i.d. concentration inequalities to dependent data without the mixing assumptions common in the time series setting. Our proof and the result are simpler than previous analyses with dependent data or stoch…

    Submitted 22 May, 2017; v1 submitted 3 June, 2011; originally announced June 2011.

    Comments: 15 pages, 1 figure

  24. arXiv:1103.0949  [pdf, other]

    stat.ML cs.LG physics.data-an stat.ME

    Adapting to Non-stationarity with Growing Expert Ensembles

    Authors: Cosma Rohilla Shalizi, Abigail Z. Jacobs, Kristina Lisa Klinkner, Aaron Clauset

    Abstract: When dealing with time series with complex non-stationarities, low retrospective regret on individual realizations is a more appropriate goal than low prospective risk in expectation. Online learning algorithms provide powerful guarantees of this form, and have often been proposed for use with non-stationary processes because of their ability to switch between different forecasters or "experts".…

    Submitted 28 June, 2011; v1 submitted 4 March, 2011; originally announced March 2011.

    Comments: 9 pages, 1 figure; CMU Statistics Technical Report. v2: Added empirical example, revised discussion of related work

  25. arXiv:1103.0942  [pdf, other]

    stat.ML cs.LG

    Generalization error bounds for stationary autoregressive models

    Authors: Daniel J. McDonald, Cosma Rohilla Shalizi, Mark Schervish

    Abstract: We derive generalization error bounds for stationary univariate autoregressive (AR) models. We show that imposing stationarity is enough to control the Gaussian complexity without further regularization. This lets us use structural risk minimization for model selection. We demonstrate our methods by predicting interest rate movements.

    Submitted 3 June, 2011; v1 submitted 4 March, 2011; originally announced March 2011.

    Comments: 10 pages, 3 figures. CMU Statistics Technical Report

  26. arXiv:1103.0941  [pdf, ps, other]

    stat.ML cs.LG math.PR

    Estimating $β$-mixing coefficients

    Authors: Daniel J. McDonald, Cosma Rohilla Shalizi, Mark Schervish

    Abstract: The literature on statistical learning for time series assumes the asymptotic independence or "mixing" of the data-generating process. These mixing assumptions are never tested, nor are there methods for estimating mixing rates from data. We give an estimator for the $β$-mixing rate based on a single stationary sample path and show it is $L_1$-risk consistent.

    Submitted 4 March, 2011; originally announced March 2011.

    Comments: 9 pages, accepted by AIStats. CMU Statistics Technical Report

    Journal ref: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2011), pp. 516--524

  27. arXiv:1102.4101  [pdf, ps, other]

    stat.AP physics.data-an physics.soc-ph

    Scaling and Hierarchy in Urban Economies

    Authors: Cosma Rohilla Shalizi

    Abstract: In several recent publications, Bettencourt, West and collaborators claim that properties of cities such as gross economic production, personal income, numbers of patents filed, number of crimes committed, etc., show super-linear power-scaling with total population, while measures of resource use show sub-linear power-law scaling. Re-analysis of the gross economic production and personal income fo…

    Submitted 7 April, 2011; v1 submitted 20 February, 2011; originally announced February 2011.

    Comments: v1: 15 pages, 9 figures, combines main text and supporting information into one document. Submitted to PNAS. v2: Text re-arranged to comply with journal policies; added analysis with logistic (asymptotically constant) scaling relations; minor corrections

  28. arXiv:1006.3868  [pdf, other]

    math.ST physics.data-an

    Philosophy and the practice of Bayesian statistics

    Authors: Andrew Gelman, Cosma Rohilla Shalizi

    Abstract: A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypoth…

    Submitted 28 June, 2011; v1 submitted 19 June, 2010; originally announced June 2010.

    Comments: 36 pages, 5 figures. v2: Fixed typo in caption of figure 1. v3: Further typo fixes. v4: Revised in response to referees

    Journal ref: British Journal of Mathematical and Statistical Psychology, vol. 66 (2013), pp. 8--38

  29. arXiv:1004.4704  [pdf, other]

    stat.AP cs.SI physics.data-an physics.soc-ph

    Homophily and Contagion Are Generically Confounded in Observational Social Network Studies

    Authors: Cosma Rohilla Shalizi, Andrew C. Thomas

    Abstract: We consider processes on social networks that can potentially involve three factors: homophily, or the formation of social ties due to matching individual traits; social contagion, also known as social influence; and the causal effect of an individual's covariates on their behavior or other measurable responses. We show that, generically, all of these are confounded with each other. Distinguishing…

    Submitted 29 November, 2010; v1 submitted 27 April, 2010; originally announced April 2010.

    Comments: 27 pages, 9 figures. V2: Revised in response to referees. V3: Ditto

    Journal ref: Sociological Methods and Research, vol. 40 (2011), pp. 211--239

  30. arXiv:1004.3476  [pdf, ps, other]

    stat.ME physics.data-an q-bio.NC

    Approximate Methods for State-Space Models

    Authors: Shinsuke Koyama, Lucia Castellanos Pérez-Bolde, Cosma Rohilla Shalizi, Robert E. Kass

    Abstract: State-space models provide an important body of techniques for analyzing time-series, but their use requires estimating unobserved states. The optimal estimate of the state is its conditional expectation given the observation histories, and computing this expectation is hard when there are nonlinearities. Existing filtering methods, including sequential Monte Carlo, tend to be either inaccurate…

    Submitted 20 April, 2010; originally announced April 2010.

    Comments: 31 pages, 4 figures. Different pagination from journal version due to incompatible style files but same content; the supplemental file for the journal appears here as appendices B--E.

    Journal ref: Journal of the American Statistical Association, volume 105, 2010, pp. 170--180

  31. arXiv:1001.0036  [pdf, other]

    q-bio.NC cs.IT nlin.AO physics.data-an stat.ML

    The Computational Structure of Spike Trains

    Authors: Robert Haslinger, Kristina Lisa Klinkner, Cosma Rohilla Shalizi

    Abstract: Neurons perform computations, and convey the results of those computations through the statistical structure of their output spike trains. Here we present a practical method, grounded in the information-theoretic analysis of prediction, for inferring a minimal representation of that structure and for characterizing its complexity. Starting from spike trains, our approach finds their causal state…

    Submitted 30 December, 2009; originally announced January 2010.

    Comments: Somewhat different format from journal version but same content

    Journal ref: Neural Computation, vol. 22 (2010), pp. 121--157

  32. arXiv:0901.1342  [pdf, other]

    math.ST q-bio.PE

    Dynamics of Bayesian Updating with Dependent Data and Misspecified Models

    Authors: Cosma Rohilla Shalizi

    Abstract: Much is now known about the consistency of Bayesian updating on infinite-dimensional parameter spaces with independent or Markovian data. Necessary conditions for consistency include the prior putting enough weight on the correct neighborhoods of the data-generating distribution; various sufficient conditions further restrict the prior in ways analogous to capacity control in frequentist nonpara…

    Submitted 13 November, 2009; v1 submitted 11 January, 2009; originally announced January 2009.

    Comments: 36 pages, 1 figure. v2: typo fixes, minor formatting changes. v3: Improved notation, added references, new theorem on convergence rates. v4: minor changes to text, added references. v5: Minor typo corrections; matches journal version except for format details

    MSC Class: 62C10; 62G20; 62M09; 60F10; 62M05; 92D15; 94A17

    Journal ref: _Electronic Journal of Statistics_, vol. 3 (2009): 1039--1074

  33. arXiv:0710.4911  [pdf, other]

    cs.CY physics.soc-ph

    Social Media as Windows on the Social Life of the Mind

    Authors: Cosma Rohilla Shalizi

    Abstract: This is a programmatic paper, marking out two directions in which the study of social media can contribute to broader problems of social science: understanding cultural evolution and understanding collective cognition. Under the first heading, I discuss some difficulties with the usual, adaptationist explanations of cultural phenomena, alternative explanations involving network diffusion effects…

    Submitted 25 October, 2007; originally announced October 2007.

    Comments: 6 pages, 1 figure, AAAI format, submitted to AAAI spring 2008 symposium on "Social Information Processing"

  34. arXiv:0706.1062  [pdf, ps, other]

    physics.data-an cond-mat.dis-nn stat.AP stat.ME

    Power-law distributions in empirical data

    Authors: Aaron Clauset, Cosma Rohilla Shalizi, M. E. J. Newman

    Abstract: Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution -- the part of the distribution representing large but rare events -- and by the diffic…

    Submitted 2 February, 2009; v1 submitted 7 June, 2007; originally announced June 2007.

    Comments: 43 pages, 11 figures, 7 tables, 4 appendices; code available at http://www.santafe.edu/~aaronc/powerlaws/

    Journal ref: SIAM Review 51, 661-703 (2009)

  35. arXiv:math/0701854  [pdf, ps, other]

    math.ST physics.data-an

    Maximum Likelihood Estimation for q-Exponential (Tsallis) Distributions

    Authors: Cosma Rohilla Shalizi

    Abstract: This expository note describes how to apply the method of maximum likelihood to estimate the parameters of the "$q$-exponential" distributions introduced by Tsallis and collaborators. It also describes the relationship of these distributions to the classical Pareto distributions.

    Submitted 31 January, 2007; v1 submitted 29 January, 2007; originally announced January 2007.

    Comments: 4 pages, 1 figure; accompanying R code available from http://bactra.org/research/tsallis-MLE/. V2: Added results on estimation from censored data, re-arranged introduction, minor corrections and wording changes throughout, updated code

    MSC Class: 62F10; 62P35

  36. arXiv:q-bio/0609008  [pdf, ps, other]

    q-bio.NC nlin.AO physics.data-an q-bio.QM

    Discovering Functional Communities in Dynamical Networks

    Authors: Cosma Rohilla Shalizi, Marcelo F. Camperi, Kristina Lisa Klinkner

    Abstract: Many networks are important because they are substrates for dynamical systems, and their pattern of functional connectivity can itself be dynamic -- they can functionally reorganize, even if their underlying anatomical structure remains fixed. However, the recent rapid progress in discovering the community structure of networks has overwhelmingly focused on that constant anatomical connectivity.…

    Submitted 29 September, 2006; v1 submitted 6 September, 2006; originally announced September 2006.

    Comments: 18 pages, 4 figures, Springer "Lecture Notes in Computer Science" style. Forthcoming in the proceedings of the workshop "Statistical Network Analysis: Models, Issues and New Directions", at ICML 2006. Version 2: small clarifications, typo corrections, added reference

  37. arXiv:nlin/0508001  [pdf, ps, other]

    nlin.CG math.ST physics.data-an

    Automatic Filters for the Detection of Coherent Structure in Spatiotemporal Systems

    Authors: Cosma Rohilla Shalizi, Robert Haslinger, Jean-Baptiste Rouquier, Kristina Lisa Klinkner, Cristopher Moore

    Abstract: Most current methods for identifying coherent structures in spatially-extended systems rely on prior information about the form which those structures take. Here we present two new approaches to automatically filter the changing configurations of spatial dynamical systems and extract coherent structures. One, local sensitivity filtering, is a modification of the local Lyapunov exponent approach…

    Submitted 29 July, 2005; originally announced August 2005.

    Comments: 16 pages, 21 figures. Figures considerably compressed to fit arxiv requirements; write first author for higher-resolution versions

    Journal ref: Physical Review E 73 (2006): 036104

  38. arXiv:nlin/0507067  [pdf, ps, other]

    nlin.AO nlin.CG

    Quantifying Self-Organization in Cyclic Cellular Automata

    Authors: Cosma Rohilla Shalizi, Kristina Lisa Shalizi

    Abstract: Cyclic cellular automata (CCA) are models of excitable media. Started from random initial conditions, they produce several different kinds of spatial structure, depending on their control parameters. We introduce new tools from information theory that let us calculate the dynamical information content of spatial random processes. This complexity measure allows us to quantitatively determine the…

    Submitted 29 July, 2005; originally announced July 2005.

    Comments: 10 pages, 6 figures. This was a preliminary report on the research whose final results appeared in nlin.AO/0409024. However, this report includes certain algorithmic details and discussion of related literature omitted from the paper for reasons of space

    Journal ref: Lutz Schimansky-Geier, Derek Abbott, Alexander Neiman and Christian Van den Broeck (eds.), _Noise in Complex Systems and Stochastic Dynamics_ (Bellingham, Washington: SPIE, 2003), pp. 108--117

  39. arXiv:q-bio/0506009  [pdf, ps, other]

    q-bio.NC math.ST nlin.CD q-bio.QM

    Measuring Shared Information and Coordinated Activity in Neuronal Networks

    Authors: Kristina Lisa Klinkner, Cosma Rohilla Shalizi, Marcelo F. Camperi

    Abstract: Most nervous systems encode information about stimuli in the responding activity of large neuronal networks. This activity often manifests itself as dynamically coordinated sequences of action potentials. Since multiple electrode recordings are now a standard tool in neuroscience research, it is important to have a measure of such network-wide behavioral coordination and information sharing, app…

    Submitted 29 July, 2005; v1 submitted 7 June, 2005; originally announced June 2005.

    Comments: 8 pages, 6 figures

  40. arXiv:cond-mat/0410063  [pdf, ps, other]

    cond-mat.stat-mech

    The Backwards Arrow of Time of the Coherently Bayesian Statistical Mechanic

    Authors: Cosma Rohilla Shalizi

    Abstract: Many physicists think that the maximum entropy formalism is a straightforward application of Bayesian statistical ideas to statistical mechanics. Some even say that statistical mechanics is just the general Bayesian logic of inductive inference applied to large mechanical systems. This approach identifies thermodynamic entropy with the information-theoretic uncertainty of an (ideal) observer's s…

    Submitted 8 November, 2004; v1 submitted 4 October, 2004; originally announced October 2004.

    Comments: 5 pages. Comments unusually welcome. V2: Added sub-section on long-run behavior of the posterior density

  41. arXiv:nlin/0409024  [pdf, ps, other]

    nlin.AO cond-mat.stat-mech math.ST nlin.CG physics.data-an

    Quantifying Self-Organization with Optimal Predictors

    Authors: Cosma Rohilla Shalizi, Kristina Lisa Shalizi, Robert Haslinger

    Abstract: Despite broad interest in self-organizing systems, there are few quantitative, experimentally-applicable criteria for self-organization. The existing criteria all give counter-intuitive results for important cases. In this Letter, we propose a new criterion, namely an internally-generated increase in the statistical complexity, the amount of information required for optimal prediction of the sys…

    Submitted 10 September, 2004; originally announced September 2004.

    Comments: Four pages, two color figures

    Journal ref: Physical Review Letters, vol. 93, no. 11 (10 September 2004), article 118701

  42. arXiv:cs/0406011  [pdf, ps, other]

    cs.LG math.ST nlin.CD physics.data-an

    Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences

    Authors: Cosma Rohilla Shalizi, Kristina Lisa Shalizi

    Abstract: We present a new method for nonlinear prediction of discrete random sequences under minimal structural assumptions. We give a mathematical construction for optimal predictors of such processes, in the form of hidden Markov models. We then describe an algorithm, CSSR (Causal-State Splitting Reconstruction), which approximates the ideal predictor from data. We discuss the reliability of CSSR, its…

    Submitted 6 June, 2004; originally announced June 2004.

    Comments: 8 pages, 4 figures

    ACM Class: I.2.6

    Journal ref: pp. 504--511 in Max Chickering and Joseph Halpern (eds.), _Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference_ (2004)

  43. arXiv:nlin/0307015  [pdf, ps, other]

    nlin.AO cond-mat.stat-mech nlin.CD nlin.CG physics.data-an q-bio.QM

    Methods and Techniques of Complex Systems Science: An Overview

    Authors: Cosma Rohilla Shalizi

    Abstract: In this chapter, I review the main methods and techniques of complex systems science. As a first step, I distinguish among the broad patterns which recur across complex systems, the topics complex systems science commonly studies, the tools employed, and the foundational science of complex systems. The focus of this chapter is overwhelmingly on the third heading, that of tools. These in turn div…

    Submitted 24 March, 2006; v1 submitted 9 July, 2003; originally announced July 2003.

    Comments: 96 pages, 8 figures. Versions 2 and 3: corrects minor typographical errors. Version 4: Expanded examples, updated references (through late 2004), matches published version up to changes in formatting

    Journal ref: Chapter 1 (pp. 33--114) in Thomas S. Deisboeck and J. Yasha Kresh (eds.), _Complex Systems Science in Biomedicine_ (New York: Springer, 2006)

  44. arXiv:math/0305160  [pdf, ps, other]

    math.PR cond-mat.stat-mech nlin.CG physics.data-an

    Optimal Nonlinear Prediction of Random Fields on Networks

    Authors: Cosma Rohilla Shalizi

    Abstract: It is increasingly common to encounter time-varying random fields on networks (metabolic networks, sensor arrays, distributed computing, etc.). This paper considers the problem of optimal, nonlinear prediction of these fields, showing from an information-theoretic perspective that it is formally identical to the problem of finding minimal local sufficient statistics. I derive general properties…

    Submitted 16 June, 2003; v1 submitted 12 May, 2003; originally announced May 2003.

    Comments: 20 pages, 5 figures. For the conference "Discrete Models of Complex Systems" (Lyon, June, 2003). v2: Typos fixed, regenerated figures should now produce readable PDF output

    Journal ref: Discrete Mathematics and Theoretical Computer Science, vol. AB(DMCS), pp. 11--30 (2003)

  45. arXiv:cond-mat/0303625  [pdf, ps, other]

    cond-mat.stat-mech

    What Is a Macrostate? Subjective Observations and Objective Dynamics

    Authors: Cosma Rohilla Shalizi, Cristopher Moore

    Abstract: We consider the question of whether thermodynamic macrostates are objective consequences of dynamics, or subjective reflections of our ignorance of a physical system. We argue that they are both; more specifically, that the set of macrostates forms the unique maximal partition of phase space which 1) is consistent with our observations (a subjective fact about our ability to observe the system)…

    Submitted 29 March, 2003; originally announced March 2003.

    Comments: 15 pages, no figures

  46. arXiv:cs/0210025  [pdf, ps, other]

    cs.LG cs.CL

    An Algorithm for Pattern Discovery in Time Series

    Authors: Cosma Rohilla Shalizi, Kristina Lisa Shalizi, James P. Crutchfield

    Abstract: We present a new algorithm for discovering patterns in time series and other sequential data. We exhibit a reliable procedure for building the minimal set of hidden, Markovian states that is statistically capable of producing the behavior exhibited in the data -- the underlying process's causal states. Unlike conventional methods for fitting hidden Markov models (HMMs) to data, our algorithm mak…

    Submitted 26 November, 2002; v1 submitted 28 October, 2002; originally announced October 2002.

    Comments: 26 pages, 5 figures; 5 tables; http://www.santafe.edu/projects/CompMech Added discussion of algorithm parameters; improved treatment of convergence and time complexity; added comparison to older methods

    Report number: SFI Working Paper 02-10-060

    ACM Class: I.2.6; H.1.1; E.4

  47. arXiv:cond-mat/0207407  [pdf, ps, other]

    cond-mat.stat-mech nlin.AO

    Symbolic Dynamics for Discrete Adaptive Games

    Authors: Cosma Rohilla Shalizi, David J. Albers

    Abstract: We use symbolic dynamics to study discrete adaptive games, such as the minority game and the El Farol Bar problem. We show that no such game can have deterministic chaos. We put upper bounds on the statistical complexity and period of these games; the former is at most linear in the number of agents and the size of their memories. We extend our results to cases where the players have infinite-durat…

    Submitted 14 March, 2003; v1 submitted 16 July, 2002; originally announced July 2002.

    Comments: 8 pages, no figures, RevTeX, submitted to PRE. v2: Improved and expanded discussion of symbolic dynamics, complexity and bounded rationality, in response to comments

  48. arXiv:nlin/0008038  [pdf, ps, other]

    nlin.CG nlin.AO nlin.PS

    Upper Bound on the Products of Particle Interactions in Cellular Automata

    Authors: Wim Hordijk, Cosma Rohilla Shalizi, James P. Crutchfield

    Abstract: Particle-like objects are observed to propagate and interact in many spatially extended dynamical systems. For one of the simplest classes of such systems, one-dimensional cellular automata, we establish a rigorous upper bound on the number of distinct products that these interactions can generate. The upper bound is controlled by the structural complexity of the interacting particles---a quanti…

    Submitted 30 January, 2001; v1 submitted 29 August, 2000; originally announced August 2000.

    Comments: 17 pages, 12 figures, 3 tables, http://www.santafe.edu/projects/CompMech/papers/ub.html V2: References and accompanying text modified, to comply with legal demands arising from on-going intellectual property litigation among third parties. V3: Accepted for publication in Physica D. References added and other small changes made per referee suggestions

    Journal ref: Physica D 154 (2001): 240--258

  49. arXiv:nlin/0006025  [pdf, ps, other]

    nlin.AO cond-mat.dis-nn cs.LG physics.data-an

    Information Bottlenecks, Causal States, and Statistical Relevance Bases: How to Represent Relevant Information in Memoryless Transduction

    Authors: Cosma Rohilla Shalizi, James P. Crutchfield

    Abstract: Discovering relevant, but possibly hidden, variables is a key step in constructing useful and predictive theories about the natural world. This brief note explains the connections between three approaches to this problem: the recently introduced information-bottleneck method, the computational mechanics approach to inferring optimal models, and Salmon's statistical relevance basis.

    Submitted 16 June, 2000; originally announced June 2000.

    Comments: 3 pages, no figures, submitted to PRE as a "brief report". Revision: added an acknowledgements section originally omitted by a LaTeX bug

    Journal ref: Advances in Complex Systems, vol. 5, pp. 91--95 (2002)

  50. arXiv:cs/0001027  [pdf, ps, other]

    cs.LG cs.NE

    Pattern Discovery and Computational Mechanics

    Authors: Cosma Rohilla Shalizi, James P. Crutchfield

    Abstract: Computational mechanics is a method for discovering, describing and quantifying patterns, using tools from statistical physics. It constructs optimal, minimal models of stochastic processes and their underlying causal structures. These models tell us about the intrinsic computation embedded within a process---how it stores and transforms information. Here we summarize the mathematics of computat…

    Submitted 28 January, 2000; originally announced January 2000.

    Comments: 12 pages, 3 figures; submitted to the Proceedings of the 17th International Conference on Machine Learning (differs slightly in pagination and citation format from that version)

    Report number: SFI 00-01-008

    ACM Class: I.2.6; F.1.3; G.3; H.1.1