Search | arXiv e-print repository

A Simple Non-Stationary Mean Ergodic Theorem, with Bonus Weak Law of Large Numbers

Abstract: This brief pedagogical note re-proves a simple theorem on the convergence, in $L_2$ and in probability, of time averages of non-stationary time series to the mean of expectation values. The basic condition is that the sum of covariances grows sub-quadratically with the length of the time series. I make no claim to originality; the result is a widely, but unevenly, spread bit of folklore among users of applied probability. The goal of this note is merely to even out that distribution.

Submitted 19 March, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

Comments: v2: Fixed notation to replace statements like $A_n \rightarrow m_n$ with ones like $A_n - m_n \rightarrow 0$; small wording changes and typo corrections in Remark 3

arXiv:1912.03387 [pdf, other]

Conditional Mutual Information Estimation for Mixed Discrete and Continuous Variables with Nearest Neighbors

Authors: Octavio César Mesner, Cosma Rohilla Shalizi

Abstract: Fields like public health, public policy, and social science often want to quantify the degree of dependence between variables whose relationships take on unknown functional forms. Typically, in fact, researchers in these fields are attempting to evaluate causal theories, and so want to quantify dependence after conditioning on other variables that might explain, mediate or confound causal relations. One reason conditional mutual information is not more widely used for these tasks is the lack of estimators which can handle combinations of continuous and discrete random variables, common in applications. This paper develops a new method for estimating mutual and conditional mutual information for data samples containing a mix of discrete and continuous variables. We prove that this estimator is consistent and show, via simulation, that it is more accurate than similar estimators.

Submitted 6 December, 2019; originally announced December 2019.

arXiv:1711.02834 [pdf, other]

Bootstrapping Generalization Error Bounds for Time Series

Authors: Robert Lunde, Cosma Rohilla Shalizi

Abstract: We consider the problem of finding confidence intervals for the risk of forecasting the future of a stationary, ergodic stochastic process, using a model estimated from the past of the process. We show that a bootstrap procedure provides valid confidence intervals for the risk, when the data source is sufficiently mixing, and the loss function and the estimator are suitably smooth. Autoregressive (AR(d)) models estimated by least squares obey the necessary regularity conditions, even when mis-specified, and simulations show that the finite-sample coverage of our bounds quickly converges to the theoretical, asymptotic level. As an intermediate step, we derive sufficient conditions for asymptotic independence between empirical distribution functions formed by splitting a realization of a stochastic process, of independent interest.

Submitted 29 November, 2017; v1 submitted 8 November, 2017; originally announced November 2017.

arXiv:1711.02123 [pdf, ps, other]

Consistency of Maximum Likelihood for Continuous-Space Network Models I

Authors: Cosma Rohilla Shalizi, Dena Marie Asta

Abstract: A very popular class of models for networks posits that each node is represented by a point in a continuous latent space, and that the probability of an edge between nodes is a decreasing function of the distance between them in this latent space. We study the embedding problem for these models, of recovering the latent positions from the observed graph. Assuming certain natural symmetry and smoothness properties, we establish the uniform convergence of the log-likelihood of latent positions as the number of nodes grows. A consequence is that the maximum likelihood embedding converges on the true positions in a certain information-theoretic sense. Extensions of these results, to recovering distributions in the latent space, and so distributions over arbitrarily large graphs, will be treated in the sequel.

Submitted 29 June, 2022; v1 submitted 6 November, 2017; originally announced November 2017.

Comments: 17 pages

arXiv:1709.09702 [pdf, other]

Projective, Sparse, and Learnable Latent Position Network Models

Authors: Neil A. Spencer, Cosma Rohilla Shalizi

Abstract: When modeling network data using a latent position model, it is typical to assume that the nodes' positions are independently and identically distributed. However, this assumption implies the average node degree grows linearly with the number of nodes, which is inappropriate when the graph is thought to be sparse. We propose an alternative assumption -- that the latent positions are generated according to a Poisson point process -- and show that it is compatible with various levels of sparsity. Unlike other notions of sparse latent position models in the literature, our framework also defines a projective sequence of probability models, thus ensuring consistency of statistical inference across networks of different sizes. We establish conditions for consistent estimation of the latent positions, and compare our results to existing frameworks for modeling sparse networks.

Submitted 8 September, 2023; v1 submitted 27 September, 2017; originally announced September 2017.

Comments: 70 pages, 2 figures

arXiv:1411.1350 [pdf, other]

Geometric Network Comparison

Authors: Dena Asta, Cosma Rohilla Shalizi

Abstract: Network analysis has a crucial need for tools to compare networks and assess the significance of differences between networks. We propose a principled statistical approach to network comparison that approximates networks as probability distributions on negatively curved manifolds. We outline the theory, as well as implement the approach on simulated networks.

Submitted 5 November, 2014; originally announced November 2014.

arXiv:1212.0463 [pdf, other]

Nonparametric risk bounds for time-series forecasting

Authors: Daniel J. McDonald, Cosma Rohilla Shalizi, Mark Schervish

Abstract: We derive generalization error bounds for traditional time-series forecasting models. Our results hold for many standard forecasting tools including autoregressive models, moving average models, and, more generally, linear state-space models. These non-asymptotic bounds need only weak assumptions on the data-generating process, yet allow forecasters to select among competing models and to guarantee, with high probability, that their chosen model will perform well. We motivate our techniques with and apply them to standard economic and financial forecasting tools---a GARCH model for predicting equity volatility and a dynamic stochastic general equilibrium model (DSGE), the standard tool in macroeconomic forecasting. We demonstrate in particular how our techniques can aid forecasters and policy makers in choosing models which behave well under uncertainty and mis-specification.

Submitted 10 September, 2016; v1 submitted 3 December, 2012; originally announced December 2012.

Comments: 34 pages, 3 figures

MSC Class: 62M20 (Primary) 91B84; 62G99 (Secondary)

Journal ref: Journal of Machine Learning Research. (2017). Vol 18. p. 1-40

arXiv:1207.3994 [pdf, other]

doi 10.1088/1742-5468/2014/05/P05007

Model Selection for Degree-corrected Block Models

Authors: Xiaoran Yan, Cosma Rohilla Shalizi, Jacob E. Jensen, Florent Krzakala, Cristopher Moore, Lenka Zdeborova, Pan Zhang, Yaojia Zhu

Abstract: The proliferation of models for networks raises challenging problems of model selection: the data are sparse and globally dependent, and models are typically high-dimensional and have large numbers of latent variables. Together, these issues mean that the usual model-selection criteria do not work properly for networks. We illustrate these challenges, and show one way to resolve them, by considering the key network-analysis problem of dividing a graph into communities or blocks of nodes with homogeneous patterns of links to the rest of the network. The standard tool for doing this is the stochastic block model, under which the probability of a link between two nodes is a function solely of the blocks to which they belong. This imposes a homogeneous degree distribution within each block; this can be unrealistic, so degree-corrected block models add a parameter for each node, modulating its over-all degree. The choice between ordinary and degree-corrected block models matters because they make very different inferences about communities. We present the first principled and tractable approach to model selection between standard and degree-corrected block models, based on new large-graph asymptotics for the distribution of log-likelihood ratios under the stochastic block model, finding substantial departures from classical results for sparse graphs. We also develop linear-time approximations for log-likelihoods under both the stochastic block model and the degree-corrected model, using belief propagation. Applications to simulated and real networks show excellent agreement with our approximations. Our results thus both solve the practical problem of deciding on degree correction, and point to a general approach to model selection in network analysis.

Submitted 30 May, 2013; v1 submitted 17 July, 2012; originally announced July 2012.

Journal ref: J. Stat. Mech. (2014) P05007

arXiv:1111.3054 [pdf, ps, other]

doi 10.1214/12-AOS1044

Consistency under sampling of exponential random graph models

Authors: Cosma Rohilla Shalizi, Alessandro Rinaldo

Abstract: The growing availability of network data and of scientific interest in distributed systems has led to the rapid development of statistical models of network structure. Typically, however, these are models for the entire network, while the data consists only of a sampled sub-network. Parameters for the whole network, which is what is of interest, are estimated by applying the model to the sub-network. This assumes that the model is consistent under sampling, or, in terms of the theory of stochastic processes, that it defines a projective family. Focusing on the popular class of exponential random graph models (ERGMs), we show that this apparently trivial condition is in fact violated by many popular and scientifically appealing models, and that satisfying it drastically limits ERGM's expressive power. These results are actually special cases of more general results about exponential families of dependent random variables, which we also prove. Using such results, we offer easily checked conditions for the consistency of maximum likelihood estimation in ERGMs, and discuss some possible constructive responses.

Submitted 9 May, 2013; v1 submitted 13 November, 2011; originally announced November 2011.

Comments: Published at http://dx.doi.org/10.1214/12-AOS1044 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1044

Journal ref: Annals of Statistics 2013, Vol. 41, No. 2, 508-535

arXiv:1109.5998 [pdf, other]

doi 10.1214/15-EJS1094

Estimating beta-mixing coefficients via histograms

Authors: Daniel J. McDonald, Cosma Rohilla Shalizi, Mark Schervish

Abstract: The literature on statistical learning for time series often assumes asymptotic independence or "mixing" of the data-generating process. These mixing assumptions are never tested, nor are there methods for estimating mixing coefficients from data. Additionally, for many common classes of processes (Markov processes, ARMA processes, etc.) general functional forms for various mixing rates are known, but not specific coefficients. We present the first estimator for beta-mixing coefficients based on a single stationary sample path and show that it is risk consistent. Since mixing rates depend on infinite-dimensional dependence, we use a Markov approximation based on only a finite memory length $d$. We present convergence rates for the Markov approximation and show that as $d\rightarrow\infty$, the Markov approximation converges to the true mixing coefficient. Our estimator is constructed using $d$-dimensional histogram density estimates. Allowing asymptotics in the bandwidth as well as the dimension, we prove $L^1$ concentration for the histogram as an intermediate step. Simulations wherein the mixing rates are calculable and a real-data example demonstrate our methodology.

Submitted 8 February, 2016; v1 submitted 27 September, 2011; originally announced September 2011.

Comments: 30 pages, 8 figures. Longer version of arXiv:1103.0941 [stat.ML]

Journal ref: Electron. J. Statist. 9 (2015), no. 2, 2855--2883

arXiv:1103.0941 [pdf, ps, other]

Estimating $β$-mixing coefficients

Authors: Daniel J. McDonald, Cosma Rohilla Shalizi, Mark Schervish

Abstract: The literature on statistical learning for time series assumes the asymptotic independence or ``mixing'' of the data-generating process. These mixing assumptions are never tested, nor are there methods for estimating mixing rates from data. We give an estimator for the $β$-mixing rate based on a single stationary sample path and show it is $L_1$-risk consistent.

Submitted 4 March, 2011; originally announced March 2011.

Comments: 9 pages, accepted by AIStats. CMU Statistics Technical Report

Journal ref: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2011), pp. 516--524

arXiv:1006.3868 [pdf, other]

doi 10.1111/j.2044-8317.2011.02037.x

Philosophy and the practice of Bayesian statistics

Authors: Andrew Gelman, Cosma Rohilla Shalizi

Abstract: A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework.

Submitted 28 June, 2011; v1 submitted 19 June, 2010; originally announced June 2010.

Comments: 36 pages, 5 figures. v2: Fixed typo in caption of figure 1. v3: Further typo fixes. v4: Revised in response to referees

Journal ref: British Journal of Mathematical and Statistical Psychology, vol. 66 (2013), pp. 8--38

arXiv:0901.1342 [pdf, other]

doi 10.1214/09-EJS485

Dynamics of Bayesian Updating with Dependent Data and Misspecified Models

Authors: Cosma Rohilla Shalizi

Abstract: Much is now known about the consistency of Bayesian updating on infinite-dimensional parameter spaces with independent or Markovian data. Necessary conditions for consistency include the prior putting enough weight on the correct neighborhoods of the data-generating distribution; various sufficient conditions further restrict the prior in ways analogous to capacity control in frequentist nonparametrics. The asymptotics of Bayesian updating with mis-specified models or priors, or non-Markovian data, are far less well explored. Here I establish sufficient conditions for posterior convergence when all hypotheses are wrong, and the data have complex dependencies. The main dynamical assumption is the asymptotic equipartition (Shannon-McMillan-Breiman) property of information theory. This, along with Egorov's Theorem on uniform convergence, lets me build a sieve-like structure for the prior. The main statistical assumption, also a form of capacity control, concerns the compatibility of the prior and the data-generating process, controlling the fluctuations in the log-likelihood when averaged over the sieve-like sets. In addition to posterior convergence, I derive a kind of large deviations principle for the posterior measure, extending in some cases to rates of convergence, and discuss the advantages of predicting using a combination of models known to be wrong. An appendix sketches connections between these results and the replicator dynamics of evolutionary theory.

Submitted 13 November, 2009; v1 submitted 11 January, 2009; originally announced January 2009.

Comments: 36 pages, 1 figure. v2: typo fixes, minor formatting changes. v3: Improved notation, added references, new theorem on convergence rates. v4: minor changes to text, added references. v5: Minor typo corrections; matches journal version except for format details

MSC Class: 62C10; 62G20; 62M09; 60F10; 62M05; 92D15; 94A17

Journal ref: _Electronic Journal of Statistics_, vol. 3 (2009): 1039--1074

arXiv:math/0701854 [pdf, ps, other]

Maximum Likelihood Estimation for q-Exponential (Tsallis) Distributions

Authors: Cosma Rohilla Shalizi

Abstract: This expository note describes how to apply the method of maximum likelihood to estimate the parameters of the ``$q$-exponential'' distributions introduced by Tsallis and collaborators. It also describes the relationship of these distributions to the classical Pareto distributions.

Submitted 31 January, 2007; v1 submitted 29 January, 2007; originally announced January 2007.

Comments: 4 pages, 1 figure; accompanying R code available from http://bactra.org/research/tsallis-MLE/. V2: Added results on estimation from censored data, re-arranged introduction, minor corrections and wording changes throughout, updated code

MSC Class: 62F10; 62P35

arXiv:nlin/0508001 [pdf, ps, other]

doi 10.1103/PhysRevE.73.036104

Automatic Filters for the Detection of Coherent Structure in Spatiotemporal Systems

Authors: Cosma Rohilla Shalizi, Robert Haslinger, Jean-Baptiste Rouquier, Kristina Lisa Klinkner, Cristopher Moore

Abstract: Most current methods for identifying coherent structures in spatially-extended systems rely on prior information about the form which those structures take. Here we present two new approaches to automatically filter the changing configurations of spatial dynamical systems and extract coherent structures. One, local sensitivity filtering, is a modification of the local Lyapunov exponent approach suitable to cellular automata and other discrete spatial systems. The other, local statistical complexity filtering, calculates the amount of information needed for optimal prediction of the system's behavior in the vicinity of a given point. By examining the changing spatiotemporal distributions of these quantities, we can find the coherent structures in a variety of pattern-forming cellular automata, without needing to guess or postulate the form of that structure. We apply both filters to elementary and cyclical cellular automata (ECA and CCA) and find that they readily identify particles, domains and other more complicated structures. We compare the results from ECA with earlier ones based upon the theory of formal languages, and the results from CCA with a more traditional approach based on an order parameter and free energy. While sensitivity and statistical complexity are equally adept at uncovering structure, they are based on different system properties (dynamical and probabilistic, respectively), and provide complementary information.

Submitted 29 July, 2005; originally announced August 2005.

Comments: 16 pages, 21 figures. Figures considerably compressed to fit arxiv requirements; write first author for higher-resolution versions

Journal ref: Physical Review E 73 (2006): 036104

arXiv:q-bio/0506009 [pdf, ps, other]

Measuring Shared Information and Coordinated Activity in Neuronal Networks

Authors: Kristina Lisa Klinkner, Cosma Rohilla Shalizi, Marcelo F. Camperi

Abstract: Most nervous systems encode information about stimuli in the responding activity of large neuronal networks. This activity often manifests itself as dynamically coordinated sequences of action potentials. Since multiple electrode recordings are now a standard tool in neuroscience research, it is important to have a measure of such network-wide behavioral coordination and information sharing, applicable to multiple neural spike train data. We propose a new statistic, informational coherence, which measures how much better one unit can be predicted by knowing the dynamical state of another. We argue informational coherence is a measure of association and shared information which is superior to traditional pairwise measures of synchronization and correlation. To find the dynamical states, we use a recently-introduced algorithm which reconstructs effective state spaces from stochastic time series. We then extend the pairwise measure to a multivariate analysis of the network by estimating the network multi-information. We illustrate our method by testing it on a detailed model of the transition from gamma to beta rhythms.

Submitted 29 July, 2005; v1 submitted 7 June, 2005; originally announced June 2005.

Comments: 8 pages, 6 figures

arXiv:nlin/0409024 [pdf, ps, other]

doi 10.1103/PhysRevLett.93.118701

Quantifying Self-Organization with Optimal Predictors

Authors: Cosma Rohilla Shalizi, Kristina Lisa Shalizi, Robert Haslinger

Abstract: Despite broad interest in self-organizing systems, there are few quantitative, experimentally-applicable criteria for self-organization. The existing criteria all give counter-intuitive results for important cases. In this Letter, we propose a new criterion, namely an internally-generated increase in the statistical complexity, the amount of information required for optimal prediction of the system's dynamics. We precisely define this complexity for spatially-extended dynamical systems, using the probabilistic ideas of mutual information and minimal sufficient statistics. This leads to a general method for predicting such systems, and a simple algorithm for estimating statistical complexity. The results of applying this algorithm to a class of models of excitable media (cyclic cellular automata) strongly support our proposal.

Submitted 10 September, 2004; originally announced September 2004.

Comments: Four pages, two color figures

Journal ref: Physical Review Letters, vol. 93, no. 11 (10 September 2004), article 118701

arXiv:cs/0406011 [pdf, ps, other]

Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences

Authors: Cosma Rohilla Shalizi, Kristina Lisa Shalizi

Abstract: We present a new method for nonlinear prediction of discrete random sequences under minimal structural assumptions. We give a mathematical construction for optimal predictors of such processes, in the form of hidden Markov models. We then describe an algorithm, CSSR (Causal-State Splitting Reconstruction), which approximates the ideal predictor from data. We discuss the reliability of CSSR, its data requirements, and its performance in simulations. Finally, we compare our approach to existing methods using variable-length Markov models and cross-validated hidden Markov models, and show theoretically and experimentally that our method delivers results superior to the former and at least comparable to the latter.

Submitted 6 June, 2004; originally announced June 2004.

Comments: 8 pages, 4 figures

ACM Class: I.2.6

Journal ref: pp. 504--511 in Max Chickering and Joseph Halpern (eds.), _Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference_ (2004)

arXiv:math/0305160 [pdf, ps, other]

doi 10.46298/dmtcs.2310

Optimal Nonlinear Prediction of Random Fields on Networks

Authors: Cosma Rohilla Shalizi

Abstract: It is increasingly common to encounter time-varying random fields on networks (metabolic networks, sensor arrays, distributed computing, etc.). This paper considers the problem of optimal, nonlinear prediction of these fields, showing from an information-theoretic perspective that it is formally identical to the problem of finding minimal local sufficient statistics. I derive general properties of these statistics, show that they can be composed into global predictors, and explore their recursive estimation properties. For the special case of discrete-valued fields, I describe a convergent algorithm to identify the local predictors from empirical data, with minimal prior information about the field, and no distributional assumptions.

Submitted 16 June, 2003; v1 submitted 12 May, 2003; originally announced May 2003.

Comments: 20 pages, 5 figures. For the conference "Discrete Models of Complex Systems" (Lyon, June, 2003). v2: Typos fixed, regenerated figures should now produce readable PDF output

Journal ref: Discrete Mathematics and Theoretical Computer Science, vol. AB(DMCS), pp. 11--30 (2003)

Showing 1–19 of 19 results for author: Shalizi, C R