Search | arXiv e-print repository

Consistency of Maximum Likelihood for Continuous-Space Network Models I

Authors: Cosma Rohilla Shalizi, Dena Marie Asta

Abstract: A very popular class of models for networks posits that each node is represented by a point in a continuous latent space, and that the probability of an edge between nodes is a decreasing function of the distance between them in this latent space. We study the embedding problem for these models, of recovering the latent positions from the observed graph. Assuming certain natural symmetry and smoothness properties, we establish the uniform convergence of the log-likelihood of latent positions as the number of nodes grows. A consequence is that the maximum likelihood embedding converges on the true positions in a certain information-theoretic sense. Extensions of these results, to recovering distributions in the latent space, and so distributions over arbitrarily large graphs, will be treated in the sequel.
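The model class described above can be made concrete with a small simulation. The sketch below is only an illustration: it assumes Euclidean latent positions and a logistic link, p_ij = 1 / (1 + exp(-(alpha - d_ij))), with a hypothetical offset parameter `alpha`; the paper states its results under general symmetry and smoothness assumptions rather than for this particular link. It samples a graph from given positions and evaluates the Bernoulli log-likelihood of those positions, which is the objective a maximum likelihood embedding would maximize over positions.

```python
import numpy as np

def sample_latent_space_graph(positions, alpha=1.0, rng=None):
    """Sample an undirected graph whose edge probability decreases with latent distance.

    Assumed (hypothetical) link: p_ij = 1 / (1 + exp(-(alpha - d_ij))).
    """
    rng = np.random.default_rng(rng)
    n = positions.shape[0]
    # Pairwise Euclidean distances between latent positions.
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    prob = 1.0 / (1.0 + np.exp(-(alpha - dist)))
    # Draw each unordered pair once, then symmetrize; no self-loops.
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    adj = (upper | upper.T).astype(int)
    return adj, prob

def log_likelihood(adj, prob):
    """Bernoulli log-likelihood of the observed graph, summed over unordered pairs."""
    iu = np.triu_indices(adj.shape[0], k=1)
    a, p = adj[iu], prob[iu]
    return float(np.sum(a * np.log(p) + (1 - a) * np.log1p(-p)))
```

For example, `sample_latent_space_graph(np.random.default_rng(1).normal(size=(50, 2)), rng=0)` returns a 50-node adjacency matrix together with the edge-probability matrix used to draw it.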

Submitted 29 June, 2022; v1 submitted 6 November, 2017; originally announced November 2017.

Comments: 17 pages

arXiv:1607.06565

doi 10.1080/01621459.2021.1953506

Estimating Causal Peer Influence in Homophilous Social Networks by Inferring Latent Locations

Authors: Edward McFowland III, Cosma Rohilla Shalizi

Abstract: Social influence cannot be identified from purely observational data on social networks, because such influence is generically confounded with latent homophily, i.e., with a node's network partners being informative about the node's attributes and therefore its behavior. If the network grows according to either a latent community (stochastic block) model, or a continuous latent space model, then latent homophilous attributes can be consistently estimated from the global pattern of social ties. We show that, for common versions of those two network models, these estimates are so informative that controlling for estimated attributes allows for asymptotically unbiased and consistent estimation of social-influence effects in linear models. In particular, the bias shrinks at a rate which directly reflects how much information the network provides about the latent attributes. These are the first results on the consistent non-experimental estimation of social-influence effects in the presence of latent homophily, and we discuss the prospects for generalizing them.

Submitted 17 June, 2021; v1 submitted 22 July, 2016; originally announced July 2016.

Comments: 35 pages, 4 figures

Journal ref: Journal of the American Statistical Association (2022)

arXiv:1506.02686

The LICORS Cabinet: Nonparametric Algorithms for Spatio-temporal Prediction

Authors: George D. Montanez, Cosma Rohilla Shalizi

Abstract: Spatio-temporal data is intrinsically high dimensional, so unsupervised modeling is only feasible if we can exploit structure in the process. When the dynamics are local in both space and time, this structure can be exploited by splitting the global field into many lower-dimensional "light cones". We review light cone decompositions for predictive state reconstruction, introducing three simple light cone algorithms. These methods allow for tractable inference of spatio-temporal data, such as full-frame video. The algorithms make few assumptions on the underlying process yet have good predictive performance and can provide distributions over spatio-temporal data, enabling sophisticated probabilistic inference.
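To illustrate what a "light cone" is, here is a minimal sketch for a field with one spatial dimension, stored as a NumPy array indexed by `[time, space]`. The cone geometry (a horizon `horizon` and a propagation speed `speed`, with the cone including the current site) and the periodic spatial boundary are assumptions made for the example; it extracts past light cones only and is not one of the three algorithms the paper introduces.

```python
import numpy as np

def past_light_cone(field, t, x, horizon, speed=1):
    """Return the values in the past light cone of site x at time t.

    field: 2-D array indexed [time, space]; space is treated as periodic (an assumption).
    The cone contains (t - k, x + j) for k = 0..horizon and |j| <= speed * k.
    """
    if t - horizon < 0:
        raise ValueError("not enough history for this horizon")
    n_space = field.shape[1]
    values = []
    for k in range(horizon + 1):
        # Sites that a signal moving at most `speed` cells per step could reach from.
        for j in range(-speed * k, speed * k + 1):
            values.append(field[t - k, (x + j) % n_space])
    return np.array(values)
```

Stacking such vectors over every (t, x) turns one high-dimensional field into many low-dimensional samples, which is the kind of decomposition the abstract describes.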

Submitted 14 September, 2016; v1 submitted 8 June, 2015; originally announced June 2015.

arXiv:1212.0463

Nonparametric risk bounds for time-series forecasting

Authors: Daniel J. McDonald, Cosma Rohilla Shalizi, Mark Schervish

Abstract: We derive generalization error bounds for traditional time-series forecasting models. Our results hold for many standard forecasting tools, including autoregressive models, moving average models, and, more generally, linear state-space models. These non-asymptotic bounds need only weak assumptions on the data-generating process, yet allow forecasters to select among competing models and to guarantee, with high probability, that their chosen model will perform well. We motivate our techniques with and apply them to standard economic and financial forecasting tools: a GARCH model for predicting equity volatility and a dynamic stochastic general equilibrium (DSGE) model, the standard tool in macroeconomic forecasting. We demonstrate in particular how our techniques can aid forecasters and policy makers in choosing models which behave well under uncertainty and mis-specification.

Submitted 10 September, 2016; v1 submitted 3 December, 2012; originally announced December 2012.

Comments: 34 pages, 3 figures

MSC Class: 62M20 (Primary) 91B84; 62G99 (Secondary)

Journal ref: Journal of Machine Learning Research, vol. 18 (2017), pp. 1--40

arXiv:1207.3994

doi 10.1088/1742-5468/2014/05/P05007

Model Selection for Degree-corrected Block Models

Authors: Xiaoran Yan, Cosma Rohilla Shalizi, Jacob E. Jensen, Florent Krzakala, Cristopher Moore, Lenka Zdeborova, Pan Zhang, Yaojia Zhu

Abstract: The proliferation of models for networks raises challenging problems of model selection: the data are sparse and globally dependent, and models are typically high-dimensional and have large numbers of latent variables. Together, these issues mean that the usual model-selection criteria do not work properly for networks. We illustrate these challenges, and show one way to resolve them, by considering the key network-analysis problem of dividing a graph into communities or blocks of nodes with homogeneous patterns of links to the rest of the network. The standard tool for doing this is the stochastic block model, under which the probability of a link between two nodes is a function solely of the blocks to which they belong. This imposes a homogeneous degree distribution within each block; this can be unrealistic, so degree-corrected block models add a parameter for each node, modulating its overall degree. The choice between ordinary and degree-corrected block models matters because they make very different inferences about communities. We present the first principled and tractable approach to model selection between standard and degree-corrected block models, based on new large-graph asymptotics for the distribution of log-likelihood ratios under the stochastic block model, finding substantial departures from classical results for sparse graphs. We also develop linear-time approximations for log-likelihoods under both the stochastic block model and the degree-corrected model, using belief propagation.
Applications to simulated and real networks show excellent agreement with our approximations. Our results thus both solve the practical problem of deciding on degree correction, and point to a general approach to model selection in network analysis.
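As an illustration of the quantities being compared, the sketch below computes profile log-likelihoods of a fixed partition under the ordinary and degree-corrected block models. It assumes a Poisson-edge formulation (with within-block pair counts simplified to n_r^2 / 2), in which, up to a positive factor and terms that depend only on the graph, the two objectives are sum_rs m_rs log(m_rs / (n_r n_s)) and sum_rs m_rs log(m_rs / (kappa_r kappa_s)); here m_rs counts edge ends between blocks r and s, n_r is the size of block r, and kappa_r is its total degree. It evaluates these exactly for a given labeling; it does not use belief propagation, does not search over partitions, and does not implement the paper's asymptotic distribution for the log-likelihood ratio.

```python
import numpy as np

def block_log_likelihoods(adj, groups, n_groups):
    """Profile log-likelihoods (up to graph-only constants) of a partition under
    Poisson-edge stochastic block (SBM) and degree-corrected (DCBM) models.

    adj: symmetric 0/1 adjacency matrix without self-loops; groups: block label per node.
    m[r, s] counts edge ends between blocks r and s, so m[r, r] is twice the internal edges.
    """
    onehot = np.eye(n_groups)[groups]      # n x K membership matrix
    m = onehot.T @ adj @ onehot            # K x K edge-end counts
    n_r = onehot.sum(axis=0)               # block sizes
    kappa = m.sum(axis=1)                  # total degree of each block
    mask = m > 0                           # treat 0 * log 0 as 0
    sbm = np.sum(m[mask] * np.log(m[mask] / np.outer(n_r, n_r)[mask]))
    dcbm = np.sum(m[mask] * np.log(m[mask] / np.outer(kappa, kappa)[mask]))
    return float(sbm), float(dcbm)
```

The two returned numbers drop different graph-only constants, so they should not be compared directly; deciding between the models is exactly the likelihood-ratio calibration problem the abstract addresses.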

Submitted 30 May, 2013; v1 submitted 17 July, 2012; originally announced July 2012.

Journal ref: J. Stat. Mech. (2014) P05007

arXiv:1106.0730

Rademacher complexity of stationary sequences

Authors: Daniel J. McDonald, Cosma Rohilla Shalizi

Abstract: We show how to control the generalization error of time series models wherein past values of the outcome are used to predict future values. The results are based on a generalization of standard i.i.d. concentration inequalities to dependent data without the mixing assumptions common in the time series setting. Our proof and the result are simpler than previous analyses with dependent data or stochastic adversaries, which use sequential Rademacher complexities rather than the expected Rademacher complexity for i.i.d. processes. We also derive empirical Rademacher results without mixing assumptions, resulting in fully calculable upper bounds.
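For reference, the i.i.d.-style empirical Rademacher complexity mentioned above is easy to approximate numerically. The sketch below estimates it for a norm-bounded linear class, {x -> <w, x> : ||w||_2 <= B}, by Monte Carlo over the random signs, using the closed form for the supremum over that class. The function class and the parameter names `bound` and `n_draws` are illustrative assumptions; the sketch does not implement the paper's bounds for dependent data.

```python
import numpy as np

def empirical_rademacher_linear(X, bound=1.0, n_draws=2000, rng=None):
    """Monte Carlo estimate of the empirical Rademacher complexity of
    {x -> <w, x> : ||w||_2 <= bound} on the sample X (n x d).

    For this class the supremum has a closed form:
    sup_w (1/n) sum_i sigma_i <w, x_i> = (bound / n) * ||sum_i sigma_i x_i||_2.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))  # independent Rademacher signs
    sums = sigma @ X                                     # one signed sum per draw
    return float(bound / n * np.linalg.norm(sums, axis=1).mean())
```

In a forecasting setting, each row of `X` might hold the most recent lagged values used to predict the next observation.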

Submitted 22 May, 2017; v1 submitted 3 June, 2011; originally announced June 2011.

Comments: 15 pages, 1 figure

arXiv:1103.0949

Adapting to Non-stationarity with Growing Expert Ensembles

Authors: Cosma Rohilla Shalizi, Abigail Z. Jacobs, Kristina Lisa Klinkner, Aaron Clauset

Abstract: When dealing with time series with complex non-stationarities, low retrospective regret on individual realizations is a more appropriate goal than low prospective risk in expectation. Online learning algorithms provide powerful guarantees of this form, and have often been proposed for use with non-stationary processes because of their ability to switch between different forecasters or "experts". However, existing methods assume that all the experts whose forecasts are to be combined are given at the start, which is not plausible when dealing with a genuinely historical or evolutionary system. We show how to modify the "fixed shares" algorithm for tracking the best expert to cope with a steadily growing set of experts, obtained by fitting new models to new data as it becomes available, and obtain regret bounds for the growing ensemble.
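The basic fixed-share update can be sketched in a few lines: after each round, each weight is multiplied by exp(-eta * loss), the weights are renormalized, and a fraction `alpha` of the total weight is spread uniformly over the experts. The sketch below adds an `add_expert` method so the ensemble can grow. The rule used there for admitting a newcomer (it receives weight 1/N when the ensemble reaches N experts, and the existing weights are scaled by (N-1)/N) is a hypothetical choice for illustration, not the modification analyzed in the paper, and no regret guarantee is claimed for it.

```python
import numpy as np

class GrowingFixedShare:
    """Fixed-share weighting over a set of experts that can grow over time.

    The entry rule in add_expert is a hypothetical choice for illustration.
    """

    def __init__(self, eta=1.0, alpha=0.05):
        self.eta = eta          # learning rate for the exponential-weights step
        self.alpha = alpha      # share rate: mass redistributed uniformly each round
        self.weights = np.array([])

    def add_expert(self):
        # Newcomer gets 1/N; existing weights shrink by (N-1)/N so the total stays 1.
        n = self.weights.size + 1
        self.weights = np.append(self.weights * (n - 1) / n, 1.0 / n)

    def predict(self, forecasts):
        # Weighted average of the current experts' forecasts.
        return float(np.dot(self.weights, forecasts))

    def update(self, losses):
        # Exponential-weights step, then the fixed-share mixing step.
        v = self.weights * np.exp(-self.eta * np.asarray(losses, dtype=float))
        v /= v.sum()
        self.weights = (1 - self.alpha) * v + self.alpha / v.size
```

In use, one would call `add_expert()` whenever a newly fitted model becomes available, `predict(...)` with the current experts' forecasts, and `update(...)` with their realized losses.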

Submitted 28 June, 2011; v1 submitted 4 March, 2011; originally announced March 2011.

Comments: 9 pages, 1 figure; CMU Statistics Technical Report. v2: Added empirical example, revised discussion of related work

arXiv:1103.0942

Generalization error bounds for stationary autoregressive models

Authors: Daniel J. McDonald, Cosma Rohilla Shalizi, Mark Schervish

Abstract: We derive generalization error bounds for stationary univariate autoregressive (AR) models. We show that imposing stationarity is enough to control the Gaussian complexity without further regularization. This lets us use structural risk minimization for model selection. We demonstrate our methods by predicting interest rate movements.
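The stationarity constraint that the abstract relies on can be checked directly from the AR coefficients: the usual condition for a causal, weakly stationary AR(p) solution is that every eigenvalue of the companion matrix lies strictly inside the unit circle. A minimal sketch follows; the function name is an assumption, and it only tests the constraint rather than computing the paper's Gaussian-complexity bounds.

```python
import numpy as np

def is_stationary_ar(coefs):
    """Check stationarity of X_t = sum_k coefs[k-1] * X_{t-k} + noise.

    Uses the companion-matrix criterion: all eigenvalues strictly inside the unit circle.
    """
    p = len(coefs)
    companion = np.zeros((p, p))
    companion[0, :] = coefs                 # first row holds the AR coefficients
    companion[1:, :-1] = np.eye(p - 1)      # sub-diagonal shifts the lagged state
    return bool(np.all(np.abs(np.linalg.eigvals(companion)) < 1))
```

For instance, an AR(2) with coefficients (0.5, 0.3) passes the check, while (0.5, 0.6) fails because one root of z^2 - 0.5z - 0.6 exceeds 1 in modulus.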

Submitted 3 June, 2011; v1 submitted 4 March, 2011; originally announced March 2011.

Comments: 10 pages, 3 figures. CMU Statistics Technical Report

arXiv:1103.0941

Estimating $\beta$-mixing coefficients

Authors: Daniel J. McDonald, Cosma Rohilla Shalizi, Mark Schervish

Abstract: The literature on statistical learning for time series assumes the asymptotic independence or "mixing" of the data-generating process. These mixing assumptions are never tested, nor are there methods for estimating mixing rates from data. We give an estimator for the $\beta$-mixing rate based on a single stationary sample path and show it is $L_1$-risk consistent.
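The population quantity behind β-mixing is a total-variation distance between the joint law of the past and future and the product of their marginals. As a deliberately simplified illustration, the sketch below computes a plug-in version for a discrete-valued series that looks only at the pair (X_t, X_{t+a}) instead of whole past and future blocks. Restricting to single time points means the population version of this quantity can only understate the β-mixing coefficient at lag a; the function name and this reduction are assumptions of the example, and it is not the estimator proposed in the paper.

```python
import numpy as np

def plugin_beta_single_lag(x, lag):
    """Plug-in total-variation dependence between X_t and X_{t+lag} for a discrete series.

    Compares the empirical joint distribution of (X_t, X_{t+lag}) with the product
    of its empirical marginals. Using single time points instead of whole past and
    future blocks is a simplification made for illustration.
    """
    x = np.asarray(x)
    if lag < 1 or lag >= x.size:
        raise ValueError("lag must be between 1 and len(x) - 1")
    a, b = x[:-lag], x[lag:]
    values = np.unique(x)
    ia, ib = np.searchsorted(values, a), np.searchsorted(values, b)
    joint = np.zeros((values.size, values.size))
    np.add.at(joint, (ia, ib), 1.0)         # count each observed (X_t, X_{t+lag}) pair
    joint /= joint.sum()
    product = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    return 0.5 * float(np.abs(joint - product).sum())
```

Values near 0 indicate little dependence between the two time points at that lag; a strongly periodic series such as 0, 1, 0, 1, ... gives a value close to 0.5 at lag 1.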

Submitted 4 March, 2011; originally announced March 2011.

Comments: 9 pages, accepted by AIStats. CMU Statistics Technical Report

Journal ref: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2011), pp. 516--524

arXiv:1004.4704

doi 10.1177/0049124111404820

Homophily and Contagion Are Generically Confounded in Observational Social Network Studies

Authors: Cosma Rohilla Shalizi, Andrew C. Thomas

Abstract: We consider processes on social networks that can potentially involve three factors: homophily, or the formation of social ties due to matching individual traits; social contagion, also known as social influence; and the causal effect of an individual's covariates on their behavior or other measurable responses. We show that, generically, all of these are confounded with each other. Distinguishing them from one another requires strong assumptions on the parametrization of the social process or on the adequacy of the covariates used (or both). In particular, we demonstrate, with simple examples, that asymmetries in regression coefficients cannot identify causal effects, and that very simple models of imitation (a form of social contagion) can produce substantial correlations between an individual's enduring traits and their choices, even when there is no intrinsic affinity between them. We also suggest some possible constructive responses to these results.

Submitted 29 November, 2010; v1 submitted 27 April, 2010; originally announced April 2010.

Comments: 27 pages, 9 figures. V2: Revised in response to referees. V3: Ditto

Journal ref: Sociological Methods and Research, vol. 40 (2011), pp. 211--239

arXiv:1001.0036

doi 10.1162/neco.2009.12-07-678

The Computational Structure of Spike Trains

Authors: Robert Haslinger, Kristina Lisa Klinkner, Cosma Rohilla Shalizi

Abstract: Neurons perform computations, and convey the results of those computations through the statistical structure of their output spike trains. Here we present a practical method, grounded in the information-theoretic analysis of prediction, for inferring a minimal representation of that structure and for characterizing its complexity. Starting from spike trains, our approach finds their causal state models (CSMs), the minimal hidden Markov models or stochastic automata capable of generating statistically identical time series. We then use these CSMs to objectively quantify both the generalizable structure and the idiosyncratic randomness of the spike train. Specifically, we show that the expected algorithmic information content (the information needed to describe the spike train exactly) can be split into three parts describing (1) the time-invariant structure (complexity) of the minimal spike-generating process, which describes the spike train statistically; (2) the randomness (internal entropy rate) of the minimal spike-generating process; and (3) a residual pure noise term not described by the minimal spike-generating process. We use CSMs to approximate each of these quantities. The CSMs are inferred nonparametrically from the data, making only mild regularity assumptions, via the causal state splitting reconstruction algorithm. The methods presented here complement more traditional spike train analyses by describing not only spiking probability and spike train entropy, but also the complexity of a spike train's structure.
We demonstrate our approach using both simulated spike trains and experimental data recorded in rat barrel cortex during vibrissa stimulation.

Submitted 30 December, 2009; originally announced January 2010.

Comments: Somewhat different format from journal version but same content

Journal ref: Neural Computation, vol. 22 (2010), pp. 121--157

arXiv:0710.4911

Social Media as Windows on the Social Life of the Mind

Authors: Cosma Rohilla Shalizi

Abstract: This is a programmatic paper, marking out two directions in which the study of social media can contribute to broader problems of social science: understanding cultural evolution and understanding collective cognition. Under the first heading, I discuss some difficulties with the usual, adaptationist explanations of cultural phenomena, alternative explanations involving network diffusion effects, and some ways these could be tested using social-media data. Under the second I describe some of the ways in which social media could be used to study how the social organization of an epistemic community supports its collective cognitive performance.

Submitted 25 October, 2007; originally announced October 2007.

Comments: 6 pages, 1 figure, AAAI format, submitted to AAAI spring 2008 symposium on "Social Information Processing"

arXiv:cs/0406011

Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences

Authors: Cosma Rohilla Shalizi, Kristina Lisa Shalizi

Abstract: We present a new method for nonlinear prediction of discrete random sequences under minimal structural assumptions. We give a mathematical construction for optimal predictors of such processes, in the form of hidden Markov models. We then describe an algorithm, CSSR (Causal-State Splitting Reconstruction), which approximates the ideal predictor from data. We discuss the reliability of CSSR, its data requirements, and its performance in simulations. Finally, we compare our approach to existing methods using variable-length Markov models and cross-validated hidden Markov models, and show theoretically and experimentally that our method delivers results superior to the former and at least comparable to the latter.

Submitted 6 June, 2004; originally announced June 2004.

Comments: 8 pages, 4 figures

ACM Class: I.2.6

Journal ref: pp. 504--511 in Max Chickering and Joseph Halpern (eds.), _Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference_ (2004)

arXiv:cs/0210025

An Algorithm for Pattern Discovery in Time Series

Authors: Cosma Rohilla Shalizi, Kristina Lisa Shalizi, James P. Crutchfield

Abstract: We present a new algorithm for discovering patterns in time series and other sequential data. We exhibit a reliable procedure for building the minimal set of hidden, Markovian states that is statistically capable of producing the behavior exhibited in the data -- the underlying process's causal states. Unlike conventional methods for fitting hidden Markov models (HMMs) to data, our algorithm makes no assumptions about the process's causal architecture (the number of hidden states and their transition structure), but rather infers it from the data. It starts with assumptions of minimal structure and introduces complexity only when the data demand it. Moreover, the causal states it infers have important predictive optimality properties that conventional HMM states lack. We introduce the algorithm, review the theory behind it, prove its asymptotic reliability, use large deviation theory to estimate its rate of convergence, and compare it to other algorithms which also construct HMMs from data. We also illustrate its behavior on an example process, and report selected numerical results from an implementation.

Submitted 26 November, 2002; v1 submitted 28 October, 2002; originally announced October 2002.

Comments: 26 pages, 5 figures; 5 tables; http://www.santafe.edu/projects/CompMech. Added discussion of algorithm parameters; improved treatment of convergence and time complexity; added comparison to older methods

Report number: SFI Working Paper 02-10-060

ACM Class: I.2.6; H.1.1; E.4

arXiv:nlin/0006025

doi 10.1142/S0219525902000481

Information Bottlenecks, Causal States, and Statistical Relevance Bases: How to Represent Relevant Information in Memoryless Transduction

Authors: Cosma Rohilla Shalizi, James P. Crutchfield

Abstract: Discovering relevant, but possibly hidden, variables is a key step in constructing useful and predictive theories about the natural world. This brief note explains the connections between three approaches to this problem: the recently introduced information-bottleneck method, the computational mechanics approach to inferring optimal models, and Salmon's statistical relevance basis.

Submitted 16 June, 2000; originally announced June 2000.

Comments: 3 pages, no figures, submitted to PRE as a "brief report". Revision: added an acknowledgements section originally omitted by a LaTeX bug

Journal ref: Advances in Complex Systems, vol. 5, pp. 91--95 (2002)

arXiv:cs/0001027

Pattern Discovery and Computational Mechanics

Authors: Cosma Rohilla Shalizi, James P. Crutchfield

Abstract: Computational mechanics is a method for discovering, describing and quantifying patterns, using tools from statistical physics. It constructs optimal, minimal models of stochastic processes and their underlying causal structures. These models tell us about the intrinsic computation embedded within a process: how it stores and transforms information. Here we summarize the mathematics of computational mechanics, especially recent optimality and uniqueness results. We also expound the principles and motivations underlying computational mechanics, emphasizing its connections to the minimum description length principle, PAC theory, and other aspects of machine learning.

Submitted 28 January, 2000; originally announced January 2000.

Comments: 12 pages, 3 figures; submitted to the Proceedings of the 17th International Conference on Machine Learning (differs slightly in pagination and citation format from that version)

Report number: SFI 00-01-008

ACM Class: I.2.6; F.1.3; G.3; H.1.1

Showing 1–16 of 16 results for author: Shalizi, C R