Search | arXiv e-print repository

Language statistics at different spatial, temporal, and grammatical scales

Authors: Fernanda Sánchez-Puig, Rogelio Lozano-Aranda, Dante Pérez-Méndez, Ewan Colman, Alfredo J. Morales-Guzmán, Carlos Pineda, Pedro Juan Rivera Torres, Carlos Gershenson

Abstract: Statistical linguistics has advanced considerably in recent decades as data has become available. This has allowed researchers to study how statistical properties of languages change over time. In this work, we use data from Twitter to explore English and Spanish considering the rank diversity at different scales: temporal (from 3- to 96-hour intervals), spatial (from 3 km to 3000+ km radii), and grammatical (from monograms to pentagrams). We find that all three scales are relevant. However, the greatest changes come from variations in the grammatical scale. At the lowest grammatical scale (monograms), the rank diversity curves are most similar, independently of the values of other scales, languages, and countries. As the grammatical scale grows, the rank diversity curves vary more depending on the temporal and spatial scales, as well as on the language and country. We also study the statistics of Twitter-specific tokens: emojis, hashtags, and user mentions. These particular types of token show sigmoid-like behaviour in their rank diversity. Our results help to quantify which aspects of language statistics seem universal and what may lead to variations.

Submitted 26 July, 2022; v1 submitted 1 July, 2022; originally announced July 2022.
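In this line of work, rank diversity is usually defined as d(k) = |X(k)|/T: the number of distinct tokens that occupy rank k across T time intervals, divided by T. The sketch below shows one way to compute it for n-grams (n = 1 for monograms up to n = 5 for pentagrams). It is a minimal sketch, not the authors' pipeline, and it assumes lower-cased whitespace tokenization and alphabetical tie-breaking. The helper names `ngrams`, `ranking` and `rank_diversity` are hypothetical. Spatial scales would only change how tweets are grouped into `intervals`.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: Sequence[str], n: int) -> List[NGram]:
    """Contiguous n-grams of a token sequence (n=1: monograms, n=5: pentagrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ranking(texts: Sequence[str], n: int) -> List[NGram]:
    """N-grams of one interval ordered by frequency, most frequent first.

    Ties are broken alphabetically so the ranking is deterministic
    (an assumption; the paper may break ties differently)."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(ngrams(text.lower().split(), n))
    return sorted(counts, key=lambda g: (-counts[g], g))


def rank_diversity(intervals: Sequence[Sequence[str]], n: int = 1) -> Dict[int, float]:
    """d(k) = (distinct n-grams observed at rank k over all intervals) / T."""
    rankings = [ranking(texts, n) for texts in intervals]
    T = len(rankings)
    max_rank = max(len(r) for r in rankings)
    return {
        k: len({r[k - 1] for r in rankings if len(r) >= k}) / T
        for k in range(1, max_rank + 1)
    }


# Usage: two hypothetical time intervals, each a list of tweets.
intervals = [["the cat sat", "the dog ran"], ["the cat ran", "a dog sat"]]
print(rank_diversity(intervals, n=1))
```

A value of d(k) equal to 1/T means the same n-gram holds rank k in every interval. A value of 1 means a different n-gram holds it each time.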

arXiv:1601.00571

doi 10.1103/PhysRevE.94.012313

Separating temporal and topological effects in walk-based network centrality

Authors: Ewan Colman, Nathaniel Charlton

Abstract: The recently introduced concept of dynamic communicability is a valuable tool for ranking the importance of nodes in a temporal network. Two metrics, broadcast score and receive score, were introduced to measure the centrality of a node with respect to a model of contagion based on time-respecting walks. This article examines the temporal and structural factors influencing these metrics by considering a versatile stochastic temporal network model. We analytically derive formulae to accurately predict the expectation of the broadcast and receive scores when one or more columns in a temporal edge-list are shuffled. These methods are then applied to two publicly available data-sets and we quantify how much the centrality of each individual depends on structural or temporal influences. From our analysis we highlight two practical contributions: a way to control for temporal variation when computing dynamic communicability, and the conclusion that the broadcast and receive scores can, under a range of circumstances, be replaced by the row and column sums of the matrix exponential of a weighted adjacency matrix given by the data.

Submitted 28 September, 2016; v1 submitted 4 January, 2016; originally announced January 2016.

Comments: 15 Pages, 8 figures

Journal ref: Phys. Rev. E 94, 012313 (2016)
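Dynamic communicability is usually formulated (following Grindrod et al.) as the time-ordered product of resolvents, $Q = (I - aA_0)^{-1}(I - aA_1)^{-1}\cdots(I - aA_N)^{-1}$, where $A_t$ is the adjacency matrix of snapshot $t$. Broadcast scores are the row sums of $Q$ and receive scores are the column sums. The sketch below computes both in pure Python for small networks. It also includes a column-shuffling null model of the kind the abstract describes. The function names, the edge-list layout `(source, target, time)`, and the choice of `a` are illustrative assumptions. The sketch does not reproduce the paper's analytic expectations.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Edge = Tuple[int, int, int]  # (source, target, time step)


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def inverse(M: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (adequate for small matrices)."""
    n = len(M)
    aug = [row[:] + e for row, e in zip(M, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def broadcast_receive(edges: Sequence[Edge], n: int, steps: int, a: float) -> Tuple[List[float], List[float]]:
    """Q = (I - a A_0)^-1 (I - a A_1)^-1 ... multiplied in time order.

    Broadcast scores are row sums of Q, receive scores are column sums.
    `a` must be below 1 / (largest spectral radius of the snapshots A_t)."""
    snaps = [[[0.0] * n for _ in range(n)] for _ in range(steps)]
    for u, v, t in edges:
        snaps[t][u][v] = 1.0
    Q = identity(n)
    for A in snaps:
        resolvent = inverse([[(1.0 if i == j else 0.0) - a * A[i][j] for j in range(n)] for i in range(n)])
        Q = matmul(Q, resolvent)
    broadcast = [sum(row) for row in Q]
    receive = [sum(Q[i][j] for i in range(n)) for j in range(n)]
    return broadcast, receive


def shuffle_column(edges: Sequence[Edge], col: int, seed: int = 0) -> List[Edge]:
    """Null model: randomly permute one column (0 = source, 1 = target, 2 = time)."""
    rng = random.Random(seed)
    values = [e[col] for e in edges]
    rng.shuffle(values)
    return [tuple(values[i] if c == col else e[c] for c in range(3)) for i, e in enumerate(edges)]


forward = [(0, 1, 0), (1, 2, 1)]   # 0->1 then 1->2: a time-respecting walk 0->1->2
backward = [(1, 2, 0), (0, 1, 1)]  # same edges in the opposite order
print(broadcast_receive(forward, 3, 2, 0.5)[0])
print(broadcast_receive(backward, 3, 2, 0.5)[0])
```

Node 0's broadcast score is higher in the first ordering, because only there can a time-respecting walk carry it to node 2. Shuffling column 2 (time) keeps the static structure but destroys this ordering.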

arXiv:1501.05198

doi 10.1103/PhysRevE.92.012817

Memory and burstiness in dynamic networks

Authors: Ewan R. Colman, Danica Vukadinović Greetham

Abstract: A discrete-time random process is described which can generate bursty sequences of events. A Bernoulli process, where the probability of an event occurring at time $t$ is given by a fixed probability $x$, is modified to include a memory effect where the event probability is increased proportionally to the number of events which occurred within a given amount of time preceding $t$. For small values of $x$ the inter-event time distribution follows a power law with exponent $-2-x$. We consider a dynamic network where each node forms and breaks connections according to this process. The value of $x$ for each node depends on the fitness distribution, $\rho(x)$, from which it is drawn; we find exact solutions for the expectation of the degree distribution for a variety of possible fitness distributions, both when the memory effect is present and when it is not. This work can potentially lead to methods to uncover hidden fitness distributions from fast-changing temporal network data such as online social communications and fMRI scans.

Submitted 6 July, 2015; v1 submitted 21 January, 2015; originally announced January 2015.

Comments: 13 pages, 7 figures

Journal ref: Phys. Rev. E 92, 012817 (2015)
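The abstract says that the event probability grows in proportion to the number of recent events. It does not give the exact functional form. The sketch below therefore assumes P(event at t) = min(1, x + c·n(t)), where n(t) counts events in the preceding `window` steps. The coefficient `c`, the `window` length, and the function names are hypothetical. Setting `c = 0` recovers the plain Bernoulli(x) process.

```python
import random
from collections import deque
from typing import List


def bursty_sequence(x: float, c: float, window: int, steps: int, seed: int = 0) -> List[int]:
    """Event times of a discrete-time Bernoulli process with a memory effect.

    Hypothetical form: P(event at t) = min(1, x + c * n(t)), where n(t) counts
    the events in the `window` steps before t. With c = 0 this is Bernoulli(x)."""
    rng = random.Random(seed)
    recent: deque = deque()  # event times inside the memory window
    events: List[int] = []
    for t in range(steps):
        # Forget events that happened more than `window` steps ago.
        while recent and recent[0] < t - window:
            recent.popleft()
        p = min(1.0, x + c * len(recent))
        if rng.random() < p:
            events.append(t)
            recent.append(t)
    return events


def inter_event_times(events: List[int]) -> List[int]:
    """Gaps between consecutive events."""
    return [b - a for a, b in zip(events, events[1:])]


# Usage: same base probability x, with and without the memory term.
plain = inter_event_times(bursty_sequence(0.05, 0.0, 10, 10_000, seed=1))
memory = inter_event_times(bursty_sequence(0.05, 0.08, 10, 10_000, seed=1))
print(len(plain), len(memory))
```

Once the memory term is switched on, each event raises the chance of further events within the window, so events cluster into bursts. Comparing a histogram of `memory` against the claimed $-2-x$ tail would need long runs and small $x$.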

arXiv:1408.3570

doi 10.1016/j.physa.2014.08.046

Local rewiring rules for evolving complex networks

Authors: Ewan R. Colman, Geoff J. Rodgers

Abstract: The effects of link rewiring are considered for the class of directed networks where each node has the same fixed out-degree. We model a network generated by three mechanisms that are present in various networked systems: growth, global rewiring and local rewiring. During a rewiring phase a node is randomly selected, and one of its out-going edges is detached from its destination and then re-attached to the network in one of two possible ways: either globally, to a randomly selected node, or locally, to a descendant of a descendant of the originally selected node. Although the probability of attachment to a node increases with its connectivity, the probability of detachment also increases; the result is an exponential degree distribution with a small number of outlying nodes that have extremely large degree. We explain these outliers by identifying the circumstances in which a set of nodes can grow to very high degree.

Submitted 15 August, 2014; originally announced August 2014.

Comments: 8 pages, 5 figures
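The three mechanisms can be simulated on an out-neighbour list in which every node keeps exactly m out-going edges. The sketch below makes several assumptions, all hypothetical: the step probabilities `p_growth` and `p_local`, uniformly random targets for new nodes, the seed graph, and the function names. Self-loops and multi-edges are allowed for simplicity. Only the local move, re-attachment to a descendant of a descendant, follows the abstract directly.

```python
import random
from typing import List


def evolve(steps: int, m: int, p_growth: float, p_local: float, seed: int = 0) -> List[List[int]]:
    """Return out[i], the m targets of node i's out-going edges (repeats allowed).

    Each step: with probability p_growth a new node joins with m edges to
    uniformly random existing nodes (assumption); otherwise a random node
    detaches one edge and re-attaches it locally (prob. p_local) or globally."""
    rng = random.Random(seed)
    # Seed: m + 1 nodes, each pointing at every other node, so out-degree is m.
    out = [[j for j in range(m + 1) if j != i] for i in range(m + 1)]
    for _ in range(steps):
        n = len(out)
        if rng.random() < p_growth:
            out.append([rng.randrange(n) for _ in range(m)])
            continue
        i = rng.randrange(n)
        e = rng.randrange(m)  # index of the edge to detach
        if rng.random() < p_local:
            child = rng.choice(out[i])          # a descendant of i
            out[i][e] = rng.choice(out[child])  # a descendant of that descendant
        else:
            out[i][e] = rng.randrange(n)        # global: any node, uniformly
    return out


def in_degrees(out: List[List[int]]) -> List[int]:
    deg = [0] * len(out)
    for targets in out:
        for v in targets:
            deg[v] += 1
    return deg


net = evolve(20_000, m=3, p_growth=0.2, p_local=0.5, seed=7)
deg = in_degrees(net)
print(len(net), max(deg))
```

The out-degree never changes, so all of the structure appears in the in-degree. Plotting `deg` on a log-linear scale is a quick way to look for the exponential bulk and the high-degree outliers the abstract describes.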

arXiv:1309.6225

doi 10.1016/j.physa.2012.07.034

Kinetics of node splitting in evolving complex networks

Authors: E. R. Colman, G. J. Rodgers

Abstract: We introduce a collection of complex networks generated by a combination of preferential attachment and a previously unexamined process of "splitting" nodes of degree $k$ into $k$ nodes of degree 1. Four networks are considered, each of which evolves at each time step by either preferential attachment, with probability $p$, or splitting, with probability $1-p$. Two methods of attachment are considered: first, attachment of an edge between a newly created node and an existing node in the network, and second, attachment of an edge between two existing nodes. Splitting is also considered in two separate ways: first by selecting each node with equal probability, and second by selecting the node with probability proportional to its degree. Exact solutions for the degree distributions are found. Scale-free structure is exhibited in those networks where the candidates for splitting are chosen with uniform probability, while in those where they are chosen preferentially the degrees follow a power law with an exponential cut-off.

Submitted 24 September, 2013; originally announced September 2013.
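The sketch below simulates one of the four variants: a new node attaches preferentially, and a node chosen uniformly at random is split. It works on an edge list. A node can be chosen with probability proportional to its degree by taking a uniformly random endpoint of a uniformly random edge. The single-edge seed and the function names are assumptions.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple


def attach_and_split(steps: int, p: float, seed: int = 0) -> List[Tuple[int, int]]:
    """Edge list after `steps` iterations of the new-node / uniform-splitting variant.

    With probability p a new node links to an existing node chosen with probability
    proportional to its degree; with probability 1 - p a node chosen uniformly at
    random, of degree k, is split into k new nodes of degree 1."""
    rng = random.Random(seed)
    edges: List[List[int]] = [[0, 1]]  # seed: a single edge (assumption)
    # node -> list of (edge index, endpoint slot) that the node occupies
    incident: Dict[int, List[Tuple[int, int]]] = {0: [(0, 0)], 1: [(0, 1)]}
    next_id = 2
    for _ in range(steps):
        if rng.random() < p:
            # A uniformly random edge endpoint is a degree-proportional node choice.
            e = rng.randrange(len(edges))
            target = edges[e][rng.randrange(2)]
            edges.append([next_id, target])
            incident[next_id] = [(len(edges) - 1, 0)]
            incident[target].append((len(edges) - 1, 1))
            next_id += 1
        else:
            node = rng.choice(list(incident))  # uniform over current nodes
            for e, slot in incident.pop(node):
                edges[e][slot] = next_id  # each edge end moves to a fresh node
                incident[next_id] = [(e, slot)]
                next_id += 1
    return [(u, v) for u, v in edges]


def degree_counts(edges: List[Tuple[int, int]]) -> Counter:
    """Map degree k -> number of nodes with degree k."""
    deg: Counter = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return Counter(deg.values())


dist = degree_counts(attach_and_split(5_000, 0.7, seed=3))
print(sorted(dist.items())[:5])
```

Splitting conserves the number of edges and only adds nodes. To get the preferential-splitting variant, replace the uniform `rng.choice(list(incident))` with the random-endpoint trick used for attachment.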

arXiv:1307.7389

doi 10.1016/j.physa.2013.06.063

Complex scale-free networks with tunable power-law exponent and clustering

Authors: ER Colman, GJ Rodgers

Abstract: We introduce a network evolution process motivated by the network of citations in the scientific literature. In each iteration of the process a node is born and directed links are created from the new node to a set of target nodes already in the network. This set includes $m$ "ambassador" nodes and $l$ of each ambassador's descendants, where $m$ and $l$ are random variables drawn from arbitrary distributions ($p_{l}$ and $q_{m}$). The process mimics the tendency of authors to cite varying numbers of papers included in the bibliographies of the other papers they cite. We show that the degree distributions of the networks generated after a large number of iterations are scale-free and derive an expression for the power-law exponent. In a particular case of the model where the number of ambassadors is always the constant $m$ and the number of selected descendants from each ambassador is the constant $l$, the power-law exponent is $(2l+1)/l$. For this example we derive expressions for the degree distribution and clustering coefficient in terms of $l$ and $m$. We conclude that the proposed model can be tuned to have the same power-law exponent and clustering coefficient as a broad range of the scale-free distributions that have been studied empirically.

Submitted 28 July, 2013; originally announced July 2013.

Comments: 16 pages, 10 figures, accepted journal paper
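In the constant-$(m, l)$ case, the quoted exponent can be rewritten as $(2l+1)/l = 2 + 1/l$. So $l = 1$ gives 3, $l = 2$ gives 2.5, and the exponent approaches 2 as $l$ grows, independently of $m$. The sketch below generates this case. It assumes that ambassadors are chosen uniformly at random and that an ambassador's "descendants" are the nodes it cites directly. It also assumes a seed of $m$ nodes with no references. These assumptions, and the function names, are hypothetical.

```python
import random
from collections import Counter
from typing import List


def citation_network(steps: int, m: int, l: int, seed: int = 0) -> List[List[int]]:
    """out[i] = references of node i in the constant-(m, l) case.

    Each new node cites m distinct ambassadors chosen uniformly at random
    (assumption) plus up to l of each ambassador's own references, taken here
    to be its descendants (assumption)."""
    rng = random.Random(seed)
    out: List[List[int]] = [[] for _ in range(m)]  # seed: m nodes citing nothing
    for _ in range(steps):
        ambassadors = rng.sample(range(len(out)), m)
        targets = set(ambassadors)
        for a in ambassadors:
            refs = out[a]
            targets.update(rng.sample(refs, min(l, len(refs))))
        out.append(sorted(targets))
    return out


def predicted_exponent(l: int) -> float:
    """Power-law exponent (2l + 1) / l = 2 + 1/l quoted in the abstract."""
    return (2 * l + 1) / l


net = citation_network(20_000, m=2, l=1, seed=4)
in_deg = Counter(v for refs in net for v in refs)  # citations received per node
print(predicted_exponent(1), max(in_deg.values()))
```

Following references of references is a copying mechanism: a paper that is already widely cited is more likely to appear in an ambassador's bibliography. This is why the sketch needs no explicit preferential-attachment step.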

Showing 1–6 of 6 results for author: Colman, E