-
A generative model for community types in directed networks
Authors:
Cathy Xuanchi Liu,
Tristram J. Alexander,
Eduardo G. Altmann
Abstract:
Large complex networks are often organized into groups or communities. In this paper, we introduce and investigate a generative model of network evolution that reproduces all four pairwise community types that exist in directed networks: assortative, core-periphery, disassortative, and the newly introduced source-basin type. We fix the number of nodes and the community membership of each node, allowing node connectivity to change through rewiring mechanisms that depend on the community membership of the involved nodes. We determine the dependence of the community relationship on the model parameters using a mean-field solution. It reveals that a difference in the swap probabilities of the two communities is a necessary condition to obtain a core-periphery relationship and that a difference in the average in-degree of the communities is a necessary condition for a source-basin relationship. More generally, our analysis reveals multiple possible scenarios for the transition between the different structure types, and sheds light on the mechanisms underlying the observation of the different types of communities in network data.
Submitted 23 May, 2024;
originally announced May 2024.
-
Sampling triangulations of manifolds using Monte Carlo methods
Authors:
Eduardo G. Altmann,
Jonathan Spreer
Abstract:
We propose a Monte Carlo method to efficiently find, count, and sample abstract triangulations of a given manifold M. The method is based on a biased random walk through all possible triangulations of M (in the Pachner graph), constructed by combining (bi-stellar) moves with suitably chosen accept/reject probabilities (Metropolis-Hastings). Asymptotically, the method guarantees that samples of triangulations are drawn at random from a chosen probability distribution. This enables us not only to sample (rare) triangulations of particular interest but also to estimate the (extremely small) probability of obtaining them when isomorphism types of triangulations are sampled uniformly at random. We implement our general method for surface triangulations and 1-vertex triangulations of 3-manifolds. To showcase its usefulness, we present a number of experiments: (a) we recover asymptotic growth rates for the number of isomorphism types of simplicial triangulations of the 2-dimensional sphere; (b) we experimentally observe that the growth rate for the number of isomorphism types of 1-vertex triangulations of the 3-dimensional sphere appears to be singly exponential in the number of their tetrahedra; and (c) we present experimental evidence that a randomly chosen isomorphism type of 1-vertex n-tetrahedra 3-sphere triangulation, for n tending to infinity, almost surely shows a fixed edge-degree distribution which decays exponentially for large degrees, but shows non-monotonic behaviour for small degrees.
Submitted 11 October, 2023;
originally announced October 2023.
-
Probabilistic description of dissipative chaotic scattering
Authors:
Lachlan Burton,
Holger Dullin,
Eduardo G. Altmann
Abstract:
We investigate the extent to which the probabilistic properties of a chaotic scattering system with dissipation can be understood from the properties of the dissipation-free system. For large energies $E$, a fully chaotic scattering leads to an exponential decay of the survival probability $P(t) \sim e^{-κt}$ with an escape rate $κ$ that decreases with $E$. Dissipation $γ>0$ leads to the appearance of different finite-time regimes in $P(t)$. We show how these different regimes can be understood for small $γ\ll 1$ and $t\gg 1/κ_0$ from the effective escape rate $κ_γ(t)=κ_0(E(t))$ (including the non-hyperbolic regime) until the energy reaches a critical value $E_c$ at which no escape is possible. More generally, we argue that for small dissipation $γ$ and long times $t$ the surviving trajectories in the dissipative system are distributed according to the conditionally invariant measure of the conservative system at the corresponding energy $E(t)<E(0)$. Quantitative predictions of our general theory are compared with numerical simulations in the Hénon-Heiles model.
Submitted 24 August, 2023;
originally announced August 2023.
-
Quantifying the Dissimilarity of Texts
Authors:
Benjamin Shade,
Eduardo G. Altmann
Abstract:
Quantifying the dissimilarity of two texts is an important aspect of a number of natural language processing tasks, including semantic information retrieval, topic classification, and document clustering. In this paper, we compared the properties and performance of different dissimilarity measures $D$ using three different representations of texts -- vocabularies, word frequency distributions, and vector embeddings -- and three simple tasks -- clustering texts by author, subject, and time period. Using the Project Gutenberg database, we found that the generalised Jensen--Shannon divergence applied to word frequencies performed strongly across all tasks, that $D$'s based on vector embedding representations led to stronger performance for smaller texts, and that the optimal choice of approach was ultimately task-dependent. We also investigated, both analytically and numerically, the behaviour of the different $D$'s when the two texts varied in length by a factor $h$. We demonstrated that the (natural) estimator of the Jaccard distance between vocabularies was inconsistent and computed explicitly the $h$-dependency of the bias of the estimator of the generalised Jensen--Shannon divergence applied to word frequencies. We also found numerically that the Jensen--Shannon divergence and embedding-based approaches were robust to changes in $h$, while the Jaccard distance was not.
Submitted 3 May, 2023;
originally announced May 2023.
-
Modelling daily weight variation in honey bee hives
Authors:
Karina Arias-Calluari,
Theotime Colin,
Tanya Latty,
Mary Myerscough,
Eduardo G. Altmann
Abstract:
A quantitative understanding of the dynamics of bee colonies is important to support global efforts to improve bee health and enhance pollination services. Traditional approaches focus either on theoretical models or data-centred statistical analyses. Here we argue that the combination of these two approaches is essential to obtain interpretable information on the state of bee colonies and show how this can be achieved in the case of time series of intra-day weight variation. We model how the foraging and food processing activities of bees affect global hive weight through a set of ordinary differential equations and show how to estimate reliable ranges for the ten parameters of this model from measurements on a single day. Our analysis of 10 hives at different times shows that crucial indicators of the health of honey bee colonies are estimated robustly and fall in ranges compatible with previously reported results. The indicators include the amount of food collected (foraging success) and the number of active foragers, which may be used to develop early warning indicators of colony failure.
Submitted 4 November, 2022;
originally announced November 2022.
-
Multilayer Networks for Text Analysis with Multiple Data Types
Authors:
Charles C. Hyland,
Yuanming Tao,
Lamiae Azizi,
Martin Gerlach,
Tiago P. Peixoto,
Eduardo G. Altmann
Abstract:
We are interested in the widespread problem of clustering documents and finding topics in large collections of written documents in the presence of metadata and hyperlinks. To tackle the challenge of accounting for these different types of datasets, we propose a novel framework based on Multilayer Networks and Stochastic Block Models. The main innovation of our approach over other techniques is that it applies the same non-parametric probabilistic framework to the different sources of datasets simultaneously. The key difference from other multilayer complex networks is the strong imbalance between the layers, with the average degree of different node types scaling differently with system size. We show that the latter observation is due to generic properties of text, such as Heaps' law, and strongly affects the inference of communities. We present and discuss the performance of our method in different datasets (hundreds of Wikipedia documents, thousands of scientific papers, and thousands of E-mails) showing that taking into account multiple types of information provides a more nuanced view on topic- and document-clusters and increases the ability to predict missing links.
Submitted 30 June, 2021;
originally announced June 2021.
-
Spatial interactions in urban scaling laws
Authors:
Eduardo G. Altmann
Abstract:
Analyses of urban scaling laws assume that observations in different cities are independent of the existence of nearby cities. Here we introduce generative models and data-analysis methods that overcome this limitation by modelling explicitly the effect of interactions between individuals at different locations. Parameters that describe the scaling law and the spatial interactions are inferred from data simultaneously, allowing for rigorous (Bayesian) model comparison and overcoming the problem of defining the boundaries of urban regions. Results in five different datasets show that including spatial interactions typically leads to better models and a change in the exponent of the scaling law. Data and codes are provided in Ref. [1].
Submitted 24 June, 2020;
originally announced June 2020.
-
Scaling laws and dynamics of hashtags on Twitter
Authors:
Hongjia H. Chen,
Tristram J. Alexander,
Diego F. M. Oliveira,
Eduardo G. Altmann
Abstract:
In this paper we quantify the statistical properties and dynamics of the frequency of hashtag use on Twitter. Hashtags are special words used in social media to attract attention and to organize content. Looking at the collection of all hashtags used in a period of time, we identify the scaling laws underpinning the hashtag frequency distribution (Zipf's law), the number of unique hashtags as a function of sample size (Heaps' law), and the fluctuations around expected values (Taylor's law). While these scaling laws appear to be universal, in the sense that similar exponents are observed irrespective of when the sample is gathered, the volume and nature of the hashtags depend strongly on time, with the appearance of bursts at the minute scale, fat-tailed noise, and long-range correlations. We quantify these dynamics by computing the Jensen-Shannon divergence between hashtag distributions obtained $τ$ times apart and we find that the speed of change decays roughly as $1/τ$. Our findings are based on the analysis of 3.5 billion hashtags used between 2015 and 2016.
Submitted 27 April, 2020;
originally announced April 2020.
-
Micro, Meso, Macro: the effect of triangles on communities in networks
Authors:
Sophie Wharrie,
Lamiae Azizi,
Eduardo G. Altmann
Abstract:
Meso-scale structures (communities) are used to understand the macro-scale properties of complex networks, such as their functionality and formation mechanisms. Micro-scale structures are known to exist in most complex networks (e.g., large number of triangles or motifs), but they are absent in the simple random-graph models considered (e.g., as null models) in community-detection algorithms. In this paper we investigate the effect of micro-structures on the appearance of communities in networks. We find that the presence of triangles alone leads to the appearance of communities even in methods designed to avoid the detection of communities in random networks. This shows that communities can emerge spontaneously from simple processes of motif generation happening at a micro-level. Our results are based on four widely used community-detection approaches (stochastic block model, spectral method, modularity maximization, and the Infomap algorithm) and three different generative network models (triadic closure, generalized configuration model, and random graphs with triangles).
Submitted 15 July, 2019;
originally announced July 2019.
-
Testing statistical laws in complex systems
Authors:
Martin Gerlach,
Eduardo G. Altmann
Abstract:
The availability of large datasets requires an improved view on statistical laws in complex systems, such as Zipf's law of word frequencies, the Gutenberg-Richter law of earthquake magnitudes, or scale-free degree distribution in networks. In this paper we discuss how the statistical analysis of these laws is affected by correlations present in the observations, the typical scenario for data from complex systems. We first show how standard maximum-likelihood recipes lead to false rejections of statistical laws in the presence of correlations. We then propose a conservative method (based on shuffling and under-sampling the data) to test statistical laws and find that accounting for correlations leads to smaller rejection rates and larger confidence intervals on estimated parameters.
Submitted 25 April, 2019;
originally announced April 2019.
-
Taming chaos to sample rare events: the effect of weak chaos
Authors:
Jorge C. Leitao,
Joao M. V. P. Lopes,
Eduardo G. Altmann
Abstract:
Rare events in non-linear dynamical systems are difficult to sample because of the sensitivity to perturbations of initial conditions and of complex landscapes in phase space. Here we discuss strategies to control these difficulties and succeed in obtaining an efficient sampling within a Metropolis-Hastings Monte Carlo framework. After reviewing previous successes in the case of strongly chaotic systems, we discuss the case of weakly chaotic systems. We show how different types of non-hyperbolicities limit the efficiency of previously designed sampling methods and we discuss strategies to account for them. We focus on paradigmatic low-dimensional chaotic systems such as the logistic map, the Pomeau-Manneville map, and area-preserving maps with mixed phase space.
Submitted 9 April, 2019;
originally announced April 2019.
-
Unraveling the Origin of Social Bursts in Collective Attention
Authors:
Manlio De Domenico,
Eduardo G. Altmann
Abstract:
In the era of social media, every day billions of individuals produce content in socio-technical systems resulting in a deluge of information. However, human attention is a limited resource and it is increasingly challenging to consume the most suitable content for one's interests. In fact, the complex interplay between individual and social activities in social systems overwhelmed by information results in bursts of collective attention that are still poorly understood. Here, we tackle this challenge by analyzing the online activity of millions of users in a popular microblogging platform during exceptional events, from NBA Finals to the election of Pope Francis and the discovery of gravitational waves. We observe extreme fluctuations in collective attention that we are able to characterize and explain by considering the co-occurrence of two fundamental factors: the heterogeneity of social interactions and the preferential attention towards influential users. Our findings demonstrate how combining simple mechanisms provides a route towards complex social phenomena.
Submitted 15 March, 2019;
originally announced March 2019.
-
Monte Carlo sampling in diffusive dynamical systems
Authors:
Diego Tapias,
David P. Sanders,
Eduardo G. Altmann
Abstract:
We introduce a Monte Carlo algorithm to efficiently compute transport properties of chaotic dynamical systems. Our method exploits the importance sampling technique that favors trajectories in the tail of the distribution of displacements, where deviations from a diffusive process are most prominent. We search for initial conditions using a proposal that correlates states in the Markov chain constructed via a Metropolis-Hastings algorithm. We show that our method outperforms the direct sampling method and also Metropolis-Hastings methods with alternative proposals. We test our general method through numerical simulations in 1D (box-map) and 2D (Lorentz gas) systems.
Submitted 4 April, 2018;
originally announced April 2018.
-
The common origin of symmetry and structure in genetic sequences
Authors:
Giampaolo Cristadoro,
Mirko Degli Esposti,
Eduardo G. Altmann
Abstract:
Biologists have long sought a way to explain how statistical properties of genetic sequences emerged and are maintained through evolution. On the one hand, non-random structures at different scales indicate a complex genome organisation. On the other hand, single-strand symmetry has been scrutinised using neutral models in which correlations are not considered or irrelevant, contrary to empirical evidence. Different studies investigated these two statistical features separately, reaching minimal consensus despite sustained efforts. Here we unravel previously unknown symmetries in genetic sequences, which are organized hierarchically through scales in which non-random structures are known to be present. These observations are confirmed through the statistical analysis of the human genome and explained through a simple domain model. These results suggest that domain models which account for the cumulative action of mobile elements can explain simultaneously non-random structures and symmetries in genetic sequences.
Submitted 31 October, 2018; v1 submitted 6 October, 2017;
originally announced October 2017.
-
A network approach to topic models
Authors:
Martin Gerlach,
Tiago P. Peixoto,
Eduardo G. Altmann
Abstract:
One of the main computational and scientific challenges in the modern age is to extract useful information from unstructured texts. Topic models are one popular machine-learning approach which infers the latent topical structure of a collection of documents. Despite their success --- in particular of their most widely used variant called Latent Dirichlet Allocation (LDA) --- and numerous applications in sociology, history, and linguistics, topic models are known to suffer from severe conceptual and practical problems, e.g. a lack of justification for the Bayesian priors, discrepancies with statistical properties of real texts, and the inability to properly choose the number of topics. Here we obtain a fresh view on the problem of identifying topical structures by relating it to the problem of finding communities in complex networks. This is achieved by representing text corpora as bipartite networks of documents and words. By adapting existing community-detection methods -- using a stochastic block model (SBM) with non-parametric priors -- we obtain a more versatile and principled framework for topic modeling (e.g., it automatically detects the number of topics and hierarchically clusters both the words and documents). The analysis of artificial and real corpora demonstrates that our SBM approach leads to better topic models than LDA in terms of statistical model selection. More importantly, our work shows how to formally relate methods from community detection and topic modeling, opening the possibility of cross-fertilization between these two fields.
Submitted 19 July, 2018; v1 submitted 4 August, 2017;
originally announced August 2017.
-
Using text analysis to quantify the similarity and evolution of scientific disciplines
Authors:
Laercio Dias,
Martin Gerlach,
Joachim Scharloth,
Eduardo G. Altmann
Abstract:
We use an information-theoretic measure of linguistic similarity to investigate the organization and evolution of scientific fields. An analysis of almost 20M papers from the past three decades reveals that the linguistic similarity is related to but different from expert- and citation-based classifications, leading to an improved view on the organization of science. A temporal analysis of the similarity of fields shows that some fields (e.g., computer science) are becoming increasingly central, but that on average the similarity between pairs has not changed in the last decades. This suggests that tendencies of convergence (e.g., multi-disciplinarity) and divergence (e.g., specialization) of disciplines are in balance.
Submitted 27 June, 2017;
originally announced June 2017.
-
Importance Sampling of Rare Events in Chaotic Systems
Authors:
Jorge C. Leitao,
Joao M. Viana Parente Lopes,
Eduardo G. Altmann
Abstract:
Finding and sampling rare trajectories in dynamical systems is a difficult computational task underlying numerous problems and applications. In this paper we show how to construct Metropolis-Hastings Monte Carlo methods that can efficiently sample rare trajectories in the (extremely rough) phase space of chaotic systems. As examples of our general framework we compute the distribution of finite-time Lyapunov exponents (in different chaotic maps) and the distribution of escape times (in transient-chaos problems). Our methods sample exponentially rare states in a polynomial number of samples (in both low- and high-dimensional systems). Open-source software that implements our algorithms and reproduces our results can be found at https://github.com/jorgecarleitao/chaospp
Submitted 22 January, 2017;
originally announced January 2017.
-
Generalized Entropies and the Similarity of Texts
Authors:
Eduardo G. Altmann,
Laercio Dias,
Martin Gerlach
Abstract:
We show how generalized Gibbs-Shannon entropies can provide new insights on the statistical properties of texts. The universal distribution of word frequencies (Zipf's law) implies that the generalized entropies, computed at the word level, are dominated by words in a specific range of frequencies. Here we show that this is the case not only for the generalized entropies but also for the generalized (Jensen-Shannon) divergences, used to compute the similarity between different texts. This finding allows us to identify the contribution of specific words (and word frequencies) for the different generalized entropies and also to estimate the size of the databases needed to obtain a reliable estimation of the divergences. We test our results in large databases of books (from the Google n-gram database) and scientific papers (indexed by Web of Science).
Submitted 11 November, 2016;
originally announced November 2016.
-
Searching chaotic saddles in high dimensions
Authors:
M. Sala,
J. C. Leitao,
E. G. Altmann
Abstract:
We propose new methods to numerically approximate non-attracting sets governing transiently-chaotic systems. Trajectories starting in a vicinity $Ω$ of these sets escape $Ω$ in a finite time $τ$ and the problem is to find initial conditions ${\bf x} \in Ω$ with increasingly large $τ= τ({\bf x})$. We search for points ${\bf x}'$ with $τ({\bf x}')>τ({\bf x})$ in a {\it search domain} in $Ω$. Our first method considers a search domain with size that decreases exponentially in $τ$, with an exponent proportional to the largest Lyapunov exponent $λ_1$. Our second method considers anisotropic search domains in the {\it tangent} unstable manifold, where each direction scales as the inverse of the corresponding {\it expanding} singular value of the Jacobian matrix of the iterated map. We show that both methods outperform the state-of-the-art {\it Stagger-and-Step} method (Sweet, Nusse, and Yorke, Phys. Rev. Lett. {\bf 86}, 2261, 2001) but that only the anisotropic method achieves an efficiency independent of $τ$ for the case of high-dimensional systems with multiple positive Lyapunov exponents. We perform simulations in a chain of coupled Hénon maps in up to 24 dimensions ($12$ positive Lyapunov exponents). This suggests the possibility of characterizing also non-attracting sets in spatio-temporal systems.
Submitted 2 January, 2017; v1 submitted 18 October, 2016;
originally announced October 2016.
-
Stochastic dynamics and the predictability of big hits in online videos
Authors:
Jose M. Miotto,
Holger Kantz,
Eduardo G. Altmann
Abstract:
The competition for the attention of users is a central element of the Internet. Crucial issues are the origin and predictability of big hits, the few items that capture a big portion of the total attention. We address these issues by analyzing 10 million time series of videos' views from YouTube. We find that the average gain of views is linearly proportional to the number of views a video already has, in agreement with usual rich-get-richer mechanisms and Gibrat's law, but this fails to explain the prevalence of big hits. The reason is that the fluctuations around the average views are themselves heavy tailed. Based on these empirical observations, we propose a stochastic differential equation with Lévy noise as a model of the dynamics of videos. We show how this model is substantially better at estimating the probability of an ordinary item becoming a big hit, which is considerably underestimated in the traditional proportional-growth models.
Submitted 10 March, 2017; v1 submitted 20 September, 2016;
originally announced September 2016.
-
Impact of lexical and sentiment factors on the popularity of scientific papers
Authors:
Julian Sienkiewicz,
Eduardo G. Altmann
Abstract:
We investigate how textual properties of scientific papers relate to the number of citations they receive. Our main finding is that correlations are non-linear and affect most-cited and typical papers differently. For instance, we find that in most journals short titles correlate positively with citations only for the most cited papers; for typical papers the correlation is in most cases negative. Our analysis of 6 different factors, calculated both at the title and abstract level of 4.3 million papers in over 1500 journals, reveals the number of authors, and the length and complexity of the abstract, as having the strongest (positive) influence on the number of citations.
Submitted 24 May, 2016;
originally announced May 2016.
-
Is this scaling nonlinear?
Authors:
J. C. Leitao,
J. M. Miotto,
M. Gerlach,
E. G. Altmann
Abstract:
One of the most celebrated findings in complex systems in the last decade is that different indexes y (e.g., patents) scale nonlinearly with the population x of the cities in which they appear, i.e., $y\sim x^β, β\neq 1$. More recently, the generality of this finding has been questioned in studies using new databases and different definitions of city boundaries. In this paper we investigate the existence of nonlinear scaling using a probabilistic framework in which fluctuations are accounted for explicitly. In particular, we show that this allows us not only to (a) estimate $β$ and confidence intervals, but also to (b) quantify the evidence in favor of $β\neq 1$ and (c) test the hypothesis that the observations are compatible with the nonlinear scaling. We employ this framework to compare $5$ different models to $15$ different datasets and we find that the answers to points (a)-(c) crucially depend on the fluctuations contained in the data, on how they are modeled, and on the fact that the city sizes are heavy-tailed distributed.
Submitted 11 April, 2016;
originally announced April 2016.
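As a rough illustration of point (a) in the abstract above, the exponent $β$ in $y\sim x^β$ can be estimated by an ordinary least-squares fit in log-log space. This is only a minimal sketch and not the probabilistic framework of the paper: it ignores the modeling of fluctuations that the abstract identifies as crucial. The function name `fit_scaling_exponent` and the synthetic city data are assumptions made for illustration.

```python
import math

def fit_scaling_exponent(x, y):
    """Least-squares fit of log y = log a + beta * log x; returns (a, beta).

    Illustrative only: assumes strictly positive x and y and treats all
    points as equally reliable, which real city data may not justify.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    beta = sxy / sxx                 # slope in log-log space
    a = math.exp(my - beta * mx)     # prefactor from the intercept
    return a, beta

# Synthetic, noise-free example (hypothetical numbers): y = 2 * x^1.2
populations = [1e4, 5e4, 2e5, 1e6, 5e6]
patents = [2.0 * p ** 1.2 for p in populations]
a, beta = fit_scaling_exponent(populations, patents)
print(f"a = {a:.3f}, beta = {beta:.3f}")
```

On noisy data, whether the fitted $β$ differs significantly from 1 depends on how the fluctuations are modeled, which is the question the paper addresses.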
-
Similarity of symbol frequency distributions with heavy tails
Authors:
Martin Gerlach,
Francesc Font-Clos,
Eduardo G. Altmann
Abstract:
Quantifying the similarity between symbolic sequences is a traditional problem in Information Theory which requires comparing the frequencies of symbols in different sequences. In numerous modern applications, ranging from DNA to music to texts, the distribution of symbol frequencies is characterized by heavy-tailed distributions (e.g., Zipf's law). The large number of low-frequency symbols in these distributions poses major difficulties to the estimation of the similarity between sequences, e.g., they hinder an accurate finite-size estimation of entropies. Here we show analytically how the systematic (bias) and statistical (fluctuations) errors in these estimations depend on the sample size $N$ and on the exponent $γ$ of the heavy-tailed distribution. Our results are valid for the Shannon entropy $(α=1)$, its corresponding similarity measures (e.g., the Jensen-Shannon divergence), and also for measures based on the generalized entropy of order $α$. For small $α$'s, including $α=1$, the errors decay slower than the $1/N$-decay observed in short-tailed distributions. For $α$ larger than a critical value $α^* = 1+1/γ\leq 2$, the $1/N$-decay is recovered. We show the practical significance of our results by quantifying the evolution of the English language over the last two centuries using a complete $α$-spectrum of measures. We find that frequent words change more slowly than less frequent words and that $α=2$ provides the most robust measure to quantify language change.
Submitted 15 April, 2016; v1 submitted 1 October, 2015;
originally announced October 2015.
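For the Shannon case $(α=1)$ mentioned in the abstract above, the Jensen-Shannon divergence between two symbol sequences can be sketched with a naive plug-in estimator: the entropy of the mixed distribution minus the mean entropy of the two distributions. The helper names below are assumptions for illustration, natural logarithms are used, and the plug-in estimate is exactly the kind of finite-size estimate whose bias and fluctuations the paper analyzes.

```python
import math
from collections import Counter

def frequencies(tokens):
    """Relative frequency of each symbol in a sequence."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}

def shannon_entropy(p):
    """Plug-in Shannon entropy (in nats) of a frequency dictionary."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence: H((p+q)/2) - (H(p) + H(q)) / 2."""
    symbols = set(p) | set(q)
    m = {s: 0.5 * (p.get(s, 0.0) + q.get(s, 0.0)) for s in symbols}
    return shannon_entropy(m) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))

# Hypothetical toy sequences
p = frequencies("the cat saw the dog".split())
q = frequencies("the dog saw the bird".split())
print(jensen_shannon(p, q))
```

With natural logarithms the divergence is 0 for identical distributions and $\ln 2$ for distributions with disjoint support.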
-
Sampling motif-constrained ensembles of networks
Authors:
Rico Fischer,
Jorge C. Leitao,
Tiago P. Peixoto,
Eduardo G. Altmann
Abstract:
The statistical significance of network properties is conditioned on null models which satisfy specified properties but that are otherwise random. Exponential random graph models are a principled theoretical framework to generate such constrained ensembles, but they often fail in practice, either due to model inconsistency, or due to the impossibility of sampling networks from them. These problems affect the important case of networks with prescribed clustering coefficient or number of small connected subgraphs (motifs). In this paper we use the Wang-Landau method to obtain a multicanonical sampling that overcomes both these problems. We sample, in polynomial time, networks with arbitrary degree sequences from ensembles with imposed motif counts. Applying this method to social networks, we investigate the relation between transitivity and homophily, and we quantify the correlation between different types of motifs, finding that single motifs can explain up to 60% of the variation of motif profiles.
Submitted 17 November, 2015; v1 submitted 30 July, 2015;
originally announced July 2015.
-
Temporal-varying failures of nodes in networks
Authors:
Georgie Knight,
Giampaolo Cristadoro,
Eduardo G. Altmann
Abstract:
We consider networks in which random walkers are removed because of the failure of specific nodes. We interpret the rate of loss as a measure of the importance of nodes, a notion we denote as failure-centrality. We show that the degree of the node is not sufficient to determine this measure and that, in a first approximation, the shortest loops through the node have to be taken into account. We propose approximations of the failure-centrality which are valid for temporal-varying failures and we dwell on the possibility of externally changing the relative importance of nodes in a given network, by exploiting the interference between the loops of a node and the cycles of the temporal pattern of failures. In the limit of long failure cycles we show analytically that the escape in a node is larger than the one estimated from a stochastic failure with the same failure probability. We test our general formalism in two real-world networks (air-transportation and e-mail users) and show how communities lead to deviations from predictions for failures in hubs.
Submitted 7 July, 2015;
originally announced July 2015.
-
Statistical laws in linguistics
Authors:
Eduardo G. Altmann,
Martin Gerlach
Abstract:
Zipf's law is just one out of many universal laws proposed to describe statistical regularities in language. Here we review and critically discuss how these laws can be statistically interpreted, fitted, and tested (falsified). The modern availability of large databases of written text allows for tests with an unprecedented statistical accuracy and also a characterization of the fluctuations around the typical behavior. We find that fluctuations are usually much larger than expected based on simplifying statistical assumptions (e.g., independence and lack of correlations between observations). These simplifications appear also in usual statistical tests so that the large fluctuations can be erroneously interpreted as a falsification of the law. Instead, here we argue that linguistic laws are only meaningful (falsifiable) if accompanied by a model for which the fluctuations can be computed (e.g., a generative model of the text). The large fluctuations we report show that the constraints imposed by linguistic laws on the creativity process of text generation are not as tight as one could expect.
Submitted 11 February, 2015;
originally announced February 2015.
-
Chaotic Explosions
Authors:
Eduardo G. Altmann,
Jefferson S. E. Portela,
Tamás Tél
Abstract:
We investigate chaotic dynamical systems for which the intensity of trajectories might grow without bound in time. We show that (i) the intensity grows exponentially in time and is distributed spatially according to a fractal measure with an information dimension smaller than that of the phase space, (ii) such exploding cases can be described by an operator formalism similar to the one applied to chaotic systems with absorption (decaying intensities), but (iii) the invariant quantities characterizing explosion and absorption are typically not directly related to each other, e.g., the decay rate and fractal dimensions of absorbing maps typically differ from the ones computed in the corresponding inverse (exploding) maps. We illustrate our general results through numerical simulation in the cardioid billiard mimicking a lasing optical cavity, and through analytical calculations in the baker map.
Submitted 22 January, 2015;
originally announced January 2015.
-
Efficiency of Monte Carlo Sampling in Chaotic Systems
Authors:
Jorge C. Leitão,
Eduardo G. Altmann,
J. M. Viana Parente Lopes
Abstract:
In this paper we investigate how the complexity of chaotic phase spaces affects the efficiency of importance sampling Monte Carlo simulations. We focus on a flat-histogram simulation of the distribution of finite-time Lyapunov exponent in a simple chaotic system and obtain analytically that the computational effort of the simulation: (i) scales polynomially with the finite-time, a tremendous improvement over the exponential scaling obtained in usual uniform sampling simulations; and (ii) the polynomial scaling is sub-optimal, a phenomenon known as critical slowing down. We show that critical slowing down appears because of the limited possibilities to issue a local proposal in the Monte Carlo procedure in chaotic systems. These results remain valid in other methods and show how generic properties of chaotic systems limit the efficiency of Monte Carlo simulations.
Submitted 20 July, 2014;
originally announced July 2014.
-
Extracting information from S-curves of language change
Authors:
Fakhteh Ghanbarnejad,
Martin Gerlach,
Jose M. Miotto,
Eduardo G. Altmann
Abstract:
It is well accepted that the adoption of innovations is described by S-curves (slow start, accelerating period, and slow end). In this paper, we analyze how much information on the dynamics of innovation spreading can be obtained from a quantitative description of S-curves. We focus on the adoption of linguistic innovations for which detailed databases of written texts from the last 200 years allow for an unprecedented statistical precision. Combining data analysis with simulations of simple models (e.g., the Bass dynamics on complex networks) we identify signatures of endogenous and exogenous factors in the S-curves of adoption. We propose a measure to quantify the strength of these factors and three different methods to estimate it from S-curves. We obtain cases in which the exogenous factors are dominant (in the adoption of German orthographic reforms and of one irregular verb) and cases in which endogenous factors are dominant (in the adoption of conventions for romanization of Russian names and in the regularization of most studied verbs). These results show that the shape of the S-curve is not universal and contains information on the adoption mechanism. (published in "J. R. Soc. Interface, vol. 11, no. 101, (2014) 1044"; DOI: http://dx.doi.org/10.1098/rsif.2014.1044)
Submitted 30 October, 2014; v1 submitted 17 June, 2014;
originally announced June 2014.
-
Scaling laws and fluctuations in the statistics of word frequencies
Authors:
Martin Gerlach,
Eduardo G. Altmann
Abstract:
In this paper we combine statistical analysis of large text databases and simple stochastic models to explain the appearance of scaling laws in the statistics of word frequencies. Besides the sublinear scaling of the vocabulary size with database size (Heaps' law), here we report a new scaling of the fluctuations around this average (fluctuation scaling analysis). We explain both scaling laws by modeling the usage of words by simple stochastic processes in which the overall distribution of word-frequencies is fat tailed (Zipf's law) and the frequency of a single word is subject to fluctuations across documents (as in topic models). In this framework, the mean and the variance of the vocabulary size can be expressed as quenched averages, implying that: i) the inhomogeneous dissemination of words causes a reduction of the average vocabulary size in comparison to the homogeneous case, and ii) correlations in the co-occurrence of words lead to an increase in the variance and the vocabulary size becomes a non-self-averaging quantity. We address the implications of these observations to the measurement of lexical richness. We test our results in three large text databases (Google-ngram, English Wikipedia, and a collection of scientific articles).
Submitted 4 November, 2014; v1 submitted 17 June, 2014;
originally announced June 2014.
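The vocabulary-size measurement behind Heaps' law in the abstract above amounts to counting the number of distinct words seen after each token of a text. The following is a minimal sketch of that counting step only, with a hypothetical function name and a toy sentence; it does not implement the stochastic models or the fluctuation scaling analysis of the paper.

```python
def vocabulary_growth(tokens):
    """Return the number of distinct words seen after each token."""
    seen = set()
    curve = []
    for token in tokens:
        seen.add(token)          # no effect if the word was already seen
        curve.append(len(seen))  # vocabulary size at this text length
    return curve

# Hypothetical toy text
text = "the cat saw the dog and the cat".split()
print(vocabulary_growth(text))
```

On a large corpus, plotting this curve against the number of tokens on log-log axes is one way to inspect the sublinear growth that Heaps' law describes.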
-
Predictability of extreme events in social media
Authors:
José M. Miotto,
Eduardo G. Altmann
Abstract:
It is part of our daily social-media experience that seemingly ordinary items (videos, news, publications, etc.) unexpectedly gain an enormous amount of attention. Here we investigate how unexpected these events are. We propose a method that, given some information on the items, quantifies the predictability of events, i.e., the potential of identifying in advance the most successful items, defined as the upper bound for the quality of any prediction based on the same information. Applying this method to different data, ranging from views in YouTube videos to posts in Usenet discussion groups, we invariably find that the predictability increases for the most extreme events. This indicates that, despite the inherently stochastic collective dynamics of users, efficient prediction is possible for the most extreme events.
Submitted 8 December, 2014; v1 submitted 14 March, 2014;
originally announced March 2014.
-
Chaotic Systems with Absorption
Authors:
Eduardo G. Altmann,
Jefferson S. E. Portela,
Tamás Tél
Abstract:
Motivated by applications in optics and acoustics we develop a dynamical-system approach to describe absorption in chaotic systems. We introduce an operator formalism from which we obtain (i) a general formula for the escape rate $κ$ in terms of the natural conditionally-invariant measure of the system; (ii) an increased multifractality when compared to the spectrum of dimensions $D_q$ obtained without taking absorption and return times into account; and (iii) a generalization of the Kantz-Grassberger formula that expresses $D_1$ in terms of $κ$, the positive Lyapunov exponent, the average return time, and a new quantity, the reflection rate. Simulations in the cardioid billiard confirm these results.
Submitted 24 October, 2013; v1 submitted 14 August, 2013;
originally announced August 2013.
-
Optimal noise maximizes collective motion in heterogeneous media
Authors:
Oleksandr Chepizhko,
Eduardo G. Altmann,
Fernando Peruani
Abstract:
We study the effect of spatial heterogeneity on the collective motion of self-propelled particles (SPPs). The heterogeneity is modeled as a random distribution of either static or diffusive obstacles, which the SPPs avoid while trying to align their movements. We find that such obstacles have a dramatic effect on the collective dynamics of usual SPP models. In particular, we report the existence of an optimal (angular) noise amplitude that maximizes collective motion. We also show that while at low obstacle densities the system exhibits long-range order, in strongly heterogeneous media collective motion is quasi-long-range and exists only for noise values in between two critical noise values, with the system being disordered at both large and low noise amplitudes. Since most real systems have spatial heterogeneities, the finding of an optimal noise intensity has immediate practical and fundamental implications for the design and evolution of collective motion strategies.
Submitted 24 May, 2013;
originally announced May 2013.
-
Probing the statistical properties of unknown texts: application to the Voynich Manuscript
Authors:
Diego R. Amancio,
Eduardo G. Altmann,
Diego Rybski,
Osvaldo N. Oliveira Jr.,
Luciano da F. Costa
Abstract:
While the use of statistical physics methods to analyze large corpora has been useful to unveil many patterns in texts, no comprehensive investigation has been performed of the properties of statistical measurements across different languages and texts. In this study we propose a framework that aims at determining if a text is compatible with a natural language and which languages are closest to it, without any knowledge of the meaning of the words. The approach is based on three types of statistical measurements, i.e. obtained from first-order statistics of word properties in a text, from the topology of complex networks representing text, and from intermittency concepts where text is treated as a time series. Comparative experiments were performed with the New Testament in 15 different languages and with distinct books in English and Portuguese in order to quantify the dependency of the different measurements on the language and on the story being told in the book. The metrics found to be informative in distinguishing real texts from their shuffled versions include assortativity, degree and selectivity of words. As an illustration, we analyze an undeciphered medieval manuscript known as the Voynich Manuscript. We show that it is mostly compatible with natural languages and incompatible with random texts. We also obtain candidates for key-words of the Voynich Manuscript which could be helpful in the effort of deciphering it. Because we were able to identify statistical measurements that are more dependent on the syntax than on the semantics, the framework may also serve for text analysis in language-dependent applications.
Submitted 1 March, 2013;
originally announced March 2013.
-
Monte Carlo Sampling in Fractal Landscapes
Authors:
Jorge C. Leitão,
João M. Viana Parente Lopes,
Eduardo G. Altmann
Abstract:
We propose a flat-histogram Monte Carlo method to efficiently sample fractal landscapes such as escape time functions of open chaotic systems. This is achieved by using a random-walk step which depends on the height of the landscape via the largest Lyapunov exponent of the associated chaotic system. By generalizing the Wang-Landau algorithm, we obtain a method which simultaneously constructs the density of states (escape time distribution) and the correct step-length distribution. As a result, averages are obtained in polynomial computational time, a dramatic improvement over the exponential scaling of traditional uniform sampling. Our results are not limited by the dimensionality of the phase space and are confirmed numerically for dimensions as large as 30.
Submitted 30 May, 2013; v1 submitted 19 February, 2013;
originally announced February 2013.
-
Identifying trends in word frequency dynamics
Authors:
Eduardo G. Altmann,
Zakary L. Whichard,
Adilson E. Motter
Abstract:
The word-stock of a language is a complex dynamical system in which words can be created, evolve, and become extinct. Even more dynamic are the short-term fluctuations in word usage by individuals in a population. Building on the recent demonstration that word niche is a strong determinant of future rise or fall in word frequency, here we introduce a model that allows us to distinguish persistent from temporary increases in frequency. Our model is illustrated using a 10^8-word database from an online discussion group and a 10^11-word collection of digitized books. The model reveals a strong relation between changes in word dissemination and changes in frequency. Aside from their implications for short-term word frequency dynamics, these observations are potentially important for language evolution as new words must survive in the short term in order to survive in the long term.
Submitted 15 February, 2013;
originally announced February 2013.
-
Stochastic model for the vocabulary growth in natural languages
Authors:
Martin Gerlach,
Eduardo G. Altmann
Abstract:
We propose a stochastic model for the number of different words in a given database which incorporates the dependence on the database size and historical changes. The main feature of our model is the existence of two different classes of words: (i) a finite number of core-words which have higher frequency and do not affect the probability of a new word to be used; and (ii) the remaining virtually infinite number of noncore-words which have lower frequency and once used reduce the probability of a new word to be used in the future. Our model relies on a careful analysis of the Google-ngram database of books published in the last centuries and its main consequence is the generalization of Zipf's and Heaps' law to two scaling regimes. We confirm that these generalizations yield the best simple description of the data among generic descriptive models and that the two free parameters depend only on the language but not on the database. From the point of view of our model the main change on historical time scales is the composition of the specific words included in the finite list of core-words, which we observe to decay exponentially in time with a rate of approximately 30 words per year for English.
Submitted 4 April, 2013; v1 submitted 6 December, 2012;
originally announced December 2012.
-
Stochastic perturbations in open chaotic systems: random versus noisy maps
Authors:
Tamas Bodai,
Eduardo G. Altmann,
Antonio Endler
Abstract:
We investigate the effects of random perturbations on fully chaotic open systems. Perturbations can be applied to each trajectory independently (white noise) or simultaneously to all trajectories (random map). We compare these two scenarios by generalizing the theory of open chaotic systems and introducing a time-dependent conditionally-map-invariant measure. For the same perturbation strength we show that the escape rate of the random map is always larger than that of the noisy map. In random maps we show that the escape rate $κ$ and dimensions $D$ of the relevant fractal sets often depend nonmonotonically on the intensity of the random perturbation. We discuss the accuracy (bias) and precision (variance) of finite-size estimators of $κ$ and $D$, and show that the improvement of the precision of the estimations with the number of trajectories $N$ is extremely slow ($\propto 1/\ln N$). We also argue that the finite-size $D$ estimators are typically biased. General theoretical results are combined with analytical calculations and numerical simulations in area-preserving baker maps.
Submitted 10 May, 2015; v1 submitted 4 November, 2012;
originally announced November 2012.
-
Leaking Chaotic Systems
Authors:
Eduardo G. Altmann,
Jefferson S. E. Portela,
Tamás Tél
Abstract:
There are numerous physical situations in which a hole or leak is introduced in an otherwise closed chaotic system. The leak can have a natural origin, it can mimic measurement devices, and it can also be used to reveal dynamical properties of the closed system. In this paper we provide a unified treatment of leaking systems and we review applications to different physical problems, both in the classical and quantum pictures. Our treatment is based on the transient chaos theory of open systems, which is essential because real leaks have finite size and therefore estimations based on the closed system differ essentially from observations. The field of applications reviewed is very broad, ranging from planetary astronomy and hydrodynamical flows, to plasma physics and quantum fidelity. The theory is expanded and adapted to the case of partial leaks (partial absorption/transmission) with applications to room acoustics and optical microcavities in mind. Simulations in the limaçon family of billiards illustrate the main text. Regarding billiard dynamics, we emphasize that a correct discrete time representation can only be given in terms of the so-called true-time maps, while traditional Poincaré maps lead to erroneous results. We generalize Perron-Frobenius-type operators so that they describe true-time maps with partial leaks.
Submitted 6 June, 2013; v1 submitted 1 August, 2012;
originally announced August 2012.
-
On the origin of long-range correlations in texts
Authors:
Eduardo G. Altmann,
Giampaolo Cristadoro,
Mirko Degli Esposti
Abstract:
The complexity of human interactions with social and natural phenomena is mirrored in the way we describe our experiences through natural language. In order to retain and convey such high-dimensional information, the statistical properties of our linguistic output have to be highly correlated in time. One example is the robust observation, still largely not understood, of correlations on arbitrarily long scales in literary texts. In this paper we explain how long-range correlations flow from highly structured linguistic levels down to the building blocks of a text (words, letters, etc.). By combining calculations and data analysis we show that correlations take the form of a bursty sequence of events once we approach the semantically relevant topics of the text. The mechanisms we identify are fairly general and can be equally applied to other hierarchical settings.
Submitted 3 July, 2012;
originally announced July 2012.
-
Analysis of an information-theoretic model for communication
Authors:
Ronald Dickman,
Nicholas R. Moloney,
Eduardo G. Altmann
Abstract:
We study the cost-minimization problem posed by Ferrer i Cancho and Solé in their model of communication that aimed at explaining the origin of Zipf's law [PNAS 100, 788 (2003)]. Direct analysis shows that the minimum cost is $\min \{λ, 1-λ\}$, where $λ$ determines the relative weights of speaker's and hearer's costs in the total, as shown in several previous works using different approaches. The nature and multiplicity of the minimizing solution change discontinuously at $λ=1/2$, being qualitatively different for $λ< 1/2$, $λ> 1/2$, and $λ=1/2$. Zipf's law is found only in a vanishing fraction of the minimum-cost solutions at $λ= 1/2$ and therefore is not explained by this model. Imposing the further condition of equal costs yields distributions substantially closer to Zipf's law, but significant differences persist. We also investigate the solutions reached by the previously used minimization algorithm and find that they correctly recover global minimum states at the transition.
Submitted 30 November, 2012; v1 submitted 1 July, 2012;
originally announced July 2012.
-
Comparing intermittency and network measurements of words and their dependency on authorship
Authors:
Diego R. Amancio,
Eduardo G. Altmann,
Osvaldo N. Oliveira Jr.,
Luciano da F. Costa
Abstract:
Many features from texts and languages can now be inferred from statistical analyses using concepts from complex networks and dynamical systems. In this paper we quantify how topological properties of word co-occurrence networks and intermittency (or burstiness) in word distribution depend on the style of authors. Our database contains 40 books from 8 authors who lived in the 19th and 20th centuries, for which the following network measurements were obtained: clustering coefficient, average shortest path lengths, and betweenness. We found that the two factors with the strongest dependency on the authors were the skewness in the distribution of word intermittency and the average shortest paths. Other factors such as the betweenness and the Zipf's law exponent show only weak dependency on authorship. Also assessed was the contribution from each measurement to authorship recognition using three machine learning methods. The best performance was ca. 65% accuracy upon combining complex network and intermittency features with the nearest neighbor algorithm. From a detailed analysis of the interdependence of the various metrics it is concluded that the methods used here are complementary for providing short- and long-scale perspectives of texts, which are useful for applications such as identification of topical words and information retrieval.
Submitted 27 December, 2011;
originally announced December 2011.
-
Niche as a determinant of word fate in online groups
Authors:
Eduardo G. Altmann,
Janet B. Pierrehumbert,
Adilson E. Motter
Abstract:
Patterns of word use both reflect and influence a myriad of human activities and interactions. Like other entities that are reproduced and evolve, words rise or decline depending upon a complex interplay between their intrinsic properties and the environments in which they function. Using Internet discussion communities as model systems, we define the concept of a word niche as the relationship between the word and the characteristic features of the environments in which it is used. We develop a method to quantify two important aspects of the size of the word niche: the range of individuals using the word and the range of topics it is used to discuss. Controlling for word frequency, we show that these aspects of the word niche are strong determinants of changes in word frequency. Previous studies have already indicated that word frequency itself is a correlate of word success at historical time scales. Our analysis of changes in word frequencies over time reveals that the relative sizes of word niches are far more important than word frequencies in the dynamics of the entire vocabulary at shorter time scales, as the language adapts to new concepts and social groupings. We also distinguish endogenous versus exogenous factors as additional contributors to the fates of words, and demonstrate the force of this distinction in the rise of novel words. Our results indicate that short-term nonstationarity in word statistics is strongly driven by individual proclivities, including inclinations to provide novel information and to project a distinctive social identity.
Submitted 2 June, 2011; v1 submitted 16 September, 2010;
originally announced September 2010.
-
Noise-enhanced trapping in chaotic scattering
Authors:
Eduardo G. Altmann,
Antonio Endler
Abstract:
We show that noise enhances the trapping of trajectories in scattering systems. In fully chaotic systems, the decay rate can decrease with increasing noise due to a generic mismatch between the noiseless escape rate and the value predicted by the Liouville measure of the exit set. In Hamiltonian systems with mixed phase space we show that noise leads to a slower algebraic decay due to trajectories performing a random walk inside Kolmogorov-Arnold-Moser islands. We argue that these noise-enhanced trapping mechanisms exist in most scattering systems and are likely to be dominant for small noise intensities, which is confirmed through a detailed investigation in the Hénon map. Our results can be tested in fluid experiments, affect the fractal Weyl's law of quantum systems, and modify the estimations of chemical reaction rates based on phase-space transition state theory.
Submitted 10 December, 2010; v1 submitted 23 August, 2010;
originally announced August 2010.
-
Beyond word frequency: Bursts, lulls, and scaling in the temporal distributions of words
Authors:
Eduardo G. Altmann,
Janet B. Pierrehumbert,
Adilson E. Motter
Abstract:
Background: Zipf's discovery that word frequency distributions obey a power law established parallels between biological and physical processes, and language, laying the groundwork for a complex systems perspective on human communication. More recent research has also identified scaling regularities in the dynamics underlying the successive occurrences of events, suggesting the possibility of similar findings for language as well.
Methodology/Principal Findings: By considering frequent words in USENET discussion groups and in disparate databases where the language has different levels of formality, here we show that the distributions of distances between successive occurrences of the same word display bursty deviations from a Poisson process and are well characterized by a stretched exponential (Weibull) scaling. The extent of this deviation depends strongly on semantic type -- a measure of the logicality of each word -- and less strongly on frequency. We develop a generative model of this behavior that fully determines the dynamics of word usage.
Conclusions/Significance: Recurrence patterns of words are well described by a stretched exponential distribution of recurrence times, an empirical scaling that cannot be anticipated from Zipf's law. Because the use of words provides a uniquely precise and powerful lens on human thought and activity, our findings also have implications for other overt manifestations of collective human dynamics.
Submitted 11 November, 2009; v1 submitted 15 January, 2009;
originally announced January 2009.
-
Non-Hamiltonian dynamics in optical microcavities resulting from wave-inspired corrections to geometric optics
Authors:
Eduardo G. Altmann,
Gianluigi Del Magno,
Martina Hentschel
Abstract:
We introduce and investigate billiard systems with an adjusted ray dynamics that accounts for modifications of the conventional reflection of rays due to universal wave effects. We show that even small modifications of the specular reflection law have dramatic consequences on the phase space of classical billiards. These include the creation of regions of non-Hamiltonian dynamics, the breakdown of symmetries, and changes in the stability and morphology of periodic orbits. Focusing on optical microcavities, we show that our adjusted dynamics provides the missing ray counterpart to previously observed wave phenomena and we describe how to observe its signatures in experiments. Our findings also apply to acoustic and ultrasound waves and are important in all situations where wavelengths are comparable to system sizes, an increasingly likely situation considering the systematic reduction of the size of electronic and photonic devices.
Submitted 19 September, 2008; v1 submitted 29 May, 2008;
originally announced May 2008.
-
Emission from dielectric cavities in terms of invariant sets of the chaotic ray dynamics
Authors:
Eduardo G. Altmann
Abstract:
In this paper, the chaotic ray dynamics inside dielectric cavities is described by the properties of an invariant chaotic saddle. I show that the localization of the far field emission in specific directions is related to the filamentary pattern of the saddle's unstable manifold, along which the energy inside the cavity is distributed. For cavities with mixed phase space, the chaotic saddle is divided into hyperbolic and non-hyperbolic components, related, respectively, to the intermediate exponential ($t<t_c$) and the asymptotic power-law ($t>t_c$) decay of the energy inside the cavity. The alignment of the manifolds of the two components of the saddle explains why, even if the energy concentration inside the cavity dramatically changes from $t<t_c$ to $t>t_c$, the far field emission changes only slightly. Simulations in the annular billiard confirm and illustrate the predictions.
Submitted 1 February, 2009; v1 submitted 14 May, 2008;
originally announced May 2008.
-
Precursors of extreme increments
Authors:
Sarah Hallerberg,
Eduardo G. Altmann,
Detlef Holstein,
Holger Kantz
Abstract:
We investigate precursors and predictability of extreme increments in a time series. The events we focus on consist of large increments within successive time steps. We are especially interested in understanding how the quality of the predictions depends on the strategy for choosing precursors, on the size of the event, and on the correlation strength. We study the prediction of extreme increments analytically in an AR(1) process, and numerically in wind speed recordings and long-range correlated ARMA data. We evaluate the success of predictions via receiver operating characteristic (ROC) curves. Furthermore, we observe an increase in the quality of predictions with increasing event size and with decreasing correlation in all examples. Both effects can be understood by using the likelihood ratio as a summary index for smooth ROC curves.
Submitted 12 September, 2006; v1 submitted 20 April, 2006;
originally announced April 2006.
-
Reactions to extreme events: moving threshold model
Authors:
Eduardo G. Altmann,
Sarah Hallerberg,
Holger Kantz
Abstract:
In spite of precautions to avoid the harmful effects of extreme events, we recurrently experience phenomena that overcome the preventive barriers. These barriers usually increase drastically right after the occurrence of such extreme events, but steadily decay in their absence. In this paper we consider a simple model that mimics the evolution of the protection barriers to study the efficiency of the system's reaction to extreme events and how it changes our perception of the sequence of extreme events itself. We find that the usual method of fighting extreme events introduces a periodicity in their occurrence and is generally less efficient than the use of a constant barrier. On the other hand, it shows a good adaptation to the presence of slow non-stationarities.
Submitted 23 August, 2005;
originally announced August 2005.
-
Recurrence time analysis, long-term correlations, and extreme events
Authors:
Eduardo G. Altmann,
Holger Kantz
Abstract:
The recurrence times between extreme events have been the central point of statistical analyses in many different areas of science. Simultaneously, the Poincaré recurrence time has been extensively used to characterize nonlinear dynamical systems. We compare the main properties of these statistical methods, pointing out their consequences for the recurrence analysis performed in time series. In particular, we analyze the dependence of the mean recurrence time and of the recurrence time statistics on the probability density function, on the interval to which the recurrences are observed, and on the temporal correlations of time series. In the case of long-term correlations, we verify the validity of the stretched exponential distribution, which is uniquely defined by the exponent $γ$, at the same time showing that it is restricted to the class of linear long-term correlated processes. Simple transformations are able to modify the correlations of time series, leading to stretched-exponential recurrence time statistics with different $γ$, which shows a lack of invariance under the change of observables.
Submitted 8 March, 2005;
originally announced March 2005.