-
Recovering lost and absent information in temporal networks
Authors:
James P. Bagrow,
Sune Lehmann
Abstract:
The full range of activity in a temporal network is captured in its edge activity data -- time series encoding the tie strengths or on-off dynamics of each edge in the network. However, in many practical applications, edge-level data are unavailable, and the network analyses must rely instead on node activity data which aggregates the edge-activity data and thus is less informative. This raises th…
▽ More
The full range of activity in a temporal network is captured in its edge activity data -- time series encoding the tie strengths or on-off dynamics of each edge in the network. However, in many practical applications, edge-level data are unavailable, and the network analyses must rely instead on node activity data which aggregates the edge-activity data and thus is less informative. This raises the question: Is it possible to use the static network to recover the richer edge activities from the node activities? Here we show that recovery is possible, often with a surprising degree of accuracy given how much information is lost, and that the recovered data are useful for subsequent network analysis tasks. Recovery is more difficult when network density increases, either topologically or dynamically, but exploiting dynamical and topological sparsity enables effective solutions to the recovery problem. We formally characterize the difficulty of the recovery problem both theoretically and empirically, proving the conditions under which recovery errors can be bounded and showing that, even when these conditions are not met, good quality solutions can still be derived. Effective recovery carries both promise and peril, as it enables deeper scientific study of complex systems but in the context of social systems also raises privacy concerns when social information can be aggregated across multiple data sources.
△ Less
Submitted 22 July, 2021;
originally announced July 2021.
-
Contrasting social and non-social sources of predictability in human mobility
Authors:
Zexun Chen,
Sean Kelty,
Brooke Foucault Welles,
James P. Bagrow,
Ronaldo Menezes,
Gourab Ghoshal
Abstract:
Social structures influence a variety of human behaviors including mobility patterns, but the extent to which one individual's movements can predict another's remains an open question. Further, latent information about an individual's mobility can be present in the mobility patterns of both social and non-social ties, a distinction that has not yet been addressed. Here we develop a "colocation" ne…
▽ More
Social structures influence a variety of human behaviors including mobility patterns, but the extent to which one individual's movements can predict another's remains an open question. Further, latent information about an individual's mobility can be present in the mobility patterns of both social and non-social ties, a distinction that has not yet been addressed. Here we develop a "colocation" network to distinguish the mobility patterns of an ego's social ties from those of non-social colocators, individuals not socially connected to the ego but who nevertheless arrive at a location at the same time as the ego. We apply entropy and predictability measures to analyse and bound the predictive information of an individual's mobility pattern and the flow of that information from their top social ties and from their non-social colocators. While social ties generically provide more information than non-social colocators, we find that significant information is present in the aggregation of non-social colocators: 3-7 colocators can provide as much predictive information as the top social tie, and colocators can replace up to 85% of the predictive information about an ego, compared with social ties that can replace up to 94% of the ego's predictability. The presence of predictive information among non-social colocators raises privacy concerns: given the increasing availability of real-time mobility traces from smartphones, individuals sharing data may be providing actionable information not just about their own movements but the movements of others whose data are absent, both known and unknown individuals.
△ Less
Submitted 27 April, 2021;
originally announced April 2021.
-
The sociospatial factors of death: Analyzing effects of geospatially-distributed variables in a Bayesian mortality model for Hong Kong
Authors:
Thayer Alshaabi,
David Rushing Dewhurst,
James P. Bagrow,
Peter Sheridan Dodds,
Christopher M. Danforth
Abstract:
Human mortality is in part a function of multiple socioeconomic factors that differ both spatially and temporally. Adjusting for other covariates, the human lifespan is positively associated with household wealth. However, the extent to which mortality in a geographical region is a function of socioeconomic factors in both that region and its neighbors is unclear. There is also little information…
▽ More
Human mortality is in part a function of multiple socioeconomic factors that differ both spatially and temporally. Adjusting for other covariates, the human lifespan is positively associated with household wealth. However, the extent to which mortality in a geographical region is a function of socioeconomic factors in both that region and its neighbors is unclear. There is also little information on the temporal components of this relationship. Using the districts of Hong Kong over multiple census years as a case study, we demonstrate that there are differences in how wealth indicator variables are associated with longevity in (a) areas that are affluent but neighbored by socially deprived districts versus (b) wealthy areas surrounded by similarly wealthy districts. We also show that the inclusion of spatially-distributed variables reduces uncertainty in mortality rate predictions in each census year when compared with a baseline model. Our results suggest that geographic mortality models should incorporate nonlocal information (e.g., spatial neighbors) to lower the variance of their mortality estimates, and point to a more in-depth analysis of sociospatial spillover effects on mortality rates.
△ Less
Submitted 25 January, 2021; v1 submitted 15 June, 2020;
originally announced June 2020.
-
Complex contagion features without social reinforcement in a model of social information flow
Authors:
Tyson Pond,
Saranzaya Magsarjav,
Tobin South,
Lewis Mitchell,
James P. Bagrow
Abstract:
Contagion models are a primary lens through which we understand the spread of information over social networks. However, simple contagion models cannot reproduce the complex features observed in real-world data, leading to research on more complicated complex contagion models. A noted feature of complex contagion is social reinforcement that individuals require multiple exposures to information be…
▽ More
Contagion models are a primary lens through which we understand the spread of information over social networks. However, simple contagion models cannot reproduce the complex features observed in real-world data, leading to research on more complicated complex contagion models. A noted feature of complex contagion is social reinforcement that individuals require multiple exposures to information before they begin to spread it themselves. Here we show that the quoter model, a model of the social flow of written information over a network, displays features of complex contagion, including the weakness of long ties and that increased density inhibits rather than promotes information flow. Interestingly, the quoter model exhibits these features despite having no explicit social reinforcement mechanism, unlike complex contagion models. Our results highlight the need to complement contagion models with an information-theoretic view of information spreading to better understand how network properties affect information flow and what are the most necessary ingredients when modeling social behavior.
△ Less
Submitted 26 February, 2020; v1 submitted 12 February, 2020;
originally announced February 2020.
-
UAFS: Uncertainty-Aware Feature Selection for Problems with Missing Data
Authors:
Andrew J. Becker,
James P. Bagrow
Abstract:
Missing data are a concern in many real world data sets and imputation methods are often needed to estimate the values of missing data, but data sets with excessive missingness and high dimensionality challenge most approaches to imputation. Here we show that appropriate feature selection can be an effective preprocessing step for imputation, allowing for more accurate imputation and subsequent mo…
▽ More
Missing data are a concern in many real world data sets and imputation methods are often needed to estimate the values of missing data, but data sets with excessive missingness and high dimensionality challenge most approaches to imputation. Here we show that appropriate feature selection can be an effective preprocessing step for imputation, allowing for more accurate imputation and subsequent model predictions. The key feature of this preprocessing is that it incorporates uncertainty: by accounting for uncertainty due to missingness when selecting features we can reduce the degree of missingness while also limiting the number of uninformative features being used to make predictive models. We introduce a method to perform uncertainty-aware feature selection (UAFS), provide a theoretical motivation, and test UAFS on both real and synthetic problems, demonstrating that across a variety of data sets and levels of missingness we can improve the accuracy of imputations. Improved imputation due to UAFS also results in improved prediction accuracy when performing supervised learning using these imputed data sets. Our UAFS method is general and can be fruitfully coupled with a variety of imputation methods.
△ Less
Submitted 20 April, 2021; v1 submitted 2 April, 2019;
originally announced April 2019.
-
An information-theoretic, all-scales approach to comparing networks
Authors:
James P. Bagrow,
Erik M. Bollt
Abstract:
As network research becomes more sophisticated, it is more common than ever for researchers to find themselves not studying a single network but needing to analyze sets of networks. An important task when working with sets of networks is network comparison, develo** a similarity or distance measure between networks so that meaningful comparisons can be drawn. The best means to accomplish this ta…
▽ More
As network research becomes more sophisticated, it is more common than ever for researchers to find themselves not studying a single network but needing to analyze sets of networks. An important task when working with sets of networks is network comparison, develo** a similarity or distance measure between networks so that meaningful comparisons can be drawn. The best means to accomplish this task remains an open area of research. Here we introduce a new measure to compare networks, the Network Portrait Divergence, that is mathematically principled, incorporates the topological characteristics of networks at all structural scales, and is general-purpose and applicable to all types of networks. An important feature of our measure that enables many of its useful properties is that it is based on a graph invariant, the network portrait. We test our measure on both synthetic graphs and real world networks taken from protein interaction data, neuroscience, and computational social science applications. The Network Portrait Divergence reveals important characteristics of multilayer and temporal networks extracted from data.
△ Less
Submitted 25 July, 2019; v1 submitted 10 April, 2018;
originally announced April 2018.
-
The quoter model: a paradigmatic model of the social flow of written information
Authors:
James P. Bagrow,
Lewis Mitchell
Abstract:
We propose a model for the social flow of information in the form of text data, which simulates the posting and sharing of short social media posts. Nodes in a graph representing a social network take turns generating words, leading to a symbolic time series associated with each node. Information propagates over the graph via a quoting mechanism, where nodes randomly copy short segments of text fr…
▽ More
We propose a model for the social flow of information in the form of text data, which simulates the posting and sharing of short social media posts. Nodes in a graph representing a social network take turns generating words, leading to a symbolic time series associated with each node. Information propagates over the graph via a quoting mechanism, where nodes randomly copy short segments of text from each other. We characterize information flows from these text via information-theoretic estimators, and we derive analytic relationships between model parameters and the values of these estimators. We explore and validate the model with simulations on small network motifs and larger random graphs. Tractable models such as ours that generate symbolic data while controlling the information flow allow us to test and compare measures of information flow applicable to real social media data. In particular, by choosing different network structures, we can develop test scenarios to determine whether or not measures of information flow can distinguish between true and spurious interactions, and how topological network properties relate to information flow.
△ Less
Submitted 11 July, 2018; v1 submitted 1 November, 2017;
originally announced November 2017.
-
Crowdsourcing Predictors of Residential Electric Energy Usage
Authors:
Mark D. Wagy,
Josh C. Bongard,
James P. Bagrow,
Paul D. H. Hines
Abstract:
Crowdsourcing has been successfully applied in many domains including astronomy, cryptography and biology. In order to test its potential for useful application in a Smart Grid context, this paper investigates the extent to which a crowd can contribute predictive hypotheses to a model of residential electric energy consumption. In this experiment, the crowd generated hypotheses about factors that…
▽ More
Crowdsourcing has been successfully applied in many domains including astronomy, cryptography and biology. In order to test its potential for useful application in a Smart Grid context, this paper investigates the extent to which a crowd can contribute predictive hypotheses to a model of residential electric energy consumption. In this experiment, the crowd generated hypotheses about factors that make one home different from another in terms of monthly energy usage. To implement this concept, we deployed a web-based system within which 627 residential electricity customers posed 632 questions that they thought predictive of energy usage. While this occurred, the same group provided 110,573 answers to these questions as they accumulated. Thus users both suggested the hypotheses that drive a predictive model and provided the data upon which the model is built. We used the resulting question and answer data to build a predictive model of monthly electric energy consumption, using random forest regression. Because of the sparse nature of the answer data, careful statistical work was needed to ensure that these models are valid. The results indicate that the crowd can generate useful hypotheses, despite the sparse nature of the dataset.
△ Less
Submitted 8 September, 2017;
originally announced September 2017.
-
Information flow reveals prediction limits in online social activity
Authors:
James P. Bagrow,
Xipei Liu,
Lewis Mitchell
Abstract:
Modern society depends on the flow of information over online social networks, and users of popular platforms generate significant behavioral data about themselves and their social ties. However, it remains unclear what fundamental limits exist when using these data to predict the activities and interests of individuals, and to what accuracy such predictions can be made using an individual's socia…
▽ More
Modern society depends on the flow of information over online social networks, and users of popular platforms generate significant behavioral data about themselves and their social ties. However, it remains unclear what fundamental limits exist when using these data to predict the activities and interests of individuals, and to what accuracy such predictions can be made using an individual's social ties. Here we show that 95% of the potential predictive accuracy for an individual is achievable using their social ties only, without requiring that individual's data. We use information theoretic tools to estimate the predictive information within the writings of Twitter users, providing an upper bound on the available predictive information that holds for any predictive or machine learning methods. As few as 8-9 of an individual's contacts are sufficient to obtain predictability comparable to that of the individual alone. Distinct temporal and social effects are visible by measuring information flow along social ties, allowing us to better study the dynamics of online activity. Our results have distinct privacy implications: information is so strongly embedded in a social network that in principle one can profile an individual from their available social ties even when the individual forgoes the platform completely.
△ Less
Submitted 9 February, 2019; v1 submitted 15 August, 2017;
originally announced August 2017.
-
Information spreading during emergencies and anomalous events
Authors:
James P. Bagrow
Abstract:
The most critical time for information to spread is in the aftermath of a serious emergency, crisis, or disaster. Individuals affected by such situations can now turn to an array of communication channels, from mobile phone calls and text messages to social media posts, when alerting social ties. These channels drastically improve the speed of information in a time-sensitive event, and provide ext…
▽ More
The most critical time for information to spread is in the aftermath of a serious emergency, crisis, or disaster. Individuals affected by such situations can now turn to an array of communication channels, from mobile phone calls and text messages to social media posts, when alerting social ties. These channels drastically improve the speed of information in a time-sensitive event, and provide extant records of human dynamics during and afterward the event. Retrospective analysis of such anomalous events provides researchers with a class of "found experiments" that may be used to better understand social spreading. In this chapter, we study information spreading due to a number of emergency events, including the Boston Marathon Bombing and a plane crash at a western European airport. We also contrast the different information which may be gleaned by social media data compared with mobile phone data and we estimate the rate of anomalous events in a mobile phone dataset using a proposed anomaly detection method.
△ Less
Submitted 21 March, 2017;
originally announced March 2017.
-
Which friends are more popular than you? Contact strength and the friendship paradox in social networks
Authors:
James P. Bagrow,
Christopher M. Danforth,
Lewis Mitchell
Abstract:
The friendship paradox states that in a social network, egos tend to have lower degree than their alters, or, "your friends have more friends than you do". Most research has focused on the friendship paradox and its implications for information transmission, but treating the network as static and unweighted. Yet, people can dedicate only a finite fraction of their attention budget to each social i…
▽ More
The friendship paradox states that in a social network, egos tend to have lower degree than their alters, or, "your friends have more friends than you do". Most research has focused on the friendship paradox and its implications for information transmission, but treating the network as static and unweighted. Yet, people can dedicate only a finite fraction of their attention budget to each social interaction: a high-degree individual may have less time to dedicate to individual social links, forcing them to modulate the quantities of contact made to their different social ties. Here we study the friendship paradox in the context of differing contact volumes between egos and alters, finding a connection between contact volume and the strength of the friendship paradox. The most frequently contacted alters exhibit a less pronounced friendship paradox compared with the ego, whereas less-frequently contacted alters are more likely to be high degree and give rise to the paradox. We argue therefore for a more nuanced version of the friendship paradox: "your closest friends have slightly more friends than you do", and in certain networks even: "your best friend has no more friends than you do". We demonstrate that this relationship is robust, holding in both a social media and a mobile phone dataset. These results have implications for information transfer and influence in social networks, which we explore using a simple dynamical model.
△ Less
Submitted 18 March, 2017;
originally announced March 2017.
-
Reply & Supply: Efficient crowdsourcing when workers do more than answer questions
Authors:
Thomas C. McAndrew,
Elizaveta A. Guseva,
James P. Bagrow
Abstract:
Crowdsourcing works by distributing many small tasks to large numbers of workers, yet the true potential of crowdsourcing lies in workers doing more than performing simple tasks---they can apply their experience and creativity to provide new and unexpected information to the crowdsourcer. One such case is when workers not only answer a crowdsourcer's questions but also contribute new questions for…
▽ More
Crowdsourcing works by distributing many small tasks to large numbers of workers, yet the true potential of crowdsourcing lies in workers doing more than performing simple tasks---they can apply their experience and creativity to provide new and unexpected information to the crowdsourcer. One such case is when workers not only answer a crowdsourcer's questions but also contribute new questions for subsequent crowd analysis, leading to a growing set of questions. This growth creates an inherent bias for early questions since a question introduced earlier by a worker can be answered by more subsequent workers than a question introduced later. Here we study how to perform efficient crowdsourcing with such growing question sets. By modeling question sets as networks of interrelated questions, we introduce algorithms to help curtail the growth bias by efficiently distributing workers between exploring new questions and addressing current questions. Experiments and simulations demonstrate that these algorithms can efficiently explore an unbounded set of questions without losing confidence in crowd answers.
△ Less
Submitted 14 August, 2017; v1 submitted 3 November, 2016;
originally announced November 2016.
-
Transitions in climate and energy discourse between Hurricanes Katrina and Sandy
Authors:
Emily M. Cody,
Jennie C. Stephens,
James P. Bagrow,
Peter Sheridan Dodds,
Christopher M. Danforth
Abstract:
Although climate change and energy are intricately linked, their explicit connection is not always prominent in public discourse and the media. Disruptive extreme weather events, including hurricanes, focus public attention in new and different ways, offering a unique window of opportunity to analyze how a focusing event influences public discourse. Media coverage of extreme weather events simulta…
▽ More
Although climate change and energy are intricately linked, their explicit connection is not always prominent in public discourse and the media. Disruptive extreme weather events, including hurricanes, focus public attention in new and different ways, offering a unique window of opportunity to analyze how a focusing event influences public discourse. Media coverage of extreme weather events simultaneously shapes and reflects public discourse on climate issues. Here we analyze climate and energy newspaper coverage of Hurricanes Katrina (2005) and Sandy (2012) using topic models, mathematical techniques used to discover abstract topics within a set of documents. Our results demonstrate that post-Katrina media coverage does not contain a climate change topic, and the energy topic is limited to discussion of energy prices, markets, and the economy with almost no explicit linkages made between energy and climate change. In contrast, post-Sandy media coverage does contain a prominent climate change topic, a distinct energy topic, as well as integrated representation of climate change and energy, indicating a shift in climate and energy reporting between Hurricane Katrina and Hurricane Sandy.
△ Less
Submitted 25 April, 2016; v1 submitted 19 October, 2015;
originally announced October 2015.
-
Reply to Garcia et al.: Common mistakes in measuring frequency dependent word characteristics
Authors:
P. S. Dodds,
E. M. Clark,
S. Desu,
M. R. Frank,
A. J. Reagan,
J. R. Williams,
L. Mitchell,
K. D. Harris,
I. M. Kloumann,
J. P. Bagrow,
K. Megerdoomian,
M. T. McMahon,
B. F. Tivnan,
C. M. Danforth
Abstract:
We demonstrate that the concerns expressed by Garcia et al. are misplaced, due to (1) a misreading of our findings in [1]; (2) a widespread failure to examine and present words in support of asserted summary quantities based on word usage frequencies; and (3) a range of misconceptions about word usage frequency, word rank, and expert-constructed word lists. In particular, we show that the English…
▽ More
We demonstrate that the concerns expressed by Garcia et al. are misplaced, due to (1) a misreading of our findings in [1]; (2) a widespread failure to examine and present words in support of asserted summary quantities based on word usage frequencies; and (3) a range of misconceptions about word usage frequency, word rank, and expert-constructed word lists. In particular, we show that the English component of our study compares well statistically with two related surveys, that no survey design influence is apparent, and that estimates of measurement error do not explain the positivity biases reported in our work and that of others. We further demonstrate that for the frequency dependence of positivity---of which we explored the nuances in great detail in [1]---Garcia et al. did not perform a reanalysis of our data---they instead carried out an analysis of a different, statistically improper data set and introduced a nonlinearity before performing linear regression.
△ Less
Submitted 28 May, 2015; v1 submitted 25 May, 2015;
originally announced May 2015.
-
Robustness of Spatial Micronetworks
Authors:
Thomas C. McAndrew,
Christopher M. Danforth,
James P. Bagrow
Abstract:
Power lines, roadways, pipelines and other physical infrastructure are critical to modern society. These structures may be viewed as spatial networks where geographic distances play a role in the functionality and construction cost of links. Traditionally, studies of network robustness have primarily considered the connectedness of large, random networks. Yet for spatial infrastructure physical di…
▽ More
Power lines, roadways, pipelines and other physical infrastructure are critical to modern society. These structures may be viewed as spatial networks where geographic distances play a role in the functionality and construction cost of links. Traditionally, studies of network robustness have primarily considered the connectedness of large, random networks. Yet for spatial infrastructure physical distances must also play a role in network robustness. Understanding the robustness of small spatial networks is particularly important with the increasing interest in microgrids, small-area distributed power grids that are well suited to using renewable energy resources. We study the random failures of links in small networks where functionality depends on both spatial distance and topological connectedness. By introducing a percolation model where the failure of each link is proportional to its spatial length, we find that, when failures depend on spatial distances, networks are more fragile than expected. Accounting for spatial effects in both construction and robustness is important for designing efficient microgrids and other network infrastructure.
△ Less
Submitted 23 January, 2015;
originally announced January 2015.
-
Constructing a taxonomy of fine-grained human movement and activity motifs through social media
Authors:
Morgan R. Frank,
Jake Ryland Williams,
Lewis Mitchell,
James P. Bagrow,
Peter Sheridan Dodds,
Christopher M. Danforth
Abstract:
Profiting from the emergence of web-scale social data sets, numerous recent studies have systematically explored human mobility patterns over large populations and large time scales. Relatively little attention, however, has been paid to mobility and activity over smaller time-scales, such as a day. Here, we use Twitter to identify people's frequently visited locations along with their likely acti…
▽ More
Profiting from the emergence of web-scale social data sets, numerous recent studies have systematically explored human mobility patterns over large populations and large time scales. Relatively little attention, however, has been paid to mobility and activity over smaller time-scales, such as a day. Here, we use Twitter to identify people's frequently visited locations along with their likely activities as a function of time of day and day of week, capitalizing on both the content and geolocation of messages. We subsequently characterize people's transition pattern motifs and demonstrate that spatial information is encoded in word choice.
△ Less
Submitted 11 May, 2015; v1 submitted 28 September, 2014;
originally announced October 2014.
-
Text mixing shapes the anatomy of rank-frequency distributions: A modern Zipfian mechanics for natural language
Authors:
Jake Ryland Williams,
James P. Bagrow,
Christopher M. Danforth,
Peter Sheridan Dodds
Abstract:
Natural languages are full of rules and exceptions. One of the most famous quantitative rules is Zipf's law which states that the frequency of occurrence of a word is approximately inversely proportional to its rank. Though this `law' of ranks has been found to hold across disparate texts and forms of data, analyses of increasingly large corpora over the last 15 years have revealed the existence o…
▽ More
Natural languages are full of rules and exceptions. One of the most famous quantitative rules is Zipf's law which states that the frequency of occurrence of a word is approximately inversely proportional to its rank. Though this `law' of ranks has been found to hold across disparate texts and forms of data, analyses of increasingly large corpora over the last 15 years have revealed the existence of two scaling regimes. These regimes have thus far been explained by a hypothesis suggesting a separability of languages into core and non-core lexica. Here, we present and defend an alternative hypothesis, that the two scaling regimes result from the act of aggregating texts. We observe that text mixing leads to an effective decay of word introduction, which we show provides accurate predictions of the location and severity of breaks in scaling. Upon examining large corpora from 10 languages in the Project Gutenberg eBooks collection (eBooks), we find emphatic empirical support for the universality of our claim.
△ Less
Submitted 30 January, 2015; v1 submitted 12 September, 2014;
originally announced September 2014.
-
Understanding the group dynamics and success of teams
Authors:
Michael Klug,
James P. Bagrow
Abstract:
Complex problems often require coordinated group effort and can consume significant resources, yet our understanding of how teams form and succeed has been limited by a lack of large-scale, quantitative data. We analyze activity traces and success levels for ~150,000 self-organized, online team projects. While larger teams tend to be more successful, workload is highly focused across the team, wit…
▽ More
Complex problems often require coordinated group effort and can consume significant resources, yet our understanding of how teams form and succeed has been limited by a lack of large-scale, quantitative data. We analyze activity traces and success levels for ~150,000 self-organized, online team projects. While larger teams tend to be more successful, workload is highly focused across the team, with only a few members performing most work. We find that highly successful teams are significantly more focused than average teams of the same size, that their members have worked on more diverse sets of projects, and the members of highly successful teams are more likely to be core members or 'leads' of other teams. The relations between team success and size, focus and especially team experience cannot be explained by confounding factors such as team age, external contributions from non-team members, nor by group mechanisms such as social loafing. Taken together, these features point to organizational principles that may maximize the success of collaborative endeavors.
△ Less
Submitted 20 April, 2016; v1 submitted 10 July, 2014;
originally announced July 2014.
-
Zipf's law holds for phrases, not words
Authors:
Jake Ryland Williams,
Paul R. Lessard,
Suma Desu,
Eric Clark,
James P. Bagrow,
Christopher M. Danforth,
Peter Sheridan Dodds
Abstract:
With Zipf's law being originally and most famously observed for word frequency, it is surprisingly limited in its applicability to human language, holding over no more than three to four orders of magnitude before hitting a clear break in scaling. Here, building on the simple observation that phrases of one or more words comprise the most coherent units of meaning in language, we show empirically…
▽ More
With Zipf's law being originally and most famously observed for word frequency, it is surprisingly limited in its applicability to human language, holding over no more than three to four orders of magnitude before hitting a clear break in scaling. Here, building on the simple observation that phrases of one or more words comprise the most coherent units of meaning in language, we show empirically that Zipf's law for phrases extends over as many as nine orders of rank magnitude. In doing so, we develop a principled and scalable statistical mechanical method of random text partitioning, which opens up a rich frontier of rigorous text analysis via a rank ordering of mixed length phrases.
△ Less
Submitted 4 March, 2015; v1 submitted 19 June, 2014;
originally announced June 2014.
-
Human language reveals a universal positivity bias
Authors:
Peter Sheridan Dodds,
Eric M. Clark,
Suma Desu,
Morgan R. Frank,
Andrew J. Reagan,
Jake Ryland Williams,
Lewis Mitchell,
Kameron Decker Harris,
Isabel M. Kloumann,
James P. Bagrow,
Karine Megerdoomian,
Matthew T. McMahon,
Brian F. Tivnan,
Christopher M. Danforth
Abstract:
Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (1) the words of natural human language possess a universal positivity bias; (2) the estimated emotional content of words is consistent between languages under translation; and (3) this positivity bias i…
▽ More
Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (1) the words of natural human language possess a universal positivity bias; (2) the estimated emotional content of words is consistent between languages under translation; and (3) this positivity bias is strongly independent of frequency of word usage. Alongside these general regularities, we describe inter-language variations in the emotional spectrum of languages which allow us to rank corpora. We also show how our word evaluations can be used to construct physical-like instruments for both real-time and offline measurement of the emotional content of large-scale texts.
△ Less
Submitted 15 June, 2014;
originally announced June 2014.
-
Quantifying Information Flow During Emergencies
Authors:
Liang Gao,
Chaoming Song,
Ziyou Gao,
Albert-László Barabási,
James P. Bagrow,
Dashun Wang
Abstract:
Recent advances on human dynamics have focused on the normal patterns of human activities, with the quantitative understanding of human behavior under extreme events remaining a crucial missing chapter. This has a wide array of potential applications, ranging from emergency response and detection to traffic control and management. Previous studies have shown that human communications are both temp…
▽ More
Recent advances on human dynamics have focused on the normal patterns of human activities, with the quantitative understanding of human behavior under extreme events remaining a crucial missing chapter. This has a wide array of potential applications, ranging from emergency response and detection to traffic control and management. Previous studies have shown that human communications are both temporally and spatially localized following the onset of emergencies, indicating that social propagation is a primary means to propagate situational awareness. We study real anomalous events using country-wide mobile phone data, finding that information flow during emergencies is dominated by repeated communications. We further demonstrate that the observed communication patterns cannot be explained by inherent reciprocity in social networks, and are universal across different demographics.
△ Less
Submitted 7 January, 2014;
originally announced January 2014.
-
Shadow networks: Discovering hidden nodes with models of information flow
Authors:
James P. Bagrow,
Suma Desu,
Morgan R. Frank,
Narine Manukyan,
Lewis Mitchell,
Andrew Reagan,
Eric E. Bloedorn,
Lashon B. Booker,
Luther K. Branting,
Michael J. Smith,
Brian F. Tivnan,
Christopher M. Danforth,
Peter S. Dodds,
Joshua C. Bongard
Abstract:
Complex, dynamic networks underlie many systems, and understanding these networks is the concern of a great span of important scientific and engineering problems. Quantitative description is crucial for this understanding yet, due to a range of measurement problems, many real network datasets are incomplete. Here we explore how accidentally missing or deliberately hidden nodes may be detected in n…
▽ More
Complex, dynamic networks underlie many systems, and understanding these networks is the concern of a great span of important scientific and engineering problems. Quantitative description is crucial for this understanding yet, due to a range of measurement problems, many real network datasets are incomplete. Here we explore how accidentally missing or deliberately hidden nodes may be detected in networks by the effect of their absence on predictions of the speed with which information flows through the network. We use Symbolic Regression (SR) to learn models relating information flow to network topology. These models show localized, systematic, and non-random discrepancies when applied to test networks with intentionally masked nodes, demonstrating the ability to detect the presence of missing nodes and where in the network those nodes are likely to reside.
△ Less
Submitted 20 December, 2013;
originally announced December 2013.
-
Robustness of skeletons and salient features in networks
Authors:
Louis M. Shekhtman,
James P. Bagrow,
Dirk Brockmann
Abstract:
Real world network datasets often contain a wealth of complex topological information. In the face of these data, researchers often employ methods to extract reduced networks containing the most important structures or pathways, sometimes known as `skeletons' or `backbones'. Numerous such methods have been developed. Yet data are often noisy or incomplete, with unknown numbers of missing or spurio…
▽ More
Real world network datasets often contain a wealth of complex topological information. In the face of these data, researchers often employ methods to extract reduced networks containing the most important structures or pathways, sometimes known as `skeletons' or `backbones'. Numerous such methods have been developed. Yet data are often noisy or incomplete, with unknown numbers of missing or spurious links. Relatively little effort has gone into understanding how salient network extraction methods perform in the face of noisy or incomplete networks. We study this problem by comparing how the salient features extracted by two popular methods change when networks are perturbed, either by deleting nodes or links, or by randomly rewiring links. Our results indicate that simple, global statistics for skeletons can be accurately inferred even for noisy and incomplete network data, but it is crucial to have complete, reliable data to use the exact topologies of skeletons or backbones. These results also help us understand how skeletons respond to damage to the network itself, as in an attack scenario.
△ Less
Submitted 15 September, 2013;
originally announced September 2013.
-
Natural emergence of clusters and bursts in network evolution
Authors:
James P. Bagrow,
Dirk Brockmann
Abstract:
Network models with preferential attachment, where new nodes are injected into the network and form links with existing nodes proportional to their current connectivity, have been well studied for some time. Extensions have been introduced where nodes attach proportionally to arbitrary fitness functions. However, in these models, attaching to a node always increases the ability of that node to gai…
▽ More
Network models with preferential attachment, where new nodes are injected into the network and form links with existing nodes proportional to their current connectivity, have been well studied for some time. Extensions have been introduced where nodes attach proportionally to arbitrary fitness functions. However, in these models, attaching to a node always increases the ability of that node to gain more links in the future. We study network growth where nodes attach proportionally to the clustering coefficients, or local densities of triangles, of existing nodes. Attaching to a node typically lowers its clustering coefficient, in contrast to preferential attachment or rich-get-richer models. This simple modification naturally leads to a variety of rich phenomena, including aging, non-Poissonian bursty dynamics, and community formation. This theoretical model shows that complex network structure can be generated without artificially imposing multiple dynamical mechanisms and may reveal potentially overlooked mechanisms present in complex systems.
△ Less
Submitted 25 June, 2013; v1 submitted 14 September, 2012;
originally announced September 2012.
-
The role of caretakers in disease dynamics
Authors:
Charleston Noble,
James P. Bagrow,
Dirk Brockmann
Abstract:
One of the key challenges in modeling the dynamics of contagion phenomena is to understand how the structure of social interactions shapes the time course of a disease. Complex network theory has provided significant advances in this context. However, awareness of an epidemic in a population typically yields behavioral changes that correspond to changes in the network structure on which the diseas…
▽ More
One of the key challenges in modeling the dynamics of contagion phenomena is to understand how the structure of social interactions shapes the time course of a disease. Complex network theory has provided significant advances in this context. However, awareness of an epidemic in a population typically yields behavioral changes that correspond to changes in the network structure on which the disease evolves. This feedback mechanism has not been investigated in depth. For example, one would intuitively expect susceptible individuals to avoid other infecteds. However, doctors treating patients or parents tending sick children may also increase the amount of contact made with an infecteds, in an effort to speed up recovery but also exposing themselves to higher risks of infection. We study the role of these caretaker links in an adaptive network models where individuals react to a disease by increasing or decreasing the amount of contact they make with infected individuals. We find that pure avoidance, with only few caretaker links, is the best strategy for curtailing an SIS disease in networks that possess a large topological variability. In more homogeneous networks, disease prevalence is decreased for low concentrations of caretakers whereas a high prevalence emerges if caretaker concentration passes a well defined critical value.
△ Less
Submitted 11 September, 2012;
originally announced September 2012.
-
Is coaching experience associated with effective use of timeouts in basketball?
Authors:
Serguei Saavedra,
Satyam Mukherjee,
James P. Bagrow
Abstract:
Experience is an important asset in almost any professional activity. In basketball, there is believed to be a positive association between coaching experience and effective use of team timeouts. Here, we analyze both the extent to which a team's change in scoring margin per possession after timeouts deviate from the team's average scoring margin per possession---what we called timeout factor, and…
▽ More
Experience is an important asset in almost any professional activity. In basketball, there is believed to be a positive association between coaching experience and effective use of team timeouts. Here, we analyze both the extent to which a team's change in scoring margin per possession after timeouts deviate from the team's average scoring margin per possession---what we called timeout factor, and the extent to which this performance measure is associated with coaching experience across all teams in the National Basketball Association over the 2009-2012 seasons. We find that timeout factor plays a minor role in the scoring dynamics of basketball. Surprisingly, we find that timeout factor is negatively associated with coaching experience. Our findings support empirical studies showing that, under certain conditions, mentors early in their careers can have a stronger positive impact on their teams than later in their careers.
△ Less
Submitted 21 September, 2012; v1 submitted 7 May, 2012;
originally announced May 2012.
-
Mesoscopic structure and social aspects of human mobility
Authors:
James P. Bagrow,
Yu-Ru Lin
Abstract:
The individual movements of large numbers of people are important in many contexts, from urban planning to disease spreading. Datasets that capture human mobility are now available and many interesting features have been discovered, including the ultra-slow spatial growth of individual mobility. However, the detailed substructures and spatiotemporal flows of mobility - the sets and sequences of vi…
▽ More
The individual movements of large numbers of people are important in many contexts, from urban planning to disease spreading. Datasets that capture human mobility are now available and many interesting features have been discovered, including the ultra-slow spatial growth of individual mobility. However, the detailed substructures and spatiotemporal flows of mobility - the sets and sequences of visited locations - have not been well studied. We show that individual mobility is dominated by small groups of frequently visited, dynamically close locations, forming primary "habitats" capturing typical daily activity, along with subsidiary habitats representing additional travel. These habitats do not correspond to typical contexts such as home or work. The temporal evolution of mobility within habitats, which constitutes most motion, is universal across habitats and exhibits scaling patterns both distinct from all previous observations and unpredicted by current models. The delay to enter subsidiary habitats is a primary factor in the spatiotemporal growth of human travel. Interestingly, habitats correlate with non-mobility dynamics such as communication activity, implying that habitats may influence processes such as information spreading and revealing new connections between human mobility and social networks.
△ Less
Submitted 7 June, 2012; v1 submitted 1 February, 2012;
originally announced February 2012.
-
Communities and bottlenecks: Trees and treelike networks have high modularity
Authors:
James P. Bagrow
Abstract:
Much effort has gone into understanding the modular nature of complex networks. Communities, also known as clusters or modules, are typically considered to be densely interconnected groups of nodes that are only sparsely connected to other groups in the network. Discovering high quality communities is a difficult and important problem in a number of areas. The most popular approach is the objectiv…
▽ More
Much effort has gone into understanding the modular nature of complex networks. Communities, also known as clusters or modules, are typically considered to be densely interconnected groups of nodes that are only sparsely connected to other groups in the network. Discovering high quality communities is a difficult and important problem in a number of areas. The most popular approach is the objective function known as modularity, used both to discover communities and to measure their strength. To understand the modular structure of networks it is then crucial to know how such functions evaluate different topologies, what features they account for, and what implicit assumptions they may make. We show that trees and treelike networks can have unexpectedly and often arbitrarily high values of modularity. This is surprising since trees are maximally sparse connected graphs and are not typically considered to possess modular structure, yet the nonlocal null model used by modularity assigns low probabilities, and thus high significance, to the densities of these sparse tree communities. We further study the practical performance of popular methods on model trees and on a genealogical data set and find that the discovered communities also have very high modularity, often approaching its maximum value. Statistical tests reveal the communities in trees to be significant, in contrast with known results for partitions of sparse, random graphs.
△ Less
Submitted 25 June, 2012; v1 submitted 3 January, 2012;
originally announced January 2012.
-
Flavor network and the principles of food pairing
Authors:
Yong-Yeol Ahn,
Sebastian E. Ahnert,
James P. Bagrow,
Albert-László Barabási
Abstract:
The cultural diversity of culinary practice, as illustrated by the variety of regional cuisines, raises the question of whether there are any general patterns that determine the ingredient combinations used in food today or principles that transcend individual tastes and recipes. We introduce a flavor network that captures the flavor compounds shared by culinary ingredients. Western cuisines show…
▽ More
The cultural diversity of culinary practice, as illustrated by the variety of regional cuisines, raises the question of whether there are any general patterns that determine the ingredient combinations used in food today or principles that transcend individual tastes and recipes. We introduce a flavor network that captures the flavor compounds shared by culinary ingredients. Western cuisines show a tendency to use ingredient pairs that share many flavor compounds, supporting the so-called food pairing hypothesis. By contrast, East Asian cuisines tend to avoid compound sharing ingredients. Given the increasing availability of information on food preparation, our data-driven investigation opens new avenues towards a systematic understanding of culinary practice.
△ Less
Submitted 25 November, 2011;
originally announced November 2011.
-
More Voices Than Ever? Quantifying Media Bias in Networks
Authors:
Yu-Ru Lin,
James P. Bagrow,
David Lazer
Abstract:
Social media, such as blogs, are often seen as democratic entities that allow more voices to be heard than the conventional mass or elite media. Some also feel that social media exhibits a balancing force against the arguably slanted elite media. A systematic comparison between social and mainstream media is necessary but challenging due to the scale and dynamic nature of modern communication. Her…
▽ More
Social media, such as blogs, are often seen as democratic entities that allow more voices to be heard than the conventional mass or elite media. Some also feel that social media exhibits a balancing force against the arguably slanted elite media. A systematic comparison between social and mainstream media is necessary but challenging due to the scale and dynamic nature of modern communication. Here we propose empirical measures to quantify the extent and dynamics of social (blog) and mainstream (news) media bias. We focus on a particular form of bias---coverage quantity---as applied to stories about the 111th US Congress. We compare observed coverage of Members of Congress against a null model of unbiased coverage, testing for biases with respect to political party, popular front runners, regions of the country, and more. Our measures suggest distinct characteristics in news and blog media. A simple generative model, in agreement with data, reveals differences in the process of coverage selection between the two media.
△ Less
Submitted 4 November, 2011;
originally announced November 2011.
-
Collective response of human populations to large-scale emergencies
Authors:
James P. Bagrow,
Dashun Wang,
Albert-László Barabási
Abstract:
Despite recent advances in uncovering the quantitative features of stationary human activity patterns, many applications, from pandemic prediction to emergency response, require an understanding of how these patterns change when the population encounters unfamiliar conditions. To explore societal response to external perturbations we identified real-time changes in communication and mobility patte…
▽ More
Despite recent advances in uncovering the quantitative features of stationary human activity patterns, many applications, from pandemic prediction to emergency response, require an understanding of how these patterns change when the population encounters unfamiliar conditions. To explore societal response to external perturbations we identified real-time changes in communication and mobility patterns in the vicinity of eight emergencies, such as bomb attacks and earthquakes, comparing these with eight non-emergencies, like concerts and sporting events. We find that communication spikes accompanying emergencies are both spatially and temporally localized, but information about emergencies spreads globally, resulting in communication avalanches that engage in a significant manner the social network of eyewitnesses. These results offer a quantitative view of behavioral changes in human activity under extreme conditions, with potential long-term impact on emergency detection and response.
△ Less
Submitted 3 June, 2011;
originally announced June 2011.
-
Robustness and modular structure in networks
Authors:
James P. Bagrow,
Sune Lehmann,
Yong-Yeol Ahn
Abstract:
Complex networks have recently attracted much interest due to their prevalence in nature and our daily lives [1, 2]. A critical property of a network is its resilience to random breakdown and failure [3-6], typically studied as a percolation problem [7-9] or by modeling cascading failures [10-12]. Many complex systems, from power grids and the Internet to the brain and society [13-15], can be mode…
▽ More
Complex networks have recently attracted much interest due to their prevalence in nature and our daily lives [1, 2]. A critical property of a network is its resilience to random breakdown and failure [3-6], typically studied as a percolation problem [7-9] or by modeling cascading failures [10-12]. Many complex systems, from power grids and the Internet to the brain and society [13-15], can be modeled using modular networks comprised of small, densely connected groups of nodes [16, 17]. These modules often overlap, with network elements belonging to multiple modules [18, 19]. Yet existing work on robustness has not considered the role of overlap**, modular structure. Here we study the robustness of these systems to the failure of elements. We show analytically and empirically that it is possible for the modules themselves to become uncoupled or non-overlap** well before the network disintegrates. If overlap** modular organization plays a role in overall functionality, networks may be far more vulnerable than predicted by conventional percolation theory.
△ Less
Submitted 6 January, 2016; v1 submitted 24 February, 2011;
originally announced February 2011.
-
Investigating Bimodal Clustering in Human Mobility
Authors:
James P. Bagrow,
Tal Koren
Abstract:
We apply a simple clustering algorithm to a large dataset of cellular telecommunication records, reducing the complexity of mobile phone users' full trajectories and allowing for simple statistics to characterize their properties. For the case of two clusters, we quantify how clustered human mobility is, how much of a user's spatial dispersion is due to motion between clusters, and how spatially…
▽ More
We apply a simple clustering algorithm to a large dataset of cellular telecommunication records, reducing the complexity of mobile phone users' full trajectories and allowing for simple statistics to characterize their properties. For the case of two clusters, we quantify how clustered human mobility is, how much of a user's spatial dispersion is due to motion between clusters, and how spatially and temporally separated clusters are from one another.
△ Less
Submitted 3 November, 2009;
originally announced November 2009.
-
Link communities reveal multiscale complexity in networks
Authors:
Yong-Yeol Ahn,
James P. Bagrow,
Sune Lehmann
Abstract:
Networks have become a key approach to understanding systems of interacting objects, unifying the study of diverse phenomena including biological organisms and human society. One crucial step when studying the structure and dynamics of networks is to identify communities: groups of related nodes that correspond to functional subunits such as protein complexes or social spheres. Communities in netw…
▽ More
Networks have become a key approach to understanding systems of interacting objects, unifying the study of diverse phenomena including biological organisms and human society. One crucial step when studying the structure and dynamics of networks is to identify communities: groups of related nodes that correspond to functional subunits such as protein complexes or social spheres. Communities in networks often overlap such that nodes simultaneously belong to several groups. Meanwhile, many networks are known to possess hierarchical organization, where communities are recursively grouped into a hierarchical structure. However, the fact that many real networks have communities with pervasive overlap, where each and every node belongs to more than one group, has the consequence that a global hierarchy of nodes cannot capture the relationships between overlap** groups. Here we reinvent communities as groups of links rather than nodes and show that this unorthodox approach successfully reconciles the antagonistic organizing principles of overlap** communities and hierarchy. In contrast to the existing literature, which has entirely focused on grou** nodes, link communities naturally incorporate overlap while revealing hierarchical organization. We find relevant link communities in many networks, including major biological networks such as protein-protein interaction and metabolic networks, and show that a large social network contains hierarchically organized community structures spanning inner-city to regional scales while maintaining pervasive overlap. Our results imply that link communities are fundamental building blocks that reveal overlap and hierarchical organization in networks to be two aspects of the same phenomenon.
△ Less
Submitted 14 September, 2010; v1 submitted 18 March, 2009;
originally announced March 2009.
-
Dynamic Computation of Network Statistics via Updating Schema
Authors:
Jie Sun,
James P. Bagrow,
Erik M. Bollt,
Joesph D. Skufca
Abstract:
In this paper we derive an updating scheme for calculating some important network statistics such as degree, clustering coefficient, etc., aiming at reduce the amount of computation needed to track the evolving behavior of large networks; and more importantly, to provide efficient methods for potential use of modeling the evolution of networks. Using the updating scheme, the network statistics c…
▽ More
In this paper we derive an updating scheme for calculating some important network statistics such as degree, clustering coefficient, etc., aiming at reduce the amount of computation needed to track the evolving behavior of large networks; and more importantly, to provide efficient methods for potential use of modeling the evolution of networks. Using the updating scheme, the network statistics can be computed and updated easily and much faster than re-calculating each time for large evolving networks. The update formula can also be used to determine which edge/node will lead to the extremal change of network statistics, providing a way of predicting or designing evolution rule of networks.
△ Less
Submitted 26 September, 2008;
originally announced September 2008.
-
Evaluating Local Community Methods in Networks
Authors:
James P. Bagrow
Abstract:
We present a new benchmarking procedure that is unambiguous and specific to local community-finding methods, allowing one to compare the accuracy of various methods. We apply this to new and existing algorithms. A simple class of synthetic benchmark networks is also developed, capable of testing properties specific to these local methods.
We present a new benchmarking procedure that is unambiguous and specific to local community-finding methods, allowing one to compare the accuracy of various methods. We apply this to new and existing algorithms. A simple class of synthetic benchmark networks is also developed, capable of testing properties specific to these local methods.
△ Less
Submitted 15 November, 2007; v1 submitted 26 June, 2007;
originally announced June 2007.
-
On the Google-Fame of Scientists and Other Populations
Authors:
James P. Bagrow,
Daniel ben-Avraham
Abstract:
We study the fame distribution of scientists and other social groups as measured by the number of Google hits garnered by individuals in the population. Past studies have found that the fame distribution decays either in power-law [arXiv:cond-mat/0310049] or exponential [Europhys. Lett., 67, (4) 511-516 (2004)] fashion, depending on whether individuals in the social group in question enjoy true…
▽ More
We study the fame distribution of scientists and other social groups as measured by the number of Google hits garnered by individuals in the population. Past studies have found that the fame distribution decays either in power-law [arXiv:cond-mat/0310049] or exponential [Europhys. Lett., 67, (4) 511-516 (2004)] fashion, depending on whether individuals in the social group in question enjoy true fame or not. In our present study we examine critically Google counts as well as the methods of data analysis. While the previous findings are corroborated in our present study, we find that, in most situations, the data available does not allow for sharp conclusions.
△ Less
Submitted 5 April, 2005;
originally announced April 2005.