Search | arXiv e-print repository

Recovering lost and absent information in temporal networks

Abstract: The full range of activity in a temporal network is captured in its edge activity data -- time series encoding the tie strengths or on-off dynamics of each edge in the network. However, in many practical applications, edge-level data are unavailable, and the network analyses must rely instead on node activity data which aggregates the edge-activity data and thus is less informative. This raises the question: Is it possible to use the static network to recover the richer edge activities from the node activities? Here we show that recovery is possible, often with a surprising degree of accuracy given how much information is lost, and that the recovered data are useful for subsequent network analysis tasks. Recovery is more difficult when network density increases, either topologically or dynamically, but exploiting dynamical and topological sparsity enables effective solutions to the recovery problem. We formally characterize the difficulty of the recovery problem both theoretically and empirically, proving the conditions under which recovery errors can be bounded and showing that, even when these conditions are not met, good quality solutions can still be derived. Effective recovery carries both promise and peril, as it enables deeper scientific study of complex systems but in the context of social systems also raises privacy concerns when social information can be aggregated across multiple data sources.

Submitted 22 July, 2021; originally announced July 2021.

Comments: 19 pages, 5 figures, 1 table, plus supporting information

arXiv:2104.13282 [pdf, other]

doi 10.1038/s41467-022-29592-y

Contrasting social and non-social sources of predictability in human mobility

Authors: Zexun Chen, Sean Kelty, Brooke Foucault Welles, James P. Bagrow, Ronaldo Menezes, Gourab Ghoshal

Abstract: Social structures influence a variety of human behaviors including mobility patterns, but the extent to which one individual's movements can predict another's remains an open question. Further, latent information about an individual's mobility can be present in the mobility patterns of both social and non-social ties, a distinction that has not yet been addressed. Here we develop a "colocation" network to distinguish the mobility patterns of an ego's social ties from those of non-social colocators, individuals not socially connected to the ego but who nevertheless arrive at a location at the same time as the ego. We apply entropy and predictability measures to analyse and bound the predictive information of an individual's mobility pattern and the flow of that information from their top social ties and from their non-social colocators. While social ties generically provide more information than non-social colocators, we find that significant information is present in the aggregation of non-social colocators: 3-7 colocators can provide as much predictive information as the top social tie, and colocators can replace up to 85% of the predictive information about an ego, compared with social ties that can replace up to 94% of the ego's predictability. The presence of predictive information among non-social colocators raises privacy concerns: given the increasing availability of real-time mobility traces from smartphones, individuals sharing data may be providing actionable information not just about their own movements but the movements of others whose data are absent, both known and unknown individuals.

Submitted 27 April, 2021; originally announced April 2021.

Comments: 20 pages, 6 figures

arXiv:2103.12820 [pdf, other]

A Review & Framework for Modeling Complex Engineered System Development Processes

Authors: John Meluso, Jesse Austin-Breneman, James P. Bagrow, Laurent Hébert-Dufresne

Abstract: Developing complex engineered systems (CES) poses significant challenges for engineers, managers, designers, and businesspeople alike due to the inherent complexity of the systems and contexts involved. Furthermore, experts have expressed great interest in filling the gap in theory about how CES develop. This article begins to address that gap in two ways. First, it reviews the numerous definitions of CES along with existing theory and methods on CES development processes. Then, it proposes the ComplEx System Integrated Utilities Model (CESIUM), a novel framework for exploring how numerous system and development process characteristics may affect the performance of CES. CESIUM creates simulated representations of a system architecture, the corresponding engineering organization, and the new product development process through which the organization designs the system. It does so by representing the system as a network of interdependent artifacts designed by agents. Agents iteratively design their artifacts through optimization and share information with other agents, thereby advancing the CES toward a solution. This paper describes the model, conducts a sensitivity analysis, provides validation, and suggests directions for future study.

Submitted 24 March, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

arXiv:2103.11007 [pdf, other]

doi 10.1109/MSR52588.2021.00036

Which contributions count? Analysis of attribution in open source

Authors: Jean-Gabriel Young, Amanda Casari, Katie McLaughlin, Milo Z. Trujillo, Laurent Hébert-Dufresne, James P. Bagrow

Abstract: Open source software projects usually acknowledge contributions with text files, websites, and other idiosyncratic methods. These data sources are hard to mine, which is why contributorship is most frequently measured through changes to repositories, such as commits, pushes, or patches. Recently, some open source projects have taken to recording contributor actions with standardized systems; this opens up a unique opportunity to understand how community-generated notions of contributorship map onto codebases as the measure of contribution. Here, we characterize contributor acknowledgment models in open source by analyzing thousands of projects that use a model called All Contributors to acknowledge diverse contributions like outreach, finance, infrastructure, and community management. We analyze the life cycle of projects through this model's lens and contrast its representation of contributorship with the picture given by other methods of acknowledgment, including GitHub's top committers indicator and contributions derived from actions taken on the platform. We find that community-generated systems of contribution acknowledgment make work like idea generation or bug finding more visible, which generates a more extensive picture of collaboration. Further, we find that models requiring explicit attribution lead to more clearly defined boundaries around what is and what is not a contribution.

Submitted 19 March, 2021; originally announced March 2021.

Comments: Extended version of a paper accepted at MSR 2021

Journal ref: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp. 242-253 (2021)

arXiv:2006.08527 [pdf, other]

doi 10.1371/journal.pone.0247795

The sociospatial factors of death: Analyzing effects of geospatially-distributed variables in a Bayesian mortality model for Hong Kong

Authors: Thayer Alshaabi, David Rushing Dewhurst, James P. Bagrow, Peter Sheridan Dodds, Christopher M. Danforth

Abstract: Human mortality is in part a function of multiple socioeconomic factors that differ both spatially and temporally. Adjusting for other covariates, the human lifespan is positively associated with household wealth. However, the extent to which mortality in a geographical region is a function of socioeconomic factors in both that region and its neighbors is unclear. There is also little information on the temporal components of this relationship. Using the districts of Hong Kong over multiple census years as a case study, we demonstrate that there are differences in how wealth indicator variables are associated with longevity in (a) areas that are affluent but neighbored by socially deprived districts versus (b) wealthy areas surrounded by similarly wealthy districts. We also show that the inclusion of spatially-distributed variables reduces uncertainty in mortality rate predictions in each census year when compared with a baseline model. Our results suggest that geographic mortality models should incorporate nonlocal information (e.g., spatial neighbors) to lower the variance of their mortality estimates, and point to a more in-depth analysis of sociospatial spillover effects on mortality rates.

Submitted 25 January, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: 26 pages (15 main, 11 appendix), 22 figures (6 main, 11 appendix), 2 tables

arXiv:2002.05035 [pdf, other]

doi 10.3390/e22030265

Complex contagion features without social reinforcement in a model of social information flow

Authors: Tyson Pond, Saranzaya Magsarjav, Tobin South, Lewis Mitchell, James P. Bagrow

Abstract: Contagion models are a primary lens through which we understand the spread of information over social networks. However, simple contagion models cannot reproduce the complex features observed in real-world data, leading to research on more complicated complex contagion models. A noted feature of complex contagion is social reinforcement: individuals require multiple exposures to information before they begin to spread it themselves. Here we show that the quoter model, a model of the social flow of written information over a network, displays features of complex contagion, including the weakness of long ties and that increased density inhibits rather than promotes information flow. Interestingly, the quoter model exhibits these features despite having no explicit social reinforcement mechanism, unlike complex contagion models. Our results highlight the need to complement contagion models with an information-theoretic view of information spreading to better understand how network properties affect information flow and what are the most necessary ingredients when modeling social behavior.

Submitted 26 February, 2020; v1 submitted 12 February, 2020; originally announced February 2020.

Comments: 18 pages, 9 figures, 1 table

Journal ref: Entropy 2020, 22(3), 265

arXiv:1912.05045 [pdf, other]

doi 10.1371/journal.pone.0244245

Efficient crowdsourcing of crowd-generated microtasks

Authors: Abigail Hotaling, James P. Bagrow

Abstract: Allowing members of the crowd to propose novel microtasks for one another is an effective way to combine the efficiencies of traditional microtask work with the inventiveness and hypothesis generation potential of human workers. However, microtask proposal leads to a growing set of tasks that may overwhelm limited crowdsourcer resources. Crowdsourcers can employ methods to utilize their resources efficiently, but algorithmic approaches to efficient crowdsourcing generally require a fixed task set of known size. In this paper, we introduce *cost forecasting* as a means for a crowdsourcer to use efficient crowdsourcing algorithms with a growing set of microtasks. Cost forecasting allows the crowdsourcer to decide between eliciting new tasks from the crowd or receiving responses to existing tasks based on whether or not new tasks will cost less to complete than existing tasks, efficiently balancing resources as crowdsourcing occurs. Experiments with real and synthetic crowdsourcing data show that cost forecasting leads to improved accuracy. Accuracy and efficiency gains for crowd-generated microtasks hold the promise to further leverage the creativity and wisdom of the crowd, with applications such as generating more informative and diverse training data for machine learning applications and improving the performance of user-generated content and question-answering platforms.

Submitted 21 December, 2020; v1 submitted 10 December, 2019; originally announced December 2019.

Comments: 12 pages, 5 figures

Journal ref: PLoS ONE 15(12): e0244245, 2020

arXiv:1911.11395 [pdf, other]

doi 10.1098/rsif.2020.0667

Creativity in temporal social networks: How divergent thinking is impacted by one's choice of peers

Authors: Raiyan Abdul Baten, Daryl Bagley, Ashely Tenesaca, Famous Clark, James P. Bagrow, Gourab Ghoshal, Mohammed Ehsan Hoque

Abstract: Creativity is viewed as one of the most important skills in the context of future-of-work. In this paper, we explore how the dynamic (self-organizing) nature of social networks impacts the fostering of creative ideas. We run 6 trials (N=288) of a web-based experiment involving divergent ideation tasks. We find that network connections gradually adapt to individual creative performances, as the participants predominantly seek to follow high-performing peers for creative inspirations. We unearth both opportunities and bottlenecks afforded by such self-organization. While exposure to high-performing peers is associated with better creative performances of the followers, we see a counter-effect that choosing to follow the same peers introduces semantic similarities in the followers' ideas. We formulate an agent-based simulation model to capture these intuitions in a tractable manner, and experiment with corner cases of various simulation parameters to assess the generality of the findings. Our findings may help design large-scale interventions to improve the creative aptitude of people interacting in a social network.

Submitted 7 December, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

Journal ref: J. R. Soc. Interface, 17: 20200667, 2020

arXiv:1904.01385

UAFS: Uncertainty-Aware Feature Selection for Problems with Missing Data

Authors: Andrew J. Becker, James P. Bagrow

Abstract: Missing data are a concern in many real world data sets and imputation methods are often needed to estimate the values of missing data, but data sets with excessive missingness and high dimensionality challenge most approaches to imputation. Here we show that appropriate feature selection can be an effective preprocessing step for imputation, allowing for more accurate imputation and subsequent model predictions. The key feature of this preprocessing is that it incorporates uncertainty: by accounting for uncertainty due to missingness when selecting features we can reduce the degree of missingness while also limiting the number of uninformative features being used to make predictive models. We introduce a method to perform uncertainty-aware feature selection (UAFS), provide a theoretical motivation, and test UAFS on both real and synthetic problems, demonstrating that across a variety of data sets and levels of missingness we can improve the accuracy of imputations. Improved imputation due to UAFS also results in improved prediction accuracy when performing supervised learning using these imputed data sets. Our UAFS method is general and can be fruitfully coupled with a variety of imputation methods.

Submitted 20 April, 2021; v1 submitted 2 April, 2019; originally announced April 2019.

Comments: Withdrawn due to errors in theoretical derivations

arXiv:1812.06038 [pdf, other]

Inferring the size of the causal universe: features and fusion of causal attribution networks

Authors: Daniel Berenberg, James P. Bagrow

Abstract: Cause-and-effect reasoning, the attribution of effects to causes, is one of the most powerful and unique skills humans possess. Multiple surveys are mapping out causal attributions as networks, but it is unclear how well these efforts can be combined. Further, the total size of the collective causal attribution network held by humans is currently unknown, making it challenging to assess the progress of these surveys. Here we study three causal attribution networks to determine how well they can be combined into a single network. Combining these networks requires dealing with ambiguous nodes, as nodes represent written descriptions of causes and effects and different descriptions may exist for the same concept. We introduce NetFUSES, a method for combining networks with ambiguous nodes. Crucially, treating the different causal attributions networks as independent samples allows us to use their overlap to estimate the total size of the collective causal attribution network. We find that existing surveys capture 5.77% $\pm$ 0.781% of the $\approx$293 000 causes and effects estimated to exist, and 0.198% $\pm$ 0.174% of the $\approx$10 200 000 attributed cause-effect relationships.

Submitted 14 December, 2018; originally announced December 2018.

Comments: 15 pages, 4 figures, 2 tables

arXiv:1810.03163 [pdf, other]

doi 10.1145/3274293

Efficient Crowd Exploration of Large Networks: The Case of Causal Attribution

Authors: Daniel Berenberg, James P. Bagrow

Abstract: Accurately and efficiently crowdsourcing complex, open-ended tasks can be difficult, as crowd participants tend to favor short, repetitive "microtasks". We study the crowdsourcing of large networks where the crowd provides the network topology via microtasks. Crowds can explore many types of social and information networks, but we focus on the network of causal attributions, an important network that signifies cause-and-effect relationships. We conduct experiments on Amazon Mechanical Turk (AMT) testing how workers propose and validate individual causal relationships and introduce a method for independent crowd workers to explore large networks. The core of the method, Iterative Pathway Refinement, is a theoretically-principled mechanism for efficient exploration via microtasks. We evaluate the method using synthetic networks and apply it on AMT to extract a large-scale causal attribution network, then investigate the structure of this network as well as the activity patterns and efficiency of the workers who constructed this network. Worker interactions reveal important characteristics of causal perception and the network data they generate can improve our understanding of causality and causal inference.

Submitted 7 October, 2018; originally announced October 2018.

Comments: 25 pages, 14 figures, in CSCW'18

Journal ref: PACM on Human-Computer Interaction, Vol. 2, No. CSCW, Article 24. Publication date: November 2018

arXiv:1805.06879 [pdf, other]

Neural language representations predict outcomes of scientific research

Authors: James P. Bagrow, Daniel Berenberg, Joshua Bongard

Abstract: Many research fields codify their findings in standard formats, often by reporting correlations between quantities of interest. But the space of all testable correlates is far larger than scientific resources can currently address, so the ability to accurately predict correlations would be useful to plan research and allocate resources. Using a dataset of approximately 170,000 correlational findings extracted from leading social science journals, we show that a trained neural network can accurately predict the reported correlations using only the text descriptions of the correlates. Accurate predictive models such as these can guide scientists towards promising untested correlates, better quantify the information gained from new findings, and have implications for moving artificial intelligence systems from predicting structures to predicting relationships in the real world.

Submitted 17 May, 2018; originally announced May 2018.

Comments: 8 pages, 3 figures, plus supporting material

arXiv:1804.03665 [pdf, other]

doi 10.1007/s41109-019-0156-x

An information-theoretic, all-scales approach to comparing networks

Authors: James P. Bagrow, Erik M. Bollt

Abstract: As network research becomes more sophisticated, it is more common than ever for researchers to find themselves not studying a single network but needing to analyze sets of networks. An important task when working with sets of networks is network comparison, developing a similarity or distance measure between networks so that meaningful comparisons can be drawn. The best means to accomplish this task remains an open area of research. Here we introduce a new measure to compare networks, the Network Portrait Divergence, that is mathematically principled, incorporates the topological characteristics of networks at all structural scales, and is general-purpose and applicable to all types of networks. An important feature of our measure that enables many of its useful properties is that it is based on a graph invariant, the network portrait. We test our measure on both synthetic graphs and real world networks taken from protein interaction data, neuroscience, and computational social science applications. The Network Portrait Divergence reveals important characteristics of multilayer and temporal networks extracted from data.

Submitted 25 July, 2019; v1 submitted 10 April, 2018; originally announced April 2018.

Comments: 22 pages (double-spaced), 7 figures

Journal ref: Applied Network Science, 4 (1): 45 (2019)

arXiv:1802.05101 [pdf, other]

doi 10.7717/peerj-cs.296

Democratizing AI: Non-expert design of prediction tasks

Authors: James P. Bagrow

Abstract: Non-experts have long made important contributions to machine learning (ML) by contributing training data, and recent work has shown that non-experts can also help with feature engineering by suggesting novel predictive features. However, non-experts have only contributed features to prediction tasks already posed by experienced ML practitioners. Here we study how non-experts can design prediction tasks themselves, what types of tasks non-experts will design, and whether predictive models can be automatically trained on data sourced for their tasks. We use a crowdsourcing platform where non-experts design predictive tasks that are then categorized and ranked by the crowd. Crowdsourced data are collected for top-ranked tasks and predictive models are then trained and evaluated automatically using those data. We show that individuals without ML experience can collectively construct useful datasets and that predictive models can be learned on these datasets, but challenges remain. The prediction tasks designed by non-experts covered a broad range of domains, from politics and current events to health behavior, demographics, and more. Proper instructions are crucial for non-experts, so we also conducted a randomized trial to understand how different instructions may influence the types of prediction tasks being proposed. In general, understanding better how non-experts can contribute to ML can further leverage advances in Automatic ML and has important implications as ML continues to drive workplace automation.

Submitted 7 September, 2020; v1 submitted 14 February, 2018; originally announced February 2018.

Comments: 17 pages, 6 figures

Journal ref: PeerJ Computer Science, 6: e296, 2020

arXiv:1711.00326 [pdf, other]

doi 10.1063/1.5011403

The quoter model: a paradigmatic model of the social flow of written information

Authors: James P. Bagrow, Lewis Mitchell

Abstract: We propose a model for the social flow of information in the form of text data, which simulates the posting and sharing of short social media posts. Nodes in a graph representing a social network take turns generating words, leading to a symbolic time series associated with each node. Information propagates over the graph via a quoting mechanism, where nodes randomly copy short segments of text from each other. We characterize information flows from these texts via information-theoretic estimators, and we derive analytic relationships between model parameters and the values of these estimators. We explore and validate the model with simulations on small network motifs and larger random graphs. Tractable models such as ours that generate symbolic data while controlling the information flow allow us to test and compare measures of information flow applicable to real social media data. In particular, by choosing different network structures, we can develop test scenarios to determine whether or not measures of information flow can distinguish between true and spurious interactions, and how topological network properties relate to information flow.

Submitted 11 July, 2018; v1 submitted 1 November, 2017; originally announced November 2017.

Comments: 11 pages, 9 figures

Journal ref: Chaos 28, 075304 (2018)

arXiv:1709.02739 [pdf, other]

doi 10.1109/JSYST.2017.2778144

Crowdsourcing Predictors of Residential Electric Energy Usage

Authors: Mark D. Wagy, Josh C. Bongard, James P. Bagrow, Paul D. H. Hines

Abstract: Crowdsourcing has been successfully applied in many domains including astronomy, cryptography and biology. In order to test its potential for useful application in a Smart Grid context, this paper investigates the extent to which a crowd can contribute predictive hypotheses to a model of residential electric energy consumption. In this experiment, the crowd generated hypotheses about factors that make one home different from another in terms of monthly energy usage. To implement this concept, we deployed a web-based system within which 627 residential electricity customers posed 632 questions that they thought predictive of energy usage. While this occurred, the same group provided 110,573 answers to these questions as they accumulated. Thus users both suggested the hypotheses that drive a predictive model and provided the data upon which the model is built. We used the resulting question and answer data to build a predictive model of monthly electric energy consumption, using random forest regression. Because of the sparse nature of the answer data, careful statistical work was needed to ensure that these models are valid. The results indicate that the crowd can generate useful hypotheses, despite the sparse nature of the dataset.

Submitted 8 September, 2017; originally announced September 2017.

Comments: 11 pages, 7 figures

Journal ref: IEEE Systems Journal, 2018

arXiv:1708.04575 [pdf, other]

doi 10.1038/s41562-018-0510-5

Information flow reveals prediction limits in online social activity

Authors: James P. Bagrow, Xipei Liu, Lewis Mitchell

Abstract: Modern society depends on the flow of information over online social networks, and users of popular platforms generate significant behavioral data about themselves and their social ties. However, it remains unclear what fundamental limits exist when using these data to predict the activities and interests of individuals, and to what accuracy such predictions can be made using an individual's social ties. Here we show that 95% of the potential predictive accuracy for an individual is achievable using their social ties only, without requiring that individual's data. We use information theoretic tools to estimate the predictive information within the writings of Twitter users, providing an upper bound on the available predictive information that holds for any predictive or machine learning methods. As few as 8-9 of an individual's contacts are sufficient to obtain predictability comparable to that of the individual alone. Distinct temporal and social effects are visible by measuring information flow along social ties, allowing us to better study the dynamics of online activity. Our results have distinct privacy implications: information is so strongly embedded in a social network that in principle one can profile an individual from their available social ties even when the individual forgoes the platform completely.

Submitted 9 February, 2019; v1 submitted 15 August, 2017; originally announced August 2017.

Comments: 15 pages, 4 figures, supplementary information included

Journal ref: Nature Human Behaviour 3 (2019) 122-128

arXiv:1707.06939 [pdf, other]

doi 10.15346/hc.v6i1.3

Autocompletion interfaces make crowd workers slower, but their use promotes response diversity

Authors: Xipei Liu, James P. Bagrow

Abstract: Creative tasks such as ideation or question proposal are powerful applications of crowdsourcing, yet the quantity of workers available for addressing practical problems is often insufficient. To enable scalable crowdsourcing thus requires gaining all possible efficiency and information from available workers. One option for text-focused tasks is to allow assistive technology, such as an autocompletion user interface (AUI), to help workers input text responses. But support for the efficacy of AUIs is mixed. Here we designed and conducted a randomized experiment where workers were asked to provide short text responses to given questions. Our experimental goal was to determine if an AUI helps workers respond more quickly and with improved consistency by mitigating typos and misspellings. Surprisingly, we found that neither occurred: workers assigned to the AUI treatment were slower than those assigned to the non-AUI control and their responses were more diverse, not less, than those of the control. Both the lexical and semantic diversities of responses were higher, with the latter measured using word2vec. A crowdsourcer interested in worker speed may want to avoid using an AUI, but using an AUI to boost response diversity may be valuable to crowdsourcers interested in receiving as much novel information from workers as possible.

Submitted 21 July, 2017; originally announced July 2017.

Comments: 12 pages, 6 figures

Journal ref: Human Computation 6:1:42-55 (2019)

arXiv:1703.07362 [pdf, other]

Information spreading during emergencies and anomalous events

Authors: James P. Bagrow

Abstract: The most critical time for information to spread is in the aftermath of a serious emergency, crisis, or disaster. Individuals affected by such situations can now turn to an array of communication channels, from mobile phone calls and text messages to social media posts, when alerting social ties. These channels drastically improve the speed of information in a time-sensitive event, and provide extant records of human dynamics during and afterward the event. Retrospective analysis of such anomalous events provides researchers with a class of "found experiments" that may be used to better understand social spreading. In this chapter, we study information spreading due to a number of emergency events, including the Boston Marathon Bombing and a plane crash at a western European airport. We also contrast the different information which may be gleaned by social media data compared with mobile phone data and we estimate the rate of anomalous events in a mobile phone dataset using a proposed anomaly detection method.

Submitted 21 March, 2017; originally announced March 2017.

Comments: 19 pages, 11 figures

arXiv:1703.06361 [pdf, other]

Which friends are more popular than you? Contact strength and the friendship paradox in social networks

Authors: James P. Bagrow, Christopher M. Danforth, Lewis Mitchell

Abstract: The friendship paradox states that in a social network, egos tend to have lower degree than their alters, or, "your friends have more friends than you do". Most research has focused on the friendship paradox and its implications for information transmission, but treating the network as static and unweighted. Yet, people can dedicate only a finite fraction of their attention budget to each social interaction: a high-degree individual may have less time to dedicate to individual social links, forcing them to modulate the quantities of contact made to their different social ties. Here we study the friendship paradox in the context of differing contact volumes between egos and alters, finding a connection between contact volume and the strength of the friendship paradox. The most frequently contacted alters exhibit a less pronounced friendship paradox compared with the ego, whereas less-frequently contacted alters are more likely to be high degree and give rise to the paradox. We argue therefore for a more nuanced version of the friendship paradox: "your closest friends have slightly more friends than you do", and in certain networks even: "your best friend has no more friends than you do". We demonstrate that this relationship is robust, holding in both a social media and a mobile phone dataset. These results have implications for information transfer and influence in social networks, which we explore using a simple dynamical model.

Submitted 18 March, 2017; originally announced March 2017.

arXiv:1611.00954 [pdf, other]

doi 10.1371/journal.pone.0182662

Reply & Supply: Efficient crowdsourcing when workers do more than answer questions

Authors: Thomas C. McAndrew, Elizaveta A. Guseva, James P. Bagrow

Abstract: Crowdsourcing works by distributing many small tasks to large numbers of workers, yet the true potential of crowdsourcing lies in workers doing more than performing simple tasks---they can apply their experience and creativity to provide new and unexpected information to the crowdsourcer. One such case is when workers not only answer a crowdsourcer's questions but also contribute new questions for subsequent crowd analysis, leading to a growing set of questions. This growth creates an inherent bias for early questions since a question introduced earlier by a worker can be answered by more subsequent workers than a question introduced later. Here we study how to perform efficient crowdsourcing with such growing question sets. By modeling question sets as networks of interrelated questions, we introduce algorithms to help curtail the growth bias by efficiently distributing workers between exploring new questions and addressing current questions. Experiments and simulations demonstrate that these algorithms can efficiently explore an unbounded set of questions without losing confidence in crowd answers.

Submitted 14 August, 2017; v1 submitted 3 November, 2016; originally announced November 2016.

Comments: 20 pages, 6 figures, 1 table

Journal ref: PLoS ONE, 12(8): e0182662, 2017

arXiv:1604.05781 [pdf, other]

doi 10.1109/ASONAM.2016.7752284

What we write about when we write about causality: Features of causal statements across large-scale social discourse

Authors: Thomas C. McAndrew, Joshua C. Bongard, Christopher M. Danforth, Peter S. Dodds, Paul D. H. Hines, James P. Bagrow

Abstract: Identifying and communicating relationships between causes and effects is important for understanding our world, but is affected by language structure, cognitive and emotional biases, and the properties of the communication medium. Despite the increasing importance of social media, much remains unknown about causal statements made online. To study real-world causal attribution, we extract a large-scale corpus of causal statements made on the Twitter social network platform as well as a comparable random control corpus. We compare causal and control statements using statistical language and sentiment analysis tools. We find that causal statements have a number of significant lexical and grammatical differences compared with controls and tend to be more negative in sentiment than controls. Causal statements made online tend to focus on news and current events, medicine and health, or interpersonal relationships, as shown by topic models. By quantifying the features and potential biases of causality communication, this study improves our understanding of the accuracy of information and opinions found online.

Submitted 21 April, 2016; v1 submitted 19 April, 2016; originally announced April 2016.

Journal ref: 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), San Francisco, CA, 2016, pp. 519-524

arXiv:1601.07969 [pdf, other]

Zipf's law is a consequence of coherent language production

Authors: Jake Ryland Williams, James P. Bagrow, Andrew J. Reagan, Sharon E. Alajajian, Christopher M. Danforth, Peter Sheridan Dodds

Abstract: The task of text segmentation may be undertaken at many levels in text analysis---paragraphs, sentences, words, or even letters. Here, we focus on a relatively fine scale of segmentation, hypothesizing it to be in accord with a stochastic model of language generation, as the smallest scale where independent units of meaning are produced. Our goals in this letter include the development of methods for the segmentation of these minimal independent units, which produce feature-representations of texts that align with the independence assumption of the bag-of-terms model, commonly used for prediction and classification in computational text analysis. We also propose the measurement of texts' association (with respect to realized segmentations) to the model of language generation. We find (1) that our segmentations of phrases exhibit much better associations to the generation model than words and (2), that texts which are well fit are generally topically homogeneous. Because our generative model produces Zipf's law, our study further suggests that Zipf's law may be a consequence of homogeneity in language production.

Submitted 5 August, 2016; v1 submitted 28 January, 2016; originally announced January 2016.

Comments: 5 pages, 4 figures

arXiv:1510.07494 [pdf, other]

Transitions in climate and energy discourse between Hurricanes Katrina and Sandy

Authors: Emily M. Cody, Jennie C. Stephens, James P. Bagrow, Peter Sheridan Dodds, Christopher M. Danforth

Abstract: Although climate change and energy are intricately linked, their explicit connection is not always prominent in public discourse and the media. Disruptive extreme weather events, including hurricanes, focus public attention in new and different ways, offering a unique window of opportunity to analyze how a focusing event influences public discourse. Media coverage of extreme weather events simultaneously shapes and reflects public discourse on climate issues. Here we analyze climate and energy newspaper coverage of Hurricanes Katrina (2005) and Sandy (2012) using topic models, mathematical techniques used to discover abstract topics within a set of documents. Our results demonstrate that post-Katrina media coverage does not contain a climate change topic, and the energy topic is limited to discussion of energy prices, markets, and the economy with almost no explicit linkages made between energy and climate change. In contrast, post-Sandy media coverage does contain a prominent climate change topic, a distinct energy topic, as well as integrated representation of climate change and energy, indicating a shift in climate and energy reporting between Hurricane Katrina and Hurricane Sandy.

Submitted 25 April, 2016; v1 submitted 19 October, 2015; originally announced October 2015.

arXiv:1505.06750 [pdf, other]

doi 10.1073/pnas.1505647112

Reply to Garcia et al.: Common mistakes in measuring frequency dependent word characteristics

Authors: P. S. Dodds, E. M. Clark, S. Desu, M. R. Frank, A. J. Reagan, J. R. Williams, L. Mitchell, K. D. Harris, I. M. Kloumann, J. P. Bagrow, K. Megerdoomian, M. T. McMahon, B. F. Tivnan, C. M. Danforth

Abstract: We demonstrate that the concerns expressed by Garcia et al. are misplaced, due to (1) a misreading of our findings in [1]; (2) a widespread failure to examine and present words in support of asserted summary quantities based on word usage frequencies; and (3) a range of misconceptions about word usage frequency, word rank, and expert-constructed word lists. In particular, we show that the English component of our study compares well statistically with two related surveys, that no survey design influence is apparent, and that estimates of measurement error do not explain the positivity biases reported in our work and that of others. We further demonstrate that for the frequency dependence of positivity---of which we explored the nuances in great detail in [1]---Garcia et al. did not perform a reanalysis of our data---they instead carried out an analysis of a different, statistically improper data set and introduced a nonlinearity before performing linear regression.

Submitted 28 May, 2015; v1 submitted 25 May, 2015; originally announced May 2015.

Comments: 5 pages, 2 figures, 1 table. Expanded version of reply appearing in PNAS 2015

arXiv:1503.02120 [pdf, other]

Identifying missing dictionary entries with frequency-conserving context models

Authors: Jake Ryland Williams, Eric M. Clark, James P. Bagrow, Christopher M. Danforth, Peter Sheridan Dodds

Abstract: In an effort to better understand meaning from natural language texts, we explore methods aimed at organizing lexical objects into contexts. A number of these methods for organization fall into a family defined by word ordering. Unlike demographic or spatial partitions of data, these collocation models are of special importance for their universal applicability. While we are interested here in text and have framed our treatment appropriately, our work is potentially applicable to other areas of research (e.g., speech, genomics, and mobility patterns) where one has ordered categorical data, (e.g., sounds, genes, and locations). Our approach focuses on the phrase (whether word or larger) as the primary meaning-bearing lexical unit and object of study. To do so, we employ our previously developed framework for generating word-conserving phrase-frequency data. Upon training our model with the Wiktionary---an extensive, online, collaborative, and open-source dictionary that contains over 100,000 phrasal-definitions---we develop highly effective filters for the identification of meaningful, missing phrase-entries. With our predictions we then engage the editorial community of the Wiktionary and propose short lists of potential missing entries for definition, developing a breakthrough, lexical extraction technique, and expanding our knowledge of the defined English lexicon of phrases.

Submitted 28 July, 2015; v1 submitted 6 March, 2015; originally announced March 2015.

Comments: 16 pages, 6 figures, and 7 tables

arXiv:1501.05976 [pdf, other]

doi 10.1103/PhysRevE.91.042813

Robustness of Spatial Micronetworks

Authors: Thomas C. McAndrew, Christopher M. Danforth, James P. Bagrow

Abstract: Power lines, roadways, pipelines and other physical infrastructure are critical to modern society. These structures may be viewed as spatial networks where geographic distances play a role in the functionality and construction cost of links. Traditionally, studies of network robustness have primarily considered the connectedness of large, random networks. Yet for spatial infrastructure physical distances must also play a role in network robustness. Understanding the robustness of small spatial networks is particularly important with the increasing interest in microgrids, small-area distributed power grids that are well suited to using renewable energy resources. We study the random failures of links in small networks where functionality depends on both spatial distance and topological connectedness. By introducing a percolation model where the failure of each link is proportional to its spatial length, we find that, when failures depend on spatial distances, networks are more fragile than expected. Accounting for spatial effects in both construction and robustness is important for designing efficient microgrids and other network infrastructure.

Submitted 23 January, 2015; originally announced January 2015.

Comments: 15 pages, 8 figures

Journal ref: Phys. Rev. E 91, 042813 2015

arXiv:1410.1393 [pdf, other]

Constructing a taxonomy of fine-grained human movement and activity motifs through social media

Authors: Morgan R. Frank, Jake Ryland Williams, Lewis Mitchell, James P. Bagrow, Peter Sheridan Dodds, Christopher M. Danforth

Abstract: Profiting from the emergence of web-scale social data sets, numerous recent studies have systematically explored human mobility patterns over large populations and large time scales. Relatively little attention, however, has been paid to mobility and activity over smaller time-scales, such as a day. Here, we use Twitter to identify people's frequently visited locations along with their likely activities as a function of time of day and day of week, capitalizing on both the content and geolocation of messages. We subsequently characterize people's transition pattern motifs and demonstrate that spatial information is encoded in word choice.

Submitted 11 May, 2015; v1 submitted 28 September, 2014; originally announced October 2014.

arXiv:1409.3870 [pdf, other]

doi 10.1103/PhysRevE.91.052811

Text mixing shapes the anatomy of rank-frequency distributions: A modern Zipfian mechanics for natural language

Authors: Jake Ryland Williams, James P. Bagrow, Christopher M. Danforth, Peter Sheridan Dodds

Abstract: Natural languages are full of rules and exceptions. One of the most famous quantitative rules is Zipf's law which states that the frequency of occurrence of a word is approximately inversely proportional to its rank. Though this `law' of ranks has been found to hold across disparate texts and forms of data, analyses of increasingly large corpora over the last 15 years have revealed the existence of two scaling regimes. These regimes have thus far been explained by a hypothesis suggesting a separability of languages into core and non-core lexica. Here, we present and defend an alternative hypothesis, that the two scaling regimes result from the act of aggregating texts. We observe that text mixing leads to an effective decay of word introduction, which we show provides accurate predictions of the location and severity of breaks in scaling. Upon examining large corpora from 10 languages in the Project Gutenberg eBooks collection (eBooks), we find emphatic empirical support for the universality of our claim.

Submitted 30 January, 2015; v1 submitted 12 September, 2014; originally announced September 2014.

Comments: 9 pages, 6 figures, and 1 table

Journal ref: Phys. Rev. E 91, 052811 (2015)

arXiv:1407.2893 [pdf, other]

doi 10.1098/rsos.160007

Understanding the group dynamics and success of teams

Authors: Michael Klug, James P. Bagrow

Abstract: Complex problems often require coordinated group effort and can consume significant resources, yet our understanding of how teams form and succeed has been limited by a lack of large-scale, quantitative data. We analyze activity traces and success levels for ~150,000 self-organized, online team projects. While larger teams tend to be more successful, workload is highly focused across the team, with only a few members performing most work. We find that highly successful teams are significantly more focused than average teams of the same size, that their members have worked on more diverse sets of projects, and the members of highly successful teams are more likely to be core members or 'leads' of other teams. The relations between team success and size, focus and especially team experience cannot be explained by confounding factors such as team age, external contributions from non-team members, nor by group mechanisms such as social loafing. Taken together, these features point to organizational principles that may maximize the success of collaborative endeavors.

Submitted 20 April, 2016; v1 submitted 10 July, 2014; originally announced July 2014.

Comments: 20 pages, 4 figures, supporting information included

Journal ref: R. Soc. open sci. 2016 3 160007

arXiv:1406.5181 [pdf, other]

Zipf's law holds for phrases, not words

Authors: Jake Ryland Williams, Paul R. Lessard, Suma Desu, Eric Clark, James P. Bagrow, Christopher M. Danforth, Peter Sheridan Dodds

Abstract: With Zipf's law being originally and most famously observed for word frequency, it is surprisingly limited in its applicability to human language, holding over no more than three to four orders of magnitude before hitting a clear break in scaling. Here, building on the simple observation that phrases of one or more words comprise the most coherent units of meaning in language, we show empirically that Zipf's law for phrases extends over as many as nine orders of rank magnitude. In doing so, we develop a principled and scalable statistical mechanical method of random text partitioning, which opens up a rich frontier of rigorous text analysis via a rank ordering of mixed length phrases.

Submitted 4 March, 2015; v1 submitted 19 June, 2014; originally announced June 2014.

Comments: Manuscript: 6 pages, 3 figures; Supplementary Information: 8 pages, 18 tables

arXiv:1406.3855 [pdf, other]

Human language reveals a universal positivity bias

Authors: Peter Sheridan Dodds, Eric M. Clark, Suma Desu, Morgan R. Frank, Andrew J. Reagan, Jake Ryland Williams, Lewis Mitchell, Kameron Decker Harris, Isabel M. Kloumann, James P. Bagrow, Karine Megerdoomian, Matthew T. McMahon, Brian F. Tivnan, Christopher M. Danforth

Abstract: Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (1) the words of natural human language possess a universal positivity bias; (2) the estimated emotional content of words is consistent between languages under translation; and (3) this positivity bias is strongly independent of frequency of word usage. Alongside these general regularities, we describe inter-language variations in the emotional spectrum of languages which allow us to rank corpora. We also show how our word evaluations can be used to construct physical-like instruments for both real-time and offline measurement of the emotional content of large-scale texts.

Submitted 15 June, 2014; originally announced June 2014.

Comments: Manuscript: 7 pages, 4 figures; Supplementary Material: 49 pages, 43 figures, 6 tables. Online appendices available at http://www.uvm.edu/storylab/share/papers/dodds2014a/

arXiv:1401.1274 [pdf, ps, other]

doi 10.1038/srep03997

Quantifying Information Flow During Emergencies

Authors: Liang Gao, Chaoming Song, Ziyou Gao, Albert-László Barabási, James P. Bagrow, Dashun Wang

Abstract: Recent advances on human dynamics have focused on the normal patterns of human activities, with the quantitative understanding of human behavior under extreme events remaining a crucial missing chapter. This has a wide array of potential applications, ranging from emergency response and detection to traffic control and management. Previous studies have shown that human communications are both temporally and spatially localized following the onset of emergencies, indicating that social propagation is a primary means to propagate situational awareness. We study real anomalous events using country-wide mobile phone data, finding that information flow during emergencies is dominated by repeated communications. We further demonstrate that the observed communication patterns cannot be explained by inherent reciprocity in social networks, and are universal across different demographics.

Submitted 7 January, 2014; originally announced January 2014.

Comments: Under review in Scientific Reports

Journal ref: Scientific Reports 4, 3997 2014

arXiv:1312.6122 [pdf, other]

Shadow networks: Discovering hidden nodes with models of information flow

Authors: James P. Bagrow, Suma Desu, Morgan R. Frank, Narine Manukyan, Lewis Mitchell, Andrew Reagan, Eric E. Bloedorn, Lashon B. Booker, Luther K. Branting, Michael J. Smith, Brian F. Tivnan, Christopher M. Danforth, Peter S. Dodds, Joshua C. Bongard

Abstract: Complex, dynamic networks underlie many systems, and understanding these networks is the concern of a great span of important scientific and engineering problems. Quantitative description is crucial for this understanding yet, due to a range of measurement problems, many real network datasets are incomplete. Here we explore how accidentally missing or deliberately hidden nodes may be detected in networks by the effect of their absence on predictions of the speed with which information flows through the network. We use Symbolic Regression (SR) to learn models relating information flow to network topology. These models show localized, systematic, and non-random discrepancies when applied to test networks with intentionally masked nodes, demonstrating the ability to detect the presence of missing nodes and where in the network those nodes are likely to reside.

Submitted 20 December, 2013; originally announced December 2013.

Comments: 12 pages, 3 figures

arXiv:1309.3797 [pdf, other]

doi 10.1093/comnet/cnt019

Robustness of skeletons and salient features in networks

Authors: Louis M. Shekhtman, James P. Bagrow, Dirk Brockmann

Abstract: Real world network datasets often contain a wealth of complex topological information. In the face of these data, researchers often employ methods to extract reduced networks containing the most important structures or pathways, sometimes known as 'skeletons' or 'backbones'. Numerous such methods have been developed. Yet data are often noisy or incomplete, with unknown numbers of missing or spurious links. Relatively little effort has gone into understanding how salient network extraction methods perform in the face of noisy or incomplete networks. We study this problem by comparing how the salient features extracted by two popular methods change when networks are perturbed, either by deleting nodes or links, or by randomly rewiring links. Our results indicate that simple, global statistics for skeletons can be accurately inferred even for noisy and incomplete network data, but it is crucial to have complete, reliable data to use the exact topologies of skeletons or backbones. These results also help us understand how skeletons respond to damage to the network itself, as in an attack scenario.

Submitted 15 September, 2013; originally announced September 2013.

arXiv:1209.3307 [pdf, other]

doi 10.1103/PhysRevX.3.021016

Natural emergence of clusters and bursts in network evolution

Authors: James P. Bagrow, Dirk Brockmann

Abstract: Network models with preferential attachment, where new nodes are injected into the network and form links with existing nodes proportional to their current connectivity, have been well studied for some time. Extensions have been introduced where nodes attach proportionally to arbitrary fitness functions. However, in these models, attaching to a node always increases the ability of that node to gain more links in the future. We study network growth where nodes attach proportionally to the clustering coefficients, or local densities of triangles, of existing nodes. Attaching to a node typically lowers its clustering coefficient, in contrast to preferential attachment or rich-get-richer models. This simple modification naturally leads to a variety of rich phenomena, including aging, non-Poissonian bursty dynamics, and community formation. This theoretical model shows that complex network structure can be generated without artificially imposing multiple dynamical mechanisms and may reveal potentially overlooked mechanisms present in complex systems.

Submitted 25 June, 2013; v1 submitted 14 September, 2012; originally announced September 2012.

Comments: 6 pages, 4 figures

Journal ref: Phys. Rev. X 3, 021016 (2013)

arXiv:1209.2419 [pdf, other]

doi 10.1007/s10955-013-0787-8

The role of caretakers in disease dynamics

Authors: Charleston Noble, James P. Bagrow, Dirk Brockmann

Abstract: One of the key challenges in modeling the dynamics of contagion phenomena is to understand how the structure of social interactions shapes the time course of a disease. Complex network theory has provided significant advances in this context. However, awareness of an epidemic in a population typically yields behavioral changes that correspond to changes in the network structure on which the disease evolves. This feedback mechanism has not been investigated in depth. For example, one would intuitively expect susceptible individuals to avoid other infecteds. However, doctors treating patients or parents tending sick children may also increase the amount of contact made with infecteds, in an effort to speed up recovery but also exposing themselves to higher risks of infection. We study the role of these caretaker links in adaptive network models where individuals react to a disease by increasing or decreasing the amount of contact they make with infected individuals. We find that pure avoidance, with only a few caretaker links, is the best strategy for curtailing an SIS disease in networks that possess a large topological variability. In more homogeneous networks, disease prevalence is decreased for low concentrations of caretakers whereas a high prevalence emerges if caretaker concentration passes a well defined critical value.

Submitted 11 September, 2012; originally announced September 2012.

Comments: 8 pages, 9 figures

arXiv:1205.1492 [pdf, ps, other]

doi 10.1038/srep00676

Is coaching experience associated with effective use of timeouts in basketball?

Authors: Serguei Saavedra, Satyam Mukherjee, James P. Bagrow

Abstract: Experience is an important asset in almost any professional activity. In basketball, there is believed to be a positive association between coaching experience and effective use of team timeouts. Here, we analyze both the extent to which a team's change in scoring margin per possession after timeouts deviates from the team's average scoring margin per possession (what we call the timeout factor), and the extent to which this performance measure is associated with coaching experience across all teams in the National Basketball Association over the 2009-2012 seasons. We find that timeout factor plays a minor role in the scoring dynamics of basketball. Surprisingly, we find that timeout factor is negatively associated with coaching experience. Our findings support empirical studies showing that, under certain conditions, mentors early in their careers can have a stronger positive impact on their teams than later in their careers.

Submitted 21 September, 2012; v1 submitted 7 May, 2012; originally announced May 2012.

Comments: Scientific Reports 2, Article number: 676 (2012)

arXiv:1202.0224 [pdf, other]

doi 10.1371/journal.pone.0037676

Mesoscopic structure and social aspects of human mobility

Authors: James P. Bagrow, Yu-Ru Lin

Abstract: The individual movements of large numbers of people are important in many contexts, from urban planning to disease spreading. Datasets that capture human mobility are now available and many interesting features have been discovered, including the ultra-slow spatial growth of individual mobility. However, the detailed substructures and spatiotemporal flows of mobility - the sets and sequences of visited locations - have not been well studied. We show that individual mobility is dominated by small groups of frequently visited, dynamically close locations, forming primary "habitats" capturing typical daily activity, along with subsidiary habitats representing additional travel. These habitats do not correspond to typical contexts such as home or work. The temporal evolution of mobility within habitats, which constitutes most motion, is universal across habitats and exhibits scaling patterns both distinct from all previous observations and unpredicted by current models. The delay to enter subsidiary habitats is a primary factor in the spatiotemporal growth of human travel. Interestingly, habitats correlate with non-mobility dynamics such as communication activity, implying that habitats may influence processes such as information spreading and revealing new connections between human mobility and social networks.

Submitted 7 June, 2012; v1 submitted 1 February, 2012; originally announced February 2012.

Comments: 7 pages, 5 figures (main text); 11 pages, 9 figures, 1 table (supporting information)

Journal ref: PLoS ONE 7(5): e37676, 2012

arXiv:1201.0745 [pdf, other]

doi 10.1103/PhysRevE.85.066118

Communities and bottlenecks: Trees and treelike networks have high modularity

Authors: James P. Bagrow

Abstract: Much effort has gone into understanding the modular nature of complex networks. Communities, also known as clusters or modules, are typically considered to be densely interconnected groups of nodes that are only sparsely connected to other groups in the network. Discovering high quality communities is a difficult and important problem in a number of areas. The most popular approach is the objective function known as modularity, used both to discover communities and to measure their strength. To understand the modular structure of networks it is then crucial to know how such functions evaluate different topologies, what features they account for, and what implicit assumptions they may make. We show that trees and treelike networks can have unexpectedly and often arbitrarily high values of modularity. This is surprising since trees are maximally sparse connected graphs and are not typically considered to possess modular structure, yet the nonlocal null model used by modularity assigns low probabilities, and thus high significance, to the densities of these sparse tree communities. We further study the practical performance of popular methods on model trees and on a genealogical data set and find that the discovered communities also have very high modularity, often approaching its maximum value. Statistical tests reveal the communities in trees to be significant, in contrast with known results for partitions of sparse, random graphs.

Submitted 25 June, 2012; v1 submitted 3 January, 2012; originally announced January 2012.

Comments: 9 pages, 5 figures

Journal ref: Phys. Rev. E 85, 066118 (2012)
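Illustrative note (not taken from the paper): the claim that trees can score high modularity can be checked numerically with the standard Newman-Girvan definition, Q = sum over communities c of [m_c/m - (d_c/2m)^2], where m_c is the number of edges inside c and d_c is the total degree of its nodes. The sketch below is a minimal example under stated assumptions: the path graph, the contiguous equal-size blocks, and the helper names `modularity` and `path_modularity` are choices made here for illustration only.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity: Q = sum_c [ m_c / m - (d_c / (2 m))^2 ]."""
    m = len(edges)
    internal = defaultdict(int)    # m_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: summed degree of the nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


def path_modularity(n, k):
    """Modularity of an n-node path (a tree) cut into k contiguous blocks.

    Assumption for illustration: blocks hold n // k nodes each, and any
    leftover nodes are folded into the last block.
    """
    edges = [(i, i + 1) for i in range(n - 1)]
    size = n // k
    community = {i: min(i // size, k - 1) for i in range(n)}
    return modularity(edges, community)


if __name__ == "__main__":
    for k in (1, 2, 8, 32):
        print(k, round(path_modularity(1000, k), 4))
```

With a single block the score is zero, and it rises as the path is cut into more blocks, because each cut removes only one edge while the null-model penalty shrinks roughly as 1/k.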

arXiv:1111.6074 [pdf, other]

Flavor network and the principles of food pairing

Authors: Yong-Yeol Ahn, Sebastian E. Ahnert, James P. Bagrow, Albert-László Barabási

Abstract: The cultural diversity of culinary practice, as illustrated by the variety of regional cuisines, raises the question of whether there are any general patterns that determine the ingredient combinations used in food today or principles that transcend individual tastes and recipes. We introduce a flavor network that captures the flavor compounds shared by culinary ingredients. Western cuisines show a tendency to use ingredient pairs that share many flavor compounds, supporting the so-called food pairing hypothesis. By contrast, East Asian cuisines tend to avoid compound sharing ingredients. Given the increasing availability of information on food preparation, our data-driven investigation opens new avenues towards a systematic understanding of culinary practice.

Submitted 25 November, 2011; originally announced November 2011.

Comments: 39 pages, 15 figures

ACM Class: H.2.8

arXiv:1111.1227 [pdf, other]

More Voices Than Ever? Quantifying Media Bias in Networks

Authors: Yu-Ru Lin, James P. Bagrow, David Lazer

Abstract: Social media, such as blogs, are often seen as democratic entities that allow more voices to be heard than the conventional mass or elite media. Some also feel that social media exhibits a balancing force against the arguably slanted elite media. A systematic comparison between social and mainstream media is necessary but challenging due to the scale and dynamic nature of modern communication. Here we propose empirical measures to quantify the extent and dynamics of social (blog) and mainstream (news) media bias. We focus on a particular form of bias, coverage quantity, as applied to stories about the 111th US Congress. We compare observed coverage of Members of Congress against a null model of unbiased coverage, testing for biases with respect to political party, popular front runners, regions of the country, and more. Our measures suggest distinct characteristics in news and blog media. A simple generative model, in agreement with data, reveals differences in the process of coverage selection between the two media.

Submitted 4 November, 2011; originally announced November 2011.

Comments: 10 Pages, 7 figures, appeared in ICWSM 2011

Journal ref: Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media (ICWSM 2011), 17-21 July 2011, Barcelona, Spain

arXiv:1106.0560 [pdf, other]

doi 10.1371/journal.pone.0017680

Collective response of human populations to large-scale emergencies

Authors: James P. Bagrow, Dashun Wang, Albert-László Barabási

Abstract: Despite recent advances in uncovering the quantitative features of stationary human activity patterns, many applications, from pandemic prediction to emergency response, require an understanding of how these patterns change when the population encounters unfamiliar conditions. To explore societal response to external perturbations we identified real-time changes in communication and mobility patterns in the vicinity of eight emergencies, such as bomb attacks and earthquakes, comparing these with eight non-emergencies, like concerts and sporting events. We find that communication spikes accompanying emergencies are both spatially and temporally localized, but information about emergencies spreads globally, resulting in communication avalanches that engage in a significant manner the social network of eyewitnesses. These results offer a quantitative view of behavioral changes in human activity under extreme conditions, with potential long-term impact on emergency detection and response.

Submitted 3 June, 2011; originally announced June 2011.

Journal ref: PLoS ONE 6(3): e17680, 2011

arXiv:1102.5085 [pdf, other]

doi 10.1017/nws.2015.21

Robustness and modular structure in networks

Authors: James P. Bagrow, Sune Lehmann, Yong-Yeol Ahn

Abstract: Complex networks have recently attracted much interest due to their prevalence in nature and our daily lives [1, 2]. A critical property of a network is its resilience to random breakdown and failure [3-6], typically studied as a percolation problem [7-9] or by modeling cascading failures [10-12]. Many complex systems, from power grids and the Internet to the brain and society [13-15], can be modeled using modular networks comprised of small, densely connected groups of nodes [16, 17]. These modules often overlap, with network elements belonging to multiple modules [18, 19]. Yet existing work on robustness has not considered the role of overlapping, modular structure. Here we study the robustness of these systems to the failure of elements. We show analytically and empirically that it is possible for the modules themselves to become uncoupled or non-overlapping well before the network disintegrates. If overlapping modular organization plays a role in overall functionality, networks may be far more vulnerable than predicted by conventional percolation theory.

Submitted 6 January, 2016; v1 submitted 24 February, 2011; originally announced February 2011.

Comments: 14 pages, 9 figures

Journal ref: Network Science, 3 (4): 509-525 (2015)

arXiv:0911.0674 [pdf, ps, other]

doi 10.1109/CSE.2009.283

Investigating Bimodal Clustering in Human Mobility

Authors: James P. Bagrow, Tal Koren

Abstract: We apply a simple clustering algorithm to a large dataset of cellular telecommunication records, reducing the complexity of mobile phone users' full trajectories and allowing for simple statistics to characterize their properties. For the case of two clusters, we quantify how clustered human mobility is, how much of a user's spatial dispersion is due to motion between clusters, and how spatially and temporally separated clusters are from one another.

Submitted 3 November, 2009; originally announced November 2009.

Comments: 4 pages, 2 figures

Journal ref: International Conference on Computational Science and Engineering, 4: 944-947, 2009

arXiv:0903.3178 [pdf, other]

doi 10.1038/nature09182

Link communities reveal multiscale complexity in networks

Authors: Yong-Yeol Ahn, James P. Bagrow, Sune Lehmann

Abstract: Networks have become a key approach to understanding systems of interacting objects, unifying the study of diverse phenomena including biological organisms and human society. One crucial step when studying the structure and dynamics of networks is to identify communities: groups of related nodes that correspond to functional subunits such as protein complexes or social spheres. Communities in networks often overlap such that nodes simultaneously belong to several groups. Meanwhile, many networks are known to possess hierarchical organization, where communities are recursively grouped into a hierarchical structure. However, the fact that many real networks have communities with pervasive overlap, where each and every node belongs to more than one group, has the consequence that a global hierarchy of nodes cannot capture the relationships between overlapping groups. Here we reinvent communities as groups of links rather than nodes and show that this unorthodox approach successfully reconciles the antagonistic organizing principles of overlapping communities and hierarchy. In contrast to the existing literature, which has entirely focused on grouping nodes, link communities naturally incorporate overlap while revealing hierarchical organization. We find relevant link communities in many networks, including major biological networks such as protein-protein interaction and metabolic networks, and show that a large social network contains hierarchically organized community structures spanning inner-city to regional scales while maintaining pervasive overlap. Our results imply that link communities are fundamental building blocks that reveal overlap and hierarchical organization in networks to be two aspects of the same phenomenon.

Submitted 14 September, 2010; v1 submitted 18 March, 2009; originally announced March 2009.

Comments: Main text and supplementary information

Journal ref: Nature 466, 761-764 (2010)

arXiv:0809.4707 [pdf, other]

doi 10.1103/PhysRevE.79.036116

Dynamic Computation of Network Statistics via Updating Schema

Authors: Jie Sun, James P. Bagrow, Erik M. Bollt, Joesph D. Skufca

Abstract: In this paper we derive an updating scheme for calculating some important network statistics such as degree, clustering coefficient, etc., aiming to reduce the amount of computation needed to track the evolving behavior of large networks; and more importantly, to provide efficient methods for potential use in modeling the evolution of networks. Using the updating scheme, the network statistics can be computed and updated easily and much faster than re-calculating each time for large evolving networks. The update formula can also be used to determine which edge/node will lead to the extremal change of network statistics, providing a way of predicting or designing evolution rules of networks.

Submitted 26 September, 2008; originally announced September 2008.

Comments: 17 pages, 6 figures

Journal ref: Phys. Rev. E 79, 036116 (2009) [9 pages]
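Illustrative note (not the paper's update formulas, which the abstract does not state): the general idea of updating statistics instead of recomputing them can be sketched for degree and local clustering coefficient. When an edge (u, v) is added, the only new triangles are those closed through the common neighbors of u and v, so only u, v, and those common neighbors need their triangle counts touched. The class name `IncrementalStats` and its methods are hypothetical names chosen here.

```python
from collections import defaultdict


class IncrementalStats:
    """Keeps degree and triangle counts current as edges are added.

    Sketch under assumptions: simple undirected graph, edge insertions only.
    """

    def __init__(self):
        self.adj = defaultdict(set)        # node -> set of neighbors
        self.triangles = defaultdict(int)  # node -> triangles through the node

    def add_edge(self, u, v):
        # Ignore self-loops and edges that are already present.
        if u == v or v in self.adj[u]:
            return
        # Each common neighbor w closes exactly one new triangle (u, v, w).
        common = self.adj[u] & self.adj[v]
        self.triangles[u] += len(common)
        self.triangles[v] += len(common)
        for w in common:
            self.triangles[w] += 1
        self.adj[u].add(v)
        self.adj[v].add(u)

    def degree(self, u):
        return len(self.adj[u])

    def clustering(self, u):
        # Local clustering: triangles through u over possible neighbor pairs.
        k = len(self.adj[u])
        return 0.0 if k < 2 else 2 * self.triangles[u] / (k * (k - 1))


if __name__ == "__main__":
    g = IncrementalStats()
    for edge in [(0, 1), (1, 2), (0, 2), (2, 3)]:
        g.add_edge(*edge)
    print(g.degree(2), g.clustering(2))
```

Each insertion costs time proportional to the smaller neighborhood intersected, rather than a full pass over the graph.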

arXiv:0805.0807 [pdf, ps, other]

doi 10.1155/2008/346543

Kleinberg navigation on anisotropic lattices

Authors: J. Mauricio Campuzano, James P. Bagrow, Daniel ben-Avraham

Abstract: We study the Kleinberg problem of navigation in Small World networks when the underlying lattice is stretched along a preferred direction. Extensive simulations confirm that maximally efficient navigation is attained when the length $r$ of long-range links is taken from the distribution $P(\mathbf{r})\sim r^{-\alpha}$, when the exponent $\alpha$ is equal to 2, the dimension of the underlying lattice, regardless of the amount of anisotropy, but only in the limit of infinite lattice size, $L\to\infty$. For finite size lattices we find an optimal $\alpha(L)$ that depends strongly on $L$. The convergence to $\alpha=2$ as $L\to\infty$ shows interesting power-law dependence on the anisotropy strength.

Submitted 6 May, 2008; originally announced May 2008.

Comments: 6 pages, 4 figures, data included with source

Journal ref: Research Letters in Physics 2008, 346543 (2008)

arXiv:0712.2220 [pdf, ps, other]

doi 10.1088/1751-8113/41/18/185001

Phase transition in the rich-get-richer mechanism due to finite-size effects

Authors: James P. Bagrow, Jie Sun, Daniel ben-Avraham

Abstract: The rich-get-richer mechanism (agents increase their "wealth" randomly at a rate proportional to their holdings) is often invoked to explain the Pareto power-law distribution observed in many physical situations, such as the degree distribution of growing scale free nets. We use two different analytical approaches, as well as numerical simulations, to study the case where the number of agents is fixed and finite (but large), and the rich-get-richer mechanism is invoked a fraction r of the time (the remainder of the time wealth is disbursed by a homogeneous process). At short times, we recover the Pareto law observed for an unbounded number of agents. In later times, the (moving) distribution can be scaled to reveal a phase transition with a Gaussian asymptotic form for r < 1/2 and a Pareto-like tail (on the positive side) and a novel stretched exponential decay (on the negative side) for r > 1/2.

Submitted 3 May, 2008; v1 submitted 13 December, 2007; originally announced December 2007.

Comments: 9 pages, 1 figure, code and data included with source. Update corrects typos, adds journal-ref

Journal ref: J. Phys. A: Math. Theor. 41 (2008) 185001
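Illustrative note (a sketch, not the code distributed with the paper): the mixed process described in the abstract can be simulated directly. With probability r a unit of wealth goes to an agent chosen in proportion to current holdings, and otherwise to an agent chosen uniformly at random. The starting wealth of one unit per agent, the one-unit increment, and the function name `simulate` are assumptions made here for illustration.

```python
import random


def simulate(num_agents, steps, r, seed=0):
    """Distribute `steps` units of wealth among a fixed set of agents.

    With probability r the recipient is drawn proportionally to wealth
    (rich-get-richer); otherwise it is drawn uniformly (homogeneous process).
    Assumption: every agent starts with one unit.
    """
    rng = random.Random(seed)
    wealth = [1] * num_agents
    # One list entry per unit of wealth, so a uniform draw from `units`
    # selects an agent with probability proportional to its holdings.
    units = list(range(num_agents))
    for _ in range(steps):
        if rng.random() < r:
            agent = rng.choice(units)
        else:
            agent = rng.randrange(num_agents)
        wealth[agent] += 1
        units.append(agent)
    return wealth


if __name__ == "__main__":
    for r in (0.2, 0.8):
        w = simulate(num_agents=1000, steps=100_000, r=r)
        print(r, min(w), max(w))
```

Comparing the spread of final wealth for r below and above 1/2 gives a rough feel for the two regimes the abstract describes; a single run is not a test of the asymptotic forms.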

arXiv:0706.3880 [pdf, ps, other]

doi 10.1088/1742-5468/2008/05/P05001

Evaluating Local Community Methods in Networks

Authors: James P. Bagrow

Abstract: We present a new benchmarking procedure that is unambiguous and specific to local community-finding methods, allowing one to compare the accuracy of various methods. We apply this to new and existing algorithms. A simple class of synthetic benchmark networks is also developed, capable of testing properties specific to these local methods.

Submitted 15 November, 2007; v1 submitted 26 June, 2007; originally announced June 2007.

Comments: 8 pages, 9 figures, code included with source

Journal ref: J. Stat. Mech. (2008) P05001

Showing 1–50 of 53 results for author: Bagrow, J P