-
Connectivity and Community Structure of Online and Register-based Social Networks
Authors:
Márton Menyhért,
Eszter Bokányi,
Rense Corten,
Eelke M. Heemskerk,
Yuliia Kazmina,
Frank W. Takes
Abstract:
The dominance of online social media data as a source of population-scale social network studies has recently been challenged by networks constructed from government-curated register data. In this paper, we investigate how the two compare, focusing on aggregations of the Dutch online social network (OSN) Hyves and a register-based social network (RSN) of the Netherlands. First and foremost, we find that the connectivity of the two population-scale networks is strikingly similar, especially between nearby municipalities, with more long-distance ties captured by the OSN. This result holds when correcting for population density and geographical distance, notwithstanding that these two patterns appear to be the main drivers of connectivity. Second, we show that the community structure of neither network follows strict administrative geographical delineations (e.g., provinces). Instead, communities appear either to center around large metropolitan areas or, outside of the country's most urbanized area, to comprise large blocks of interdependent municipalities. Interestingly, beyond population and distance-related patterns, communities also highlight the persistence of deeply rooted historical and sociocultural communities based on religion. The results of this study suggest that both online social networks and register-based social networks are valuable resources for insights into the social network structure of an entire population.
Submitted 25 June, 2024;
originally announced June 2024.
-
Urban highways are barriers to social ties
Authors:
Luca Maria Aiello,
Anastassia Vybornova,
Sándor Juhász,
Michael Szell,
Eszter Bokányi
Abstract:
Urban highways are common, especially in the US, making cities more car-centric. They promise the annihilation of distance but obstruct pedestrian mobility, thus playing a key role in limiting social interactions locally. Although this limiting role is widely acknowledged in urban studies, the quantitative relationship between urban highways and social ties is barely tested. Here we define a Barrier Score that relates massive, geolocated online social network data to highways in the 50 largest US cities. At the unprecedented granularity of individual social ties, we show that urban highways are associated with decreased social connectivity. This barrier effect is especially strong for short distances and consistent with historical cases of highways that were built to purposefully disrupt or isolate Black neighborhoods. By combining spatial infrastructure with social tie data, our method adds a new dimension to demographic studies of social segregation. Our study can inform reparative planning for an evidence-based reduction of spatial inequality, and more generally, support a better integration of the social fabric in urban planning.
Submitted 18 April, 2024; v1 submitted 17 April, 2024;
originally announced April 2024.
-
Socio-economic Segregation in a Population-Scale Social Network
Authors:
Yuliia Kazmina,
Eelke M. Heemskerk,
Eszter Bokanyi,
Frank W. Takes
Abstract:
We propose a social network-aware approach to studying socio-economic segregation. The key question that we address is whether patterns of segregation are more pronounced in social networks than the common spatial neighborhood-focused manifestations of segregation. We, therefore, conduct a population-scale social network analysis to study socio-economic segregation at a comprehensive and highly granular social network level: 17.2 million registered residents of the Netherlands that are connected through around 1.3 billion ties distributed over four distinct tie types. We take income assortativity as a measure of socio-economic segregation, compare a social network and spatial neighborhood approach, and find that the social network structure exhibits two times as much segregation. As such, this work challenges the dominance of the spatial perspective on segregation in both literature and policymaking. While at a particular scale of spatial aggregation (e.g., the geographical neighborhood), patterns of socio-economic segregation may appear relatively minimal, they may in fact persist in the underlying social network structure. Furthermore, we discover higher socio-economic segregation in larger cities, shedding a different light on the common view of cities as hubs for diverse socio-economic mixing. A population-scale social network perspective hence offers a way to uncover hitherto 'hidden' segregation that extends beyond spatial neighborhoods and infiltrates multiple aspects of human life.
Submitted 4 October, 2023; v1 submitted 3 May, 2023;
originally announced May 2023.
-
The anatomy of a population-scale social network
Authors:
Eszter Bokányi,
Eelke M. Heemskerk,
Frank W. Takes
Abstract:
Large-scale human social network structure is typically inferred from digital trace samples of online social media platforms or mobile communication data. Instead, here we investigate the social network structure of a complete population, where people are connected by high-quality links sourced from administrative registers of family, household, work, school, and next-door neighbors. We examine this multilayer social opportunity structure through three common concepts in network analysis: degree, closure, and distance. Findings present how particular network layers contribute to presumably universal scale-free and small-world properties of networks. Furthermore, we suggest a novel measure of excess closure and apply this in a life-course perspective to show how the social opportunity structure of individuals varies along age, socio-economic status, and education level. Our work provides new entry points to understand individual socio-economic failure and success as well as persistent societal problems of inequality and segregation.
Submitted 28 November, 2022;
originally announced November 2022.
-
Real-time estimation of the effective reproduction number of COVID-19 from behavioral data
Authors:
Eszter Bokányi,
Zsolt Vizi,
Júlia Koltai,
Gergely Röst,
Márton Karsai
Abstract:
Near-real time estimations of the effective reproduction number are among the most important tools to track the progression of a pandemic and to inform policy makers and the general public. However, these estimations rely on reported case numbers, commonly recorded with significant biases. The epidemic outcome is strongly influenced by the dynamics of social contacts, which are neglected in conventional surveillance systems as their real-time observation is challenging. Here, we propose a concept using online and offline behavioral data, recording age-stratified contact matrices at a daily rate. Modeling the epidemic using the reconstructed matrices we dynamically estimate the effective reproduction number during the two first waves of the COVID-19 pandemic in Hungary. Our results demonstrate how behavioral data can be used to build alternative monitoring systems complementing the established public health surveillance. They can identify and provide better signals during periods when official estimates appear unreliable due to observational biases.
Submitted 21 July, 2022;
originally announced July 2022.
-
Spatially concentrated social capital of urban residents
Authors:
Ádám J. Kovács,
Sándor Juhász,
Eszter Bokányi,
Balázs Lengyel
Abstract:
Social connections that span across diverse urban neighborhoods can help individual prosperity by mobilizing social capital in cities. Yet, how the detailed spatial structure of social capital varies in lower- and higher-income urban neighborhoods is less understood. This paper demonstrates that the social capital measured on social networks is spatially more concentrated for residents of lower-income neighborhoods than for residents of higher-income neighborhoods. We map the micro-geography of individual online social connections in the 50 largest metropolitan areas of the US using a large-scale geolocalized Twitter dataset. We then analyze the spatial dimension of individual social capital by the share of friends, closure, and share of supported ties within circles of short-distance radii (1, 5, and 10 km) around users' home location. We compare residents from below-median income neighborhoods with above-median income neighborhoods, and find that users living in relatively poorer neighborhoods have a significantly higher share of connections in close proximity. Moreover, their network is more cohesive and supported within a short distance from their home. These patterns prevail across the 50 largest US metropolitan areas with only a few exceptions. The disparities found in the micro-geographic concentration of social capital can feed segregation and income inequality within cities, harming social circles of low-income residents.
Submitted 28 July, 2021;
originally announced July 2021.
-
Urban hierarchy and spatial diffusion over the innovation life cycle
Authors:
Eszter Bokányi,
Martin Novák,
Ákos Jakobi,
Balázs Lengyel
Abstract:
Successful innovations achieve large geographical coverage by spreading across settlements and distances. For decades, spatial diffusion has been argued to take place along the urban hierarchy such that the innovation first spreads from large to medium cities then later from medium to small cities. Yet, the role of geographical distance, the other major factor of spatial diffusion, was difficult to identify in hierarchical diffusion due to missing data on spreading events. In this paper, we exploit spatial patterns of individual invitations on a social media platform sent from registered users to new users over the entire life cycle of the platform. This enables us to disentangle the role of urban hierarchy and the role of distance by observing the source and target locations of flows over an unprecedented timescale. We demonstrate that hierarchical diffusion greatly overlaps with diffusion to close distances and these factors co-evolve over the life cycle; thus, their joint analysis is necessary. Then, a regression framework is applied to estimate the number of invitations sent between pairs of towns by years in the life cycle with the population sizes of the source and target towns, their combinations, and the distance between them. We confirm that hierarchical diffusion prevails initially across large towns only but emerges in the full spectrum of settlements in the middle of the life cycle when adoption accelerates. Unlike in previous gravity estimations, we find that after an intensifying role of distance in the middle of the life cycle a surprisingly weak distance effect characterizes the last years of diffusion. Our results stress the dominance of urban hierarchy in spatial diffusion and inform future predictions of innovation adoption at local scales.
Submitted 22 June, 2021; v1 submitted 7 June, 2021;
originally announced June 2021.
-
Universal role of commuting in the reduction of social assortativity in cities
Authors:
Eszter Bokányi,
Sándor Juhász,
Márton Karsai,
Balázs Lengyel
Abstract:
Millions commute to work every day in cities and interact with colleagues, customers, providers, friends, and strangers. Commuting facilitates the mixing of people from distant and diverse neighborhoods, but whether this has an imprint on social inclusion or instead, connections remain assortative is less explored. In this paper, we aim to better understand income sorting in social networks inside cities and investigate how commuting distance conditions the online social ties of Twitter users in the 50 largest metropolitan areas of the United States. Home and work locations are identified from geolocated tweets that enable us to infer the socio-economic status of individuals. Our results show that an above-median commuting distance in cities is associated with more diverse individual networks in terms of connected peers and their income. The degree that distant commutes link neighborhoods of different socio-economic backgrounds greatly varies by city size and structure. However, we find that above-median commutes are associated with a nearly uniform, moderate reduction of social tie assortativity across the top 50 US cities suggesting a universal role of commuting in integrating disparate social networks in cities. Our results inform policy that facilitating access across distant neighborhoods can advance the social inclusion of low-income groups.
Submitted 14 October, 2021; v1 submitted 4 May, 2021;
originally announced May 2021.
-
Ride-share matching algorithms generate income inequality
Authors:
Eszter Bokányi,
Anikó Hannák
Abstract:
Despite the potential of online sharing economy platforms such as Uber, Lyft, or Foodora to democratize the labor market, these services are often accused of fostering unfair working conditions and low wages. These problems have been recognized by researchers and regulators, but the size and complexity of these socio-technical systems, combined with the lack of transparency about algorithmic practices, make it difficult to understand system dynamics and large-scale behavior. This paper combines approaches from complex systems and algorithmic fairness to investigate the effect of algorithm design decisions on wage inequality in ride-hailing markets. We first present a computational model that includes conditions about locations of drivers and passengers, traffic, the layout of the city, and the algorithm that matches requests with drivers. We calibrate the model with parameters derived from empirical data. Our simulations show that small changes in the system parameters can cause large deviations in the income distributions of drivers, leading to a highly unpredictable system which often distributes vastly different incomes to identically performing drivers. As suggested by recent studies about feedback loops in algorithmic systems, these initial income differences can result in enforced and long-term wage gaps.
Submitted 31 August, 2020; v1 submitted 29 May, 2019;
originally announced May 2019.
-
Scaling in Words on Twitter
Authors:
Eszter Bokányi,
Dániel Kondor,
Gábor Vattay
Abstract:
Scaling properties of language are a useful tool for understanding generative processes in texts. We investigate the scaling relations in citywise Twitter corpora coming from the Metropolitan and Micropolitan Statistical Areas of the United States. We observe a slightly superlinear urban scaling with the city population for the total volume of the tweets and words created in a city. We then find that a certain core vocabulary follows the same scaling relationship as the bulk text, but most words are sensitive to city size, exhibiting a super- or a sublinear urban scaling. For both regimes we can offer a plausible explanation based on the meaning of the words. We also show that the parameters for Zipf's law and Heaps' law differ on Twitter from those of other texts, and that the exponent of Zipf's law changes with city size.
Submitted 11 March, 2019;
originally announced March 2019.
-
Urban scaling of football followership on Twitter
Authors:
Eszter Bokanyi,
Attila Soti,
Gabor Vattay
Abstract:
Social sciences have an important challenge today to take advantage of new research opportunities provided by large amounts of data generated by online social networks. Because of its marketing value, sports clubs are also motivated to create and maintain a stable audience on social media. In this paper, we analyze followers of prominent football clubs on Twitter by obtaining their home locations. We then measure how city size is connected to the number of followers using the theory of urban scaling. The results show that the scaling exponents of club followers depend on the income of a country. These findings could be used to understand the structure and potential growth areas of global football audiences.
Submitted 11 December, 2018;
originally announced December 2018.
-
The role of geography in the complex diffusion of innovations
Authors:
Balázs Lengyel,
Eszter Bokányi,
Riccardo Di Clemente,
János Kertész,
Marta C. González
Abstract:
The urban-rural divide is increasing in modern societies, calling for geographical extensions of social influence modelling. Improved understanding of innovation diffusion across locations and through social connections can provide us with new insights into the spread of information, technological progress and economic development. In this work, we analyze the spatial adoption dynamics of iWiW, an Online Social Network (OSN) in Hungary and uncover empirical features about the spatial adoption in social networks. During its entire life cycle from 2002 to 2012, iWiW reached up to 300 million friendship ties of 3 million users. We find that the number of adopters as a function of town population follows a scaling law that reveals a strongly concentrated early adoption in large towns and a less concentrated late adoption. We also discover a strengthening distance decay of spread over the life cycle, indicating a high fraction of distant diffusion in early stages but the dominance of local diffusion in late stages. The spreading process is modelled within the Bass diffusion framework that enables us to compare the differential equation version with an agent-based version of the model run on the empirical network. Although both models can capture the macro trend of adoption, they have limited capacity to describe the observed trends of urban scaling and distance decay. We find, however, that incorporating adoption thresholds, defined by the fraction of social connections that adopt a technology before the individual adopts, improves the network model fit to the urban scaling of early adopters. Controlling for the threshold distribution enables us to eliminate the bias induced by local network structure on predicting local adoption peaks. Finally, we show that geographical features such as distance from the innovation origin and town size influence prediction of adoption peak at local scales.
Submitted 27 August, 2020; v1 submitted 4 April, 2018;
originally announced April 2018.
-
Video Pandemics: Worldwide Viral Spreading of Psy's Gangnam Style Video
Authors:
Zsofia Kallus,
Daniel Kondor,
Jozsef Steger,
Istvan Csabai,
Eszter Bokanyi,
Gabor Vattay
Abstract:
Viral videos can reach global penetration traveling through international channels of communication similarly to real diseases starting from a well-localized source. In past centuries, disease fronts propagated in a concentric spatial fashion from the source of the outbreak via the short-range human contact network. The emergence of long-distance air travel changed these ancient patterns. However, recently, Brockmann and Helbing have shown that concentric propagation waves can be reinstated if propagation time and distance are measured in the flight-time and travel volume weighted underlying air-travel network. Here, we adopt this method for the analysis of viral meme propagation in Twitter messages, and define a similar weighted network distance in the communication network connecting countries and states of the World. We recover a wave-like behavior on average and assess the randomizing effect of non-locality of spreading. We show that similar results can be recovered from Google Trends data as well.
Submitted 14 July, 2017;
originally announced July 2017.
-
Universal Scaling Laws in Metro Area Election Results
Authors:
Eszter Bokányi,
Zoltán Szállási,
Gábor Vattay
Abstract:
We explain the anomaly of election results between large cities and rural areas in terms of urban scaling in the 1948-2016 US elections and in the 2016 EU referendum of the UK. The scaling curves are all universal and depend on a single parameter only, and one of the parties always shows superlinear scaling and drives the process, while the sublinear exponent of the other party is merely the consequence of probability conservation. Based on the recently developed model of urban scaling, we give a microscopic model of voter behavior in which we replace diversity characterizing humans in creative aspects with social diversity and tolerance. The model can also predict new political developments such as the fragmentation of the left and 'the immigration paradox'.
Submitted 6 April, 2017; v1 submitted 5 April, 2017;
originally announced April 2017.
-
Prediction of employment and unemployment rates from Twitter daily rhythms in the US
Authors:
Eszter Bokányi,
Zoltán Lábszki,
Gábor Vattay
Abstract:
By modeling macroeconomic indicators using digital traces of human activities on mobile or social networks, we can provide important insights into processes previously assessed via paper-based surveys or polls only. We collected aggregated workday activity timelines of US counties from the normalized number of messages sent in each hour on the online social network Twitter. In this paper, we show how county employment and unemployment statistics are encoded in the daily rhythm of people by decomposing the activity timelines into a linear combination of two dominant patterns. The mixing ratio of these patterns defines a measure for each county that correlates significantly with employment ($0.46\pm0.02$) and unemployment rates ($-0.34\pm0.02$). Thus, the two dominant activity patterns can be linked to rhythms signaling presence or lack of regular working hours of individuals. The analysis could provide policy makers with better insight into the processes governing employment, where problems could be identified not only based on the number of officially registered unemployed, but also on the basis of the digital footprints people leave on different platforms.
Submitted 6 April, 2017; v1 submitted 22 March, 2017;
originally announced March 2017.
-
Race, Religion and the City: Twitter Word Frequency Patterns Reveal Dominant Demographic Dimensions in the United States
Authors:
Eszter Bokányi,
Dániel Kondor,
László Dobos,
Tamás Sebők,
József Stéger,
István Csabai,
Gábor Vattay
Abstract:
Recently, numerous approaches have emerged in the social sciences to exploit the opportunities made possible by the vast amounts of data generated by online social networks (OSNs). Having access to information about users on such a scale opens up a range of possibilities, all without the limitations associated with often slow and expensive paper-based polls. A question that remains to be satisfactorily addressed, however, is how demography is represented in the OSN content? Here, we study language use in the US using a corpus of text compiled from over half a billion geo-tagged messages from the online microblogging platform Twitter. Our intention is to reveal the most important spatial patterns in language use in an unsupervised manner and relate them to demographics. Our approach is based on Latent Semantic Analysis (LSA) augmented with the Robust Principal Component Analysis (RPCA) methodology. We find spatially correlated patterns that can be interpreted based on the words associated with them. The main language features can be related to slang use, urbanization, travel, religion and ethnicity, the patterns of which are shown to correlate plausibly with traditional census data. Our findings thus validate the concept of demography being represented in OSN language use and show that the traits observed are inherently present in the word frequencies without any previous assumptions about the dataset. Thus, they could form the basis of further research focusing on the evaluation of demographic data estimation from other big data sources, or on the dynamical processes that result in the patterns found here.
Submitted 11 May, 2016; v1 submitted 10 May, 2016;
originally announced May 2016.