doi 10.1057/s41599-023-01611-3

American cultural regions mapped through the lexical analysis of social media

Authors: Thomas Louf, Bruno Gonçalves, Jose J. Ramasco, David Sanchez, Jack Grieve

Abstract: Cultural areas represent a useful concept that cross-fertilizes diverse fields in social sciences. Knowledge of how humans organize and relate their ideas and behavior within a society helps to understand their actions and attitudes towards different issues. However, the selection of common traits that shape a cultural area is somewhat arbitrary. What is needed is a method that can leverage the massive amounts of data coming online, especially through social media, to identify cultural regions without ad-hoc assumptions, biases or prejudices. This work takes a crucial step in this direction by introducing a method to infer cultural regions based on the automatic analysis of large datasets from microblogging posts. The approach presented here is based on the principle that cultural affiliation can be inferred from the topics that people discuss among themselves. Specifically, regional variations in written discourse are measured in American social media. From the frequency distributions of content words in geotagged Tweets, the regional hotspots of words' usage are found, and from there, principal components of regional variation are derived. Through a hierarchical clustering of the data in this lower-dimensional space, this method yields clear cultural areas and the topics of discussion that define them. It uncovers a manifest North-South separation, which is primarily influenced by the African American culture, and further contiguous (East-West) and non-contiguous divisions that provide a comprehensive picture of today's cultural areas in the US.

Submitted 18 April, 2023; v1 submitted 16 August, 2022; originally announced August 2022.

Comments: 13 pages, 5 figures; contains Supplementary Information

Journal ref: Humanit Soc Sci Commun 10, 133 (2023)

arXiv:2105.01148 [pdf]

Ultrabroadband nanocavity of hyperbolic phonon polaritons in 1D-like α-MoO3

Authors: Ingrid D. Barcelos, Thalita A. Canassa, Rafael A. Mayer, Flavio H. Feres, Eynara G. de Oliveira, Alem-Mar B. Goncalves, Hans A. Bechtel, Raul O. Freitas, Francisco C. B. Maia, Diego C. B. Alves

Abstract: The exploitation of phonon-polaritons in nanostructured materials offers a pathway to manipulate infrared (IR) light for nanophotonic applications. Notably, hyperbolic phonon polaritons (HP2) in polar bidimensional crystals have been used to demonstrate strong electromagnetic field confinement, ultraslow group velocities, and long lifetimes (up to ~8 ps). Here we present nanobelts of α-phase molybdenum trioxide (α-MoO3) as a low-dimensional medium supporting HP2 modes in the mid- and far-IR ranges. By real-space nanoimaging, with IR illumination provided by synchrotron and tunable lasers, we observe that this HP2 response arises from the formation of Fabry-Perot resonances. We note an anisotropic propagation that depends critically on the frequency range. Our findings are supported by the convergence of experiment, theory, and numerical simulations. Our work shows that the low dimensionality of natural nanostructured crystals, like α-MoO3 nanobelts, provides an attractive platform to study polaritonic light-matter interactions and offers appealing cavity properties that could be harnessed in future designs of compact nanophotonic devices.

Submitted 3 May, 2021; originally announced May 2021.

arXiv:1903.07805 [pdf, other]

A nonlinear couple stress-based sandwich beam theory

Authors: Bruno Reinaldo Goncalves, Anssi T. Karttunen, Jani Romanoff

Abstract: A geometrically nonlinear sandwich beam model founded on the modified couple stress Timoshenko beam theory with Kármán kinematics is derived and employed in the analysis of periodic sandwich structures. The constitutive model is based on the mechanical behavior of sandwich beams, with the bending response split into membrane-induced and local bending modes. A micromechanical approach based on the structural analysis of a unit cell is derived and utilized to obtain the stiffness properties of selected prismatic cores. The model is shown to be equivalent to the classical thick-face sandwich theory for the same basic assumptions. A two-node finite element interpolated with linear and cubic shape functions is proposed and its stiffness and geometric stiffness matrices are derived. Three examples illustrate the model capabilities in predicting deflections, stresses and critical buckling loads of elastic sandwich beams including elastic size effects. Good agreement is obtained throughout in comparisons with more involved finite element models.

Submitted 18 March, 2019; originally announced March 2019.

Comments: This work has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Action grant agreement No 745770 - SANDFECH - Micromechanics-based finite element modeling of sandwich structures

arXiv:1806.08651 [pdf]

doi 10.1109/TNS.2019.2937367

PCIe Hot Plug support standardization challenges in ATCA

Authors: Miguel Correia, Jorge Sousa, António P. Rodrigues, Paulo F. Carvalho, Bruno Santos, Álvaro Combo, Carlos M. B. A. Correia, Bruno Gonçalves

Abstract: Throughout the last decade, the Advanced Telecommunications Computing Architecture (ATCA) solidified its position as one of the main switched-based crate standards for advanced Physics instrumentation, offering not only highly performant characteristics in data throughput, channel density or power supply/dissipation capabilities, but also special features for high availability (HA), required for latest and upcoming large-scale endeavours, as is the case of ITER. Hot Swap is one of the main HA features in ATCA, allowing for Boards to be replaced in a crate (Shelf), without powering off the whole system. Platforms using the Peripheral Component Interconnect Express (PCIe) protocol on the Fabric Interface must be complemented, at the software level, with the PCIe Hot Plug native feature, currently not specified for the ATCA form-factor. From a customised Hot Plug support implementation for ATCA Node Boards, the paper presents an implementation extension for Hub Boards, allowing Hot Plug of PCIe switching devices, without causing bus enumeration problems. This paper further addresses the main issues concerning an eventual standardization of PCIe Hot Plug support in ATCA, such as the implementability of Hot Plug Elements and the generation and management of Hot Plug Events, aiming to stimulate the discussion within the PICMG community towards a long overdue standardized solution for Hot Plug in ATCA.

Submitted 22 June, 2018; originally announced June 2018.

Comments: 4 pages, 3 figures, 21st IEEE Real Time Conference - Colonial Williamsburg 9-15 June 2018 Woodlands Conference Center

arXiv:1806.08637 [pdf, other]

doi 10.1109/TNS.2019.2907056

The design and performance of the real time software architecture for the ITER Radial Neutron Camera

Authors: N. Cruz, B. Santos, A. Fernandes, P. F. Carvalho, J. Sousa, B. Gonçalves, M. Riva, C. Centioli, D. Marocco, B. Esposito, C. M. B. Correia, R. C. Pereira

Abstract: The neutron detection system for characterization of emissivity in the ITER Tokamak during DD and DT experiments poses serious challenges to the performance of the diagnostic control and data acquisition system (CDAcq). The ongoing design of the ITER Radial Neutron Camera (RNC) diagnostic is composed of 26 lines of sight (LOS) for complete plasma inspection. The CDAcq system aims at meeting the ITER requirements of delivering the measurement of the real-time neutron emissivity profile with time resolution and control cycle time of 10 ms at a peak event rate of 2 MEvents/s per LOS. This measurement demands the generation of the neutron spectra for each LOS with neutron/gamma discrimination and pile-up rejection. The neutron spectra can be processed entirely in the host CPU, or the host can use the processed data coming from the system FPGA. The number of neutron counts extracted from the spectra is then used to calculate the neutron emissivity profile using an inversion algorithm. Moreover, it is required that the event-based raw data acquired is made available to the ITER data network without local data storage for post processing. The data production for the 2 MEvents/s rate can go up to a maximum data throughput of 0.5 GB/s per channel. The evaluation of the use of real-time data compression techniques in RNC is also depicted in another contribution.
To meet the demands of the project a CDAcq prototype has been used to design and test a high-performance distributed software architecture taking advantage of multi-core CPU technology capable of coping with the requirements. This submission depicts the design of the real-time architecture, the spectra algorithms (pulse height analysis, neutron/gamma discrimination and pile-up correction) and the inversion algorithm to calculate the emissivity profile. Preliminary tests to evaluate the system performance with synthetic data are presented.

Submitted 22 June, 2018; originally announced June 2018.

Comments: Conference Record - 21st IEEE Real Time Conference, Colonial Williamsburg, USA, 9-15 June 2018

arXiv:1806.06150 [pdf, other]

doi 10.1109/TNS.2019.2903646

FPGA code for the data acquisition and real-time processing prototype of the ITER Radial Neutron Camera

Authors: Ana Fernandes, Nuno Cruz, Bruno Santos, Paulo F. Carvalho, Jorge Sousa, Bruno Gonçalves, Marco Riva, Fabio Pollastrone, Cristina Centioli, Daniele Marocco, Basilio Esposito, Carlos M. B. A. Correia, Rita C. Pereira

Abstract: The main role of the ITER Radial Neutron Camera (RNC) diagnostic is to measure in real-time the plasma neutron emissivity profile at high peak count rates for a time duration up to 500 s. Due to the unprecedented high performance conditions and after the identification of critical problems, a set of activities has been selected, focused on the development of high priority prototypes capable of delivering answers to those problems before the final RNC design. This paper presents one of the selected activities: the design, development and testing of a dedicated FPGA code for the RNC Data Acquisition prototype. The FPGA code aims to acquire, process and store in real-time the neutron and gamma pulses from the detectors located in collimated lines of sight viewing a poloidal plasma section from the ITER Equatorial Port Plug 1. The hardware platform used was an evaluation board from Xilinx (KC705) carrying an IPFN FPGA Mezzanine Card (FMC-AD2-1600) with 2 digitizer channels of 12-bit resolution sampling up to 1.6 GSamples/s. The code performs the proper input signal conditioning using a configuration down-sampled to 400 MSamples/s, applies dedicated algorithms for pulse detection, filtering and pileup detection, and includes two distinct data paths operating simultaneously: i) the event-based data-path for pulse storage; and ii) the real-time processing, with dedicated algorithms for pulse shape discrimination and pulse height spectra.
For continuous data throughput both data-paths are streamed to the host through two distinct PCIe x8 Direct Memory Access (DMA) channels.

Submitted 15 June, 2018; originally announced June 2018.

Comments: 6 pages, 10 figures, 21st IEEE Real Time Conference (RT-2018), Colonial Williamsburg, 9-15 June 2018

arXiv:1806.04671 [pdf, other]

doi 10.1109/TNS.2019.2899319

Real-time data compression for data acquisition systems applied to the ITER Radial Neutron Camera

Authors: B. Santos, N. Cruz, A. Fernandes, P. F. Carvalho, J. Sousa, B. Gonçalves, M. Riva, F. Pollastrone, C. Centioli, D. Marocco, B. Esposito, C. M. B. Correia, R. C. Pereira

Abstract: To achieve the aim of the ITER Radial Neutron Camera Diagnostic, the data acquisition prototype must be compliant with a sustained 2 MHz peak event rate for each channel with 128 samples of 16 bits per event. The data is acquired and processed using an IPFN FPGA Mezzanine Card (FMC-AD2-1600) with 2 digitizer channels of 12-bit resolution and a sampling rate up to 1.6 GSamples/s mounted in a PCIe evaluation board from Xilinx (KC705) installed in the host PC. The acquired data in the event-based data-path is streamed to the host through the PCIe x8 Direct Memory Access (DMA) with a maximum data throughput per channel of 0.5 GB/s of raw data (event-based), 1 GB/s per digitizer and up to 1.6 GB/s in continuous mode. The prototype architecture comprises a host PC with two KC705 modules and four channels, producing up to 2 GB/s in event mode and up to 3.2 GB/s in continuous mode. To reduce the produced data throughput from host to ITER databases, the real-time data compression was evaluated using the LZ4 lossless compression algorithm, which provides compression speed up to 400 MB/s per core. This paper presents the architecture, implementation and test of the parallel real-time data compression system running in multiple isolated cores. The average space saving and the performance results for long term acquisitions up to 30 minutes, using different data block sizes and different numbers of CPUs, are also presented.

Submitted 21 June, 2018; v1 submitted 11 June, 2018; originally announced June 2018.

Comments: 21st Real Time Conference, June 9th - 15th, Colonial Williamsburg, Virginia, United States
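
The throughput figures quoted in this abstract can be cross-checked with a little arithmetic. The sketch below uses only numbers stated in the abstract; the final step, a lower bound on the number of compression cores, is an illustrative assumption (it takes the quoted "up to 400 MB/s per core" as a sustained rate with perfect scaling) and is not a result reported by the authors.

```python
import math

# Figures taken from the abstract.
EVENT_RATE_HZ = 2_000_000        # sustained peak event rate per channel
SAMPLES_PER_EVENT = 128
BITS_PER_SAMPLE = 16
CHANNELS_PER_DIGITIZER = 2       # FMC-AD2-1600 has 2 digitizer channels
DIGITIZERS = 2                   # two KC705 modules in the prototype
LZ4_BYTES_PER_S_PER_CORE = 400e6 # "up to 400 MB/s per core" (best case)


def event_throughput(rate_hz: int, samples: int, bits: int) -> int:
    """Raw event-mode throughput in bytes per second for one channel."""
    return rate_hz * samples * bits // 8


per_channel = event_throughput(EVENT_RATE_HZ, SAMPLES_PER_EVENT, BITS_PER_SAMPLE)
per_digitizer = per_channel * CHANNELS_PER_DIGITIZER
total_event_mode = per_digitizer * DIGITIZERS

# Assumption (not from the abstract): LZ4 sustains its best-case speed and
# scales linearly, so this is only a rough lower bound on the core count.
min_cores = math.ceil(total_event_mode / LZ4_BYTES_PER_S_PER_CORE)

print(f"per channel:   {per_channel / 1e9:.3f} GB/s")
print(f"per digitizer: {per_digitizer / 1e9:.3f} GB/s")
print(f"event mode:    {total_event_mode / 1e9:.3f} GB/s")
print(f"cores needed (lower bound): {min_cores}")
```

With decimal units this gives 0.512 GB/s per channel, about 1 GB/s per digitizer and about 2 GB/s for four channels, in line with the rounded values in the abstract.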

arXiv:1804.07123 [pdf]

Radial structure of vorticity in the plasma boundary of ISTTOK tokamak

Authors: B. Gonçalves, I. Henriques, C. Hidalgo, C. Silva, H. Figueiredo, V. Naulin, A. H. Nielsen, J. T. Mendonça

Abstract: The first experimental measurements of vorticity and vorticity flux in a fusion device were performed in tokamak ISTTOK. This is an important achievement since vorticity plays a key role in the transport of energy and particles in plasmas and fluids. The measurements were performed with a specifically designed array of Langmuir probes in the plasma edge of the small tokamak ISTTOK. The experimental results presented in this paper, allowing for the first time a direct comparison with theoretical models, show that the vorticity flux feeds into the shear flow in the tokamak plasma edge region. The Probability Distribution Function of the vorticity exhibits fat tails with a q-Gaussian shape typical of a non-equilibrium process. Self-similarity in the probability distribution function of several parameters, including vorticity and vorticity flux, is observed, indicating that there is no morphological change in the coherent structures in the plasma boundary region and that the fluctuations in the Reynolds stress, vorticity and vorticity flux follow a universal shape.

Submitted 4 July, 2018; v1 submitted 19 April, 2018; originally announced April 2018.

Comments: 8 figures

arXiv:1801.09524 [pdf, other]

Foursquare to The Rescue: Predicting Ambulance Calls Across Geographies

Authors: Anastasios Noulas, Colin Moffatt, Desislava Hristova, Bruno Gonçalves

Abstract: Understanding how ambulance incidents are spatially distributed can shed light on the epidemiological dynamics of geographic areas and inform healthcare policy design. Here we analyze a longitudinal dataset of more than four million ambulance calls across a region of twelve million residents in the North West of England. With the aim of explaining geographic variations in ambulance call frequencies, we employ a wide range of data layers including open government datasets describing population demographics and socio-economic characteristics, as well as geographic activity in online services such as Foursquare. Working at a fine level of spatial granularity we demonstrate that daytime population levels and the deprivation status of an area are the most important variables when it comes to predicting the volume of ambulance calls in an area. Foursquare check-ins, on the other hand, complement these government-sourced indicators, offering a novel view of local nightlife and commercial activity. We demonstrate how check-in activity can provide an edge when predicting certain types of emergency incidents in a multi-variate regression model.

Submitted 26 February, 2018; v1 submitted 29 January, 2018; originally announced January 2018.

Comments: 10 pages, 9 figures, 4 tables

arXiv:1707.00781 [pdf, other]

doi 10.1371/journal.pone.0197741

Mapping the Americanization of English in Space and Time

Authors: Bruno Gonçalves, Lucía Loureiro-Porto, José J. Ramasco, David Sánchez

Abstract: As global political preeminence gradually shifted from the United Kingdom to the United States, so did the capacity to culturally influence the rest of the world. In this work, we analyze how the world-wide varieties of written English are evolving. We study both the spatial and temporal variations of vocabulary and spelling of English using a large corpus of geolocated tweets and the Google Books datasets corresponding to books published in the US and the UK. The advantage of our approach is that we can address both standard written language (Google Books) and the more colloquial forms of microblogging messages (Twitter). We find that American English is the dominant form of English outside the UK and that its influence is felt even within the UK borders. Finally, we analyze how this trend has evolved over time and the impact that some cultural events have had in shaping it.

Submitted 28 May, 2018; v1 submitted 3 July, 2017; originally announced July 2017.

Comments: 16 pages, 6 figures, 2 tables. Published version

Journal ref: PLoS ONE 13: e0197741 (2018)

arXiv:1611.01056 [pdf, other]

doi 10.1371/journal.pone.0191612

Immigrant community integration in world cities

Authors: Fabio Lamanna, Maxime Lenormand, María Henar Salas-Olmedo, Gustavo Romanillos, Bruno Gonçalves, José J. Ramasco

Abstract: As a consequence of the accelerated globalization process, today major cities all over the world are characterized by an increasing multiculturalism. The integration of immigrant communities may be affected by social polarization and spatial segregation. How are these dynamics evolving over time? To what extent are the different policies launched to tackle these problems working? These are critical questions traditionally addressed by studies based on surveys and census data. Such sources reliably avoid spurious biases, but the data collection is intensive and rather expensive work. Here, we conduct a comprehensive study on immigrant integration in 53 world cities by introducing an innovative approach: an analysis of the spatio-temporal communication patterns of immigrant and local communities based on language detection in Twitter and on novel metrics of spatial integration. We quantify the "Power of Integration" of cities --their capacity to spatially integrate diverse cultures-- and characterize the relations between different cultures when acting as hosts or immigrants.

Submitted 14 March, 2018; v1 submitted 3 November, 2016; originally announced November 2016.

Comments: 13 pages, 5 figures + Appendix

Journal ref: PLoS ONE 13, e0191612 (2018)

arXiv:1608.06949 [pdf, other]

doi 10.1109/TVCG.2016.2598585

Urban Pulse: Capturing the Rhythm of Cities

Authors: Fabio Miranda, Harish Doraiswamy, Marcos Lage, Kai Zhao, Bruno Gonçalves, Luc Wilson, Mondrian Hsieh, Cláudio T. Silva

Abstract: Cities are inherently dynamic. Interesting patterns of behavior typically manifest at several key areas of a city over multiple temporal resolutions. Studying these patterns can greatly help a variety of experts ranging from city planners and architects to human behavioral experts. Recent technological innovations have enabled the collection of enormous amounts of data that can help in these studies. However, techniques using these data sets typically focus on understanding the data in the context of the city, thus failing to capture the dynamic aspects of the city. The goal of this work is to instead understand the city in the context of multiple urban data sets. To do so, we define the concept of an "urban pulse" which captures the spatio-temporal activity in a city across multiple temporal resolutions. The prominent pulses in a city are obtained using the topology of the data sets, and are characterized as a set of beats. The beats are then used to analyze and compare different pulses. We also design a visual exploration framework that allows users to explore the pulses within and across multiple cities under different conditions. Finally, we present three case studies carried out by experts from two different domains that demonstrate the utility of our framework.

Submitted 29 December, 2017; v1 submitted 24 August, 2016; originally announced August 2016.

Comments: 10 pages, 10 figures, 1 table. Demo video: https://www.youtube.com/watch?v=J70-Ns0cFnQ . Github project: https://github.com/ViDA-NYU/urban-pulse ; Added github link

Journal ref: IEEE Transactions on Visualization and Computer Graphics (Volume: 23, Issue: 1, Jan. 2017)

arXiv:1606.08207 [pdf, other]

Semantic homophily in online communication: evidence from Twitter

Authors: Sanja Šćepanović, Igor Mishkovski, Bruno Gonçalves, Nguyen Trung Hieu, Pan Hui

Abstract: People are observed to assortatively connect on a set of traits. This phenomenon, termed assortative mixing or sometimes homophily, can be quantified through the assortativity coefficient in social networks. Uncovering the exact causes of strong assortative mixing found in social networks has been a research challenge. Among the main suggested causes from sociology are the tendency of similar individuals to connect (often itself referred to as homophily) and the social influence among already connected individuals. An important question for researchers and practitioners can be tackled, as we present here: understanding the exact mechanisms of interplay between these tendencies and the underlying social network structure. Namely, in addition to the mentioned assortativity coefficient, there are several other static and temporal network properties and substructures that can be linked to the tendencies of homophily and social influence in the social network, and we herein investigate those. Concretely, we tackle a computer-mediated communication network (based on Twitter mentions) and a particular type of assortative mixing that can be inferred from the semantic features of communication content, which we term semantic homophily. Our work, to the best of our knowledge, is the first to offer an in-depth analysis of semantic homophily in a communication network and the interplay between them.
We quantify diverse levels of semantic homophily, identify the semantic aspects that are the drivers of observed homophily, show insights into its temporal evolution and, finally, present its intricate interplay with the communication network on Twitter. By analyzing these mechanisms we increase understanding of which semantic aspects shape human computer-mediated communication and how they shape it.

Submitted 20 March, 2017; v1 submitted 27 June, 2016; originally announced June 2016.

Comments: 19 pages, 11 figures, 7 tables

arXiv:1604.01800 [pdf, ps, other]

Birefringence phenomena revisited

Authors: Dante D. Pereira, Baltazar J. Ribeiro, Bruno Gonçalves

Abstract: The propagation of electromagnetic waves is investigated in the context of isotropic and nonlinear dielectric media at rest in the eikonal limit of geometrical optics. Taking into account the functional dependence $\varepsilon=\varepsilon(E,B)$ and $\mu=\mu(E,B)$ for the dielectric coefficients, a set of phenomena related to the birefringence of electromagnetic waves induced by external fields is derived and discussed. Our results cover the known cases already reported in the literature: Kerr, Cotton-Mouton, Jones and magnetoelectric effects. Moreover, new effects are presented here, as well as the prospects for their experimental confirmation.

Submitted 6 April, 2016; originally announced April 2016.

arXiv:1602.02665 [pdf, other]

The happiness paradox: your friends are happier than you

Authors: Johan Bollen, Bruno Gonçalves, Ingrid van de Leemput, Guangchen Ruan

Abstract: Most individuals in social networks experience a so-called Friendship Paradox: they are less popular than their friends on average. This effect may explain recent findings that widespread social network media use leads to reduced happiness. However, the relation between popularity and happiness is poorly understood. A Friendship Paradox does not necessarily imply a Happiness Paradox where most individuals are less happy than their friends. Here we report the first direct observation of a significant Happiness Paradox in a large-scale online social network of 39,110 Twitter users. Our results reveal that popular individuals are indeed happier and that a majority of individuals experience a significant Happiness Paradox. The magnitude of the latter effect is shaped by complex interactions between individual popularity, happiness, and the fact that users cluster assortatively by level of happiness. Our results indicate that the topology of online social networks and the distribution of happiness in some populations can cause widespread psycho-social effects that affect the well-being of billions of individuals.

Submitted 8 February, 2016; originally announced February 2016.

Comments: 15 pages, 3 figures, 2 tables
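
The two paradoxes named in this abstract share one definition: a node experiences the paradox when its own score is below the mean score of its neighbours. The sketch below illustrates that definition on a hypothetical five-node toy network with made-up happiness scores; the edge list, the scores and the function names are assumptions for illustration only, and the abstract does not state how the authors measured happiness.

```python
from statistics import mean

# Hypothetical toy network: hub "a" linked to everyone, plus one extra edge.
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("b", "c")]

# Made-up happiness scores, chosen so the popular hub is also the happiest.
HAPPINESS = {"a": 0.9, "b": 0.5, "c": 0.6, "d": 0.3, "e": 0.4}


def build_adjacency(edges):
    """Return an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def paradox_fraction(adj, score):
    """Fraction of nodes whose score is below their neighbours' mean score."""
    below = [score[n] < mean(score[m] for m in adj[n]) for n in adj]
    return sum(below) / len(below)


adj = build_adjacency(EDGES)
degree = {node: len(neighbours) for node, neighbours in adj.items()}

friendship_paradox = paradox_fraction(adj, degree)    # score = popularity
happiness_paradox = paradox_fraction(adj, HAPPINESS)  # score = happiness

print(f"less popular than friends: {friendship_paradox:.0%}")
print(f"less happy than friends:   {happiness_paradox:.0%}")
```

In this toy case every node except the hub falls below its neighbours' average on both measures. That happens only because the made-up scores correlate with degree, which is the kind of popularity-happiness interaction the abstract says shapes the effect.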

arXiv:1601.07741 [pdf, other]

doi 10.1140/epjds/s13688-016-0073-5

Touristic site attractiveness seen through Twitter

Authors: Aleix Bassolas, Maxime Lenormand, Antònia Tugores, Bruno Gonçalves, José J. Ramasco

Abstract: Tourism is becoming a significant contributor to medium and long range travels in an increasingly globalized world. Leisure traveling has an important impact on the local and global economy as well as on the environment. The study of touristic trips is thus raising a considerable interest. In this work, we apply a method to assess the attractiveness of 20 of the most popular touristic sites worldwide using geolocated tweets as a proxy for human mobility. We first rank the touristic sites based on the spatial distribution of the visitors' place of residence. The Taj Mahal, the Pisa Tower and the Eiffel Tower appear consistently in the top 5 in these rankings. We then pass to a coarser scale and classify the travelers by country of residence. Touristic sites' visiting figures are then studied by country of residence showing that the Eiffel Tower, Times Square and the London Tower welcome the majority of the visitors of each country. Finally, we build a network linking sites whenever a user has been detected in more than one site. This allows us to unveil relations between touristic sites and find which ones are more tightly interconnected.

Submitted 25 March, 2016; v1 submitted 28 January, 2016; originally announced January 2016.

Comments: 8 pages and 5 figures

Journal ref: EPJ Data Science 5, 12 (2016)

arXiv:1512.07281 [pdf, other]

doi 10.1145/2872518.2890562

Topical differences between Chinese language Twitter and Sina Weibo

Authors: Qian Zhang, Bruno Gonçalves

Abstract: Sina Weibo, China's most popular microblogging platform, is currently used by over $500M$ users and is considered to be a proxy of Chinese social life. In this study, we contrast the discussions occurring on Sina Weibo and on Chinese language Twitter in order to observe two different strands of Chinese culture: people within China who use Sina Weibo with its government imposed restrictions and those outside that are free to speak completely anonymously. We first propose a simple ad-hoc algorithm to identify topics of Tweets and Weibo. Different from previous works on micro-message topic detection, our algorithm considers topics of the same contents but with different \#tags. Our algorithm can also detect topics for Tweets and Weibos without any \#tags. Using a large corpus of Weibo and Chinese language tweets, covering the period from January $1$ to December $31$, $2012$, we obtain a list of topics using clustered \#tags that we can then use to compare the two platforms. Surprisingly, we find that there are no common entries among the Top $100$ most popular topics. Furthermore, only $9.2\%$ of tweets correspond to the Top $1000$ topics on Sina Weibo platform, and conversely only $4.4\%$ of weibos were found to discuss the most popular Twitter topics. Our results reveal significant differences in social attention on the two platforms, with most popular topics on Sina Weibo relating to entertainment while most tweets corresponded to cultural or political content that is practically nonexistent in Sina Weibo.

Submitted 22 December, 2015; originally announced December 2015.

Comments: 5 pages, 2 figures, 2 tables, 2 algorithms

Journal ref: 2nd Natural Language Processing for Informal Text (NLPIT), 625 (2016)

arXiv:1511.04970 [pdf]

Learning about Spanish dialects through Twitter

Authors: Bruno Gonçalves, David Sánchez

Abstract: This paper maps the large-scale variation of the Spanish language by employing a corpus based on geographically tagged Twitter messages. Lexical dialects are extracted from an analysis of variants of tens of concepts. The resulting maps show linguistic variation on an unprecedented scale across the globe. We discuss the properties of the main dialects within a machine learning approach and find that varieties spoken in urban areas have an international character in contrast to country areas where dialects show a more regional uniformity.

Submitted 5 February, 2017; v1 submitted 16 November, 2015; originally announced November 2015.

Comments: 16 pages, 5 figures, 1 table

Journal ref: RILI, XVI 2 (28), 65-75 (2016)

arXiv:1507.06106 [pdf, other]

doi 10.1126/sciadv.1501158

The dynamic of information-driven coordination phenomena: a transfer entropy analysis

Authors: Javier Borge-Holthoefer, Nicola Perra, Bruno Gonçalves, Sandra González-Bailón, Alex Arenas, Yamir Moreno, Alessandro Vespignani

Abstract: Data from social media are providing unprecedented opportunities to investigate the processes that rule the dynamics of collective social phenomena. Here, we consider an information theoretical approach to define and measure the temporal and structural signatures typical of collective social events as they arise and gain prominence. We use the symbolic transfer entropy analysis of micro-blogging time series to extract directed networks of influence among geolocalized sub-units in social systems. This methodology captures the emergence of system-level dynamics close to the onset of socially relevant collective phenomena. The framework is validated against a detailed empirical analysis of five case studies. In particular, we identify a change in the characteristic time-scale of the information transfer that flags the onset of information-driven collective phenomena. Furthermore, our approach identifies an order-disorder transition in the directed network of influence between social sub-units. In the absence of a clear exogenous driving, social collective phenomena can be represented as endogenously-driven structural transitions of the information transfer network. This study provides results that can help define models and predictive algorithms for the analysis of societal events based on open source data.

Submitted 22 July, 2015; originally announced July 2015.

Comments: 46 pages (main text: 16; SI: 30)

Journal ref: Science Advances 2(4) e1501158 (2016)

arXiv:1501.07788 [pdf, other]

doi 10.1098/rsif.2015.0473

Human diffusion and city influence

Authors: Maxime Lenormand, Bruno Gonçalves, Antònia Tugores, José J. Ramasco

Abstract: Cities are characterized by concentrating population, economic activity and services. However, not all cities are equal and a natural hierarchy at local, regional or global scales spontaneously emerges. In this work, we introduce a method to quantify city influence using geolocated tweets to characterize human mobility. Rome and Paris appear consistently as the cities attracting most diverse visitors. The ratio between locals and non-local visitors turns out to be fundamental for a city to truly be global. Focusing only on urban residents' mobility flows, a city to city network can be constructed. This network allows us to analyze centrality measures at different scales. New York and London play a predominant role at the global scale, while urban rankings suffer substantial changes if the focus is set at a regional level.

Submitted 15 July, 2015; v1 submitted 30 January, 2015; originally announced January 2015.

Comments: 9 pages, 7 figures + appendix

Journal ref: J. R. Soc. Interface 12, 20150473 (2015)

arXiv:1501.07201 [pdf, other]

Everyday the Same Picture: Popularity and Content Diversity

Authors: Alessandro Bessi, Fabiana Zollo, Michela Del Vicario, Antonio Scala, Fabio Petroni, Bruno Gonçalves, Walter Quattrociocchi

Abstract: Facebook is flooded by diverse and heterogeneous content, from kittens up to music and news, passing through satirical and funny stories. Each piece of that corpus reflects the heterogeneity of the underlying social background. In the Italian Facebook we have found an interesting case: a page having more than $40K$ followers that every day posts the same picture of a popular Italian singer. In this work, we use such a page as a control to study and model the relationship between content heterogeneity and popularity. In particular, we use that page for a comparative analysis of information consumption patterns with respect to pages posting science and conspiracy news. In total, we analyze about $2M$ likes and $190K$ comments, made by approximately $340K$ and $65K$ users, respectively. We conclude the paper by introducing a model mimicking users' selection preferences accounting for the heterogeneity of contents.

Submitted 2 February, 2015; v1 submitted 28 January, 2015; originally announced January 2015.

arXiv:1407.7094 [pdf, other]

doi 10.1371/journal.pone.0112074

Crowdsourcing Dialect Characterization through Twitter

Authors: Bruno Gonçalves, David Sánchez

Abstract: We perform a large-scale analysis of language diatopic variation using geotagged microblogging datasets. By collecting all Twitter messages written in Spanish over more than two years, we build a corpus from which a carefully selected list of concepts allows us to characterize Spanish varieties on a global scale. A cluster analysis proves the existence of well defined macroregions sharing common lexical properties. Remarkably enough, we find that Spanish language is split into two superdialects, namely, an urban speech used across major American and Spanish cities and a diverse form that encompasses rural areas and small towns. The latter can be further clustered into smaller varieties with a stronger regional character.

Submitted 26 July, 2014; originally announced July 2014.

Comments: 10 pages, 5 figures

Journal ref: PLoS One 9, E112074 (2014)

arXiv:1406.4436 [pdf, other]

doi 10.1109/TNS.2015.2478965

An optimal real-time controller for vertical plasma stabilization

Authors: N. Cruz, J-M. Moret, S. Coda, B. P. Duval, H. B. Le, A. P. Rodrigues, C. A. F. Varandas, C. M. B. A. Correia, B. S. Goncalves

Abstract: Modern Tokamaks have evolved from the initial axisymmetric circular plasma shape to an elongated axisymmetric plasma shape that improves the energy confinement time and the triple product, which is a generally used figure of merit for the conditions needed for fusion reactor performance. However, the elongated plasma cross section introduces a vertical instability that demands a real-time feedback control loop to stabilize the plasma vertical position and velocity. At the Tokamak à Configuration Variable (TCV) in-vessel poloidal field coils driven by fast switching power supplies are used to stabilize highly elongated plasmas. TCV plasma experiments have used a PID algorithm based controller to correct the plasma vertical position. In late 2013 experiments a new optimal real-time controller was tested improving the stability of the plasma. This contribution describes the new optimal real-time controller developed. The choice of the model that describes the plasma response to the actuators is discussed. The high order model that is initially implemented demands the application of a mathematical order reduction and the validation of the new reduced model. The lower order model is used to derive the time optimal control law. A new method for the construction of the switching curves of a bang-bang controller is presented that is based on the state-space trajectories that optimize the time to target of the system. A closed loop controller simulation tool was developed to test different possible algorithms and the results were used to improve the controller parameters. The final control algorithm and its implementation are described and preliminary experimental results are discussed.

Submitted 18 June, 2014; v1 submitted 13 June, 2014; originally announced June 2014.

Comments: Preprint submitted to IEEE Transactions on Nuclear Science

arXiv:1307.5304 [pdf, other]

doi 10.1371/journal.pone.0092196

Entangling mobility and interactions in social media

Authors: Przemyslaw A. Grabowicz, Jose J. Ramasco, Bruno Goncalves, Victor M. Eguiluz

Abstract: Daily interactions naturally define social circles. Individuals tend to be friends with the people they spend time with and they choose to spend time with their friends, inextricably entangling physical location and social relationships. As a result, it is possible to predict not only someone's location from their friends' locations but also friendship from spatial and temporal co-occurrence. While several models have been developed to separately describe mobility and the evolution of social networks, there is a lack of studies coupling social interactions and mobility. In this work, we introduce a new model that bridges this gap by explicitly considering the feedback of mobility on the formation of social ties. Data coming from three online social networks (Twitter, Gowalla and Brightkite) is used for validation. Our model reproduces various topological and physical properties of these networks such as: i) the size of the connected components, ii) the distance distribution between connected users, iii) the dependence of the reciprocity on the distance, iv) the variation of the social overlap and the clustering with the distance. Besides numerical simulations, a mean-field approach is also used to study analytically the main statistical features of the networks generated by the model. The robustness of the results to changes in the model parameters is explored, finding that a balance between friend visits and long-range random connections is essential to reproduce the geographical features of the empirical networks.

Submitted 10 March, 2014; v1 submitted 19 July, 2013; originally announced July 2013.

Comments: 19 pages, 19 figures

Journal ref: PLoS ONE 9(3): e92196 (2014)

arXiv:1302.6569 [pdf, other]

doi 10.1038/srep01640

Characterizing scientific production and consumption in Physics

Authors: Qian Zhang, Nicola Perra, Bruno Goncalves, Fabio Ciulla, Alessandro Vespignani

Abstract: We analyze the entire publication database of the American Physical Society generating longitudinal (50 years) citation networks geolocalized at the level of single urban areas. We define the knowledge diffusion proxy, and scientific production ranking algorithms to capture the spatio-temporal dynamics of Physics knowledge worldwide. By using the knowledge diffusion proxy we identify the key cities in the production and consumption of knowledge in Physics as a function of time. The results from the scientific production ranking algorithm allow us to characterize the top cities for scholarly research in Physics. Although we focus on a single dataset concerning a specific field, the methodology presented here opens the path to comparative studies of the dynamics of knowledge across disciplines and research areas.

Submitted 26 February, 2013; originally announced February 2013.

Journal ref: Nature Scientific Reports 3, 1640 (2013)

arXiv:1302.6276 [pdf, other]

The Role of Information Diffusion in the Evolution of Social Networks

Authors: Lilian Weng, Jacob Ratkiewicz, Nicola Perra, Bruno Gonçalves, Carlos Castillo, Francesco Bonchi, Rossano Schifanella, Filippo Menczer, Alessandro Flammini

Abstract: Every day millions of users are connected through online social networks, generating a rich trove of data that allows us to study the mechanisms behind human interactions. Triadic closure has been treated as the major mechanism for creating social links: if Alice follows Bob and Bob follows Charlie, Alice will follow Charlie. Here we present an analysis of longitudinal micro-blogging data, revealing a more nuanced view of the strategies employed by users when expanding their social circles. While the network structure affects the spread of information among users, the network is in turn shaped by this communication activity. This suggests a link creation mechanism whereby Alice is more likely to follow Charlie after seeing many messages by Charlie. We characterize users with a set of parameters associated with different link creation strategies, estimated by a Maximum-Likelihood approach. Triadic closure does have a strong effect on link formation, but shortcuts based on traffic are another key factor in interpreting network evolution. However, individual strategies for following other users are highly heterogeneous. Link creation behaviors can be summarized by classifying users in different categories with distinct structural and behavioral characteristics. Users who are popular, active, and influential tend to create traffic-based shortcuts, making the information diffusion process more efficient in the network.

Submitted 20 June, 2013; v1 submitted 25 February, 2013; originally announced February 2013.

Comments: 9 pages, 10 figures, 2 tables

ACM Class: H.1; J.4; H.1.2

Journal ref: Proc. 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2013)

arXiv:1212.5238 [pdf, other]

doi 10.1371/journal.pone.0061981

The Twitter of Babel: Mapping World Languages through Microblogging Platforms

Authors: Delia Mocanu, Andrea Baronchelli, Bruno Gonçalves, Nicola Perra, Alessandro Vespignani

Abstract: Large scale analysis and statistics of socio-technical systems that just a few short years ago would have required the use of consistent economic and human resources can nowadays be conveniently performed by mining the enormous amount of digital data produced by human activities. Although a characterization of several aspects of our societies is emerging from the data revolution, a number of questions concerning the reliability and the biases inherent to the big data "proxies" of social life are still open. Here, we survey worldwide linguistic indicators and trends through the analysis of a large-scale dataset of microblogging posts. We show that available data allow for the study of language geography at scales ranging from country-level aggregation to specific city neighborhoods. The high resolution and coverage of the data allows us to investigate different indicators such as the linguistic homogeneity of different countries, the touristic seasonal patterns within countries and the geographical distribution of different languages in multilingual regions. This work highlights the potential of geolocalized studies of open data sources to improve current analysis and develop indicators for major social phenomena in specific communities.

Submitted 20 December, 2012; originally announced December 2012.

Journal ref: PLoS One 8, E61981 (2013)

arXiv:1209.1351 [pdf, ps, other]

doi 10.1007/s10955-012-0595-6

Emergence of influential spreaders in modified rumor models

Authors: Javier Borge-Holthoefer, Sandro Meloni, Bruno Gonçalves, Yamir Moreno

Abstract: The burst in the use of online social networks over the last decade has provided evidence that current rumor spreading models miss some fundamental ingredients in order to reproduce how information is disseminated. In particular, recent literature has revealed that these models fail to reproduce the fact that some nodes in a network have an influential role when it comes to spreading a piece of information. In this work, we introduce two mechanisms with the aim of filling the gap between theoretical and experimental results. The first model introduces the assumption that spreaders are not always active whereas the second model considers the possibility that an ignorant is not interested in spreading the rumor. In both cases, results from numerical simulations show a higher adhesion to real data than classical rumor spreading models. Our results shed some light on the mechanisms underlying the spreading of information and ideas in large social systems and pave the way for more realistic diffusion models.

Submitted 6 September, 2012; originally announced September 2012.

Comments: 14 Pages, 6 figures, accepted for publication in Journal of Statistical Physics

Journal ref: J. Stat Phys 151, 383 (2013)

arXiv:1206.2858 [pdf, ps, other]

doi 10.1103/PhysRevLett.109.238701

Random walks and search in time-varying networks

Authors: Nicola Perra, Andrea Baronchelli, Delia Mocanu, Bruno Gonçalves, Romualdo Pastor-Satorras, Alessandro Vespignani

Abstract: The random walk process underlies the description of a large number of real world phenomena. Here we provide the study of random walk processes in time varying networks in the regime of time-scale mixing; i.e. when the network connectivity pattern and the random walk process dynamics are unfolding on the same time scale. We consider a model for time varying networks created from the activity potential of the nodes, and derive solutions of the asymptotic behavior of random walks and the mean first passage time in undirected and directed networks. Our findings show striking differences with respect to the well known results obtained in quenched and annealed networks, emphasizing the effects of dynamical connectivity patterns in the definition of proper strategies for search, retrieval and diffusion processes in time-varying networks.

Submitted 17 January, 2013; v1 submitted 13 June, 2012; originally announced June 2012.

Journal ref: Physical Review Letters, 109, 238701 (2012)

arXiv:1205.4467 [pdf, other]

Beating the news using Social Media: the case study of American Idol

Authors: Fabio Ciulla, Delia Mocanu, Andrea Baronchelli, Bruno Gonçalves, Nicola Perra, Alessandro Vespignani

Abstract: We present a contribution to the debate on the predictability of social events using big data analytics. We focus on the elimination of contestants in the American Idol TV shows as an example of a well defined electoral phenomenon that each week draws millions of votes in the USA. We provide evidence that Twitter activity during the time span defined by the TV show airing and the voting period following it, correlates with the contestants ranking and allows the anticipation of the voting outcome. Furthermore, the fraction of Tweets that contain geolocation information allows us to map the fanbase of each contestant, both within the US and abroad, showing that strong regional polarizations occur. Although American Idol voting is just a minimal and simplified version of complex societal phenomena such as political elections, this work shows that the volume of information available in online systems permits the real time gathering of quantitative indicators anticipating the future unfolding of opinion formation events.

Submitted 23 May, 2012; v1 submitted 20 May, 2012; originally announced May 2012.

Comments: 6 pages, 4 figures, 2 tables

arXiv:1205.1010 [pdf, other]

doi 10.1140/epjds6

Partisan Asymmetries in Online Political Activity

Authors: Michael D. Conover, Bruno Gonçalves, Alessandro Flammini, Filippo Menczer

Abstract: We examine partisan differences in the behavior, communication patterns and social interactions of more than 18,000 politically-active Twitter users to produce evidence that points to changing levels of partisan engagement with the American online political landscape. Analysis of a network defined by the communication activity of these users in proximity to the 2010 midterm congressional elections reveals a highly segregated, well clustered partisan community structure. Using cluster membership as a high-fidelity (87% accuracy) proxy for political affiliation, we characterize a wide range of differences in the behavior, communication and social connectivity of left- and right-leaning Twitter users. We find that in contrast to the online political dynamics of the 2008 campaign, right-leaning Twitter users exhibit greater levels of political activity, a more tightly interconnected social structure, and a communication network topology that facilitates the rapid and broad dissemination of political information.

Submitted 19 June, 2012; v1 submitted 4 May, 2012; originally announced May 2012.

Comments: 17 pages, 10 figures, 6 tables

Journal ref: EPJ Data Science 1, 6 (2012)

arXiv:1203.5351 [pdf, other]

doi 10.1038/srep00469

Activity driven modeling of time varying networks

Authors: Nicola Perra, Bruno Gonçalves, Romualdo Pastor-Satorras, Alessandro Vespignani

Abstract: Network modeling plays a critical role in identifying statistical regularities and structural principles common to many systems. The large majority of recent modeling approaches are connectivity driven. The structural patterns of the network are at the basis of the mechanisms ruling the network formation. Connectivity driven models necessarily provide a time-aggregated representation that may fail to describe the instantaneous and fluctuating dynamics of many networks. We address this challenge by defining the activity potential, a time invariant function characterizing the agents' interactions and constructing an activity driven model capable of encoding the instantaneous time description of the network dynamics. The model provides an explanation of structural features such as the presence of hubs, which simply originate from the heterogeneous activity of agents. Within this framework, highly dynamical networks can be described analytically, allowing a quantitative discussion of the biases induced by the time-aggregated representations in the analysis of dynamical processes.

Submitted 26 June, 2012; v1 submitted 23 March, 2012; originally announced March 2012.

Comments: 10 pages, 4 figures

Journal ref: Nature Scientific Reports 2, 469 (2012)

arXiv:1111.1896 [pdf, other]

Dynamical Classes of Collective Attention in Twitter

Authors: Janette Lehmann, Bruno Gonçalves, José J. Ramasco, Ciro Cattuto

Abstract: Micro-blogging systems such as Twitter expose digital traces of social discourse with an unprecedented degree of resolution of individual behaviors. They offer an opportunity to investigate how a large-scale social system responds to exogenous or endogenous stimuli, and to disentangle the temporal, spatial and topical aspects of users' activity. Here we focus on spikes of collective attention in Twitter, and specifically on peaks in the popularity of hashtags. Users employ hashtags as a form of social annotation, to define a shared context for a specific event, topic, or meme. We analyze a large-scale record of Twitter activity and find that the evolution of hashtag popularity over time defines discrete classes of hashtags. We link these dynamical classes to the events the hashtags represent and use text mining techniques to provide a semantic characterization of the hashtag classes. Moreover, we track the propagation of hashtags in the Twitter social network and find that epidemic spreading plays a minor role in hashtag popularity, which is mostly driven by exogenous factors.

Submitted 1 March, 2012; v1 submitted 8 November, 2011; originally announced November 2011.

Comments: 10 pages, 5 figures, 2 tables - To Appear in WWW'12

Journal ref: WWW'12, 251 (2012)

arXiv:1107.0997 [pdf, other]

doi 10.1371/journal.pone.0023084

Towards a characterization of behavior-disease models

Authors: Nicola Perra, Duygu Balcan, Bruno Gonçalves, Alessandro Vespignani

Abstract: The last decade saw the advent of increasingly realistic epidemic models that leverage on the availability of highly detailed census and human mobility data. Data-driven models aim at a granularity down to the level of households or single individuals. However, relatively little systematic work has been done to provide coupled behavior-disease models able to close the feedback loop between behavioral changes triggered in the population by an individual's perception of the disease spread and the actual disease spread itself. While models lacking this coupling can be extremely successful in mild epidemics, they obviously will be of limited use in situations where social disruption or behavioral alterations are induced in the population by knowledge of the disease. Here we propose a characterization of a set of prototypical mechanisms for self-initiated social distancing induced by local and non-local prevalence-based information available to individuals in the population. We characterize the effects of these mechanisms in the framework of a compartmental scheme that enlarges the basic SIR model by considering separate behavioral classes within the population. The transition of individuals in/out of behavioral classes is coupled with the spreading of the disease and provides a rich phase space with multiple epidemic peaks and tipping points. The class of models presented here can be used in the case of data-driven computational approaches to analyze scenarios of social adaptation and behavioral change.

Submitted 5 July, 2011; originally announced July 2011.

Comments: 24 pages, 15 figures

Journal ref: PLoS ONE 6(8): e23084 (2011)

arXiv:1105.5170 [pdf, other]

doi 10.1371/journal.pone.0022656

Validation of Dunbar's number in Twitter conversations

Authors: Bruno Goncalves, Nicola Perra, Alessandro Vespignani

Abstract: Modern society's increasing dependency on online tools for both work and recreation opens up unique opportunities for the study of social interactions. A large survey of online exchanges or conversations on Twitter, collected across six months involving 1.7 million individuals is presented here. We test the theoretical cognitive limit on the number of stable social relationships known as Dunbar's number. We find that users can entertain a maximum of 100-200 stable relationships in support for Dunbar's prediction. The "economy of attention" is limited in the online world by cognitive and biological constraints as predicted by Dunbar's theory. Inspired by this empirical evidence we propose a simple dynamical mechanism, based on finite priority queuing and time resources, that reproduces the observed social behavior.

Submitted 28 May, 2011; v1 submitted 25 May, 2011; originally announced May 2011.

Comments: 8 pages, 6 figures

Journal ref: PLoS ONE 6(8): e22656 (2011)

arXiv:1103.0784 [pdf, other]

doi 10.1162/artl_a_00034

Happiness is assortative in online social networks

Authors: Johan Bollen, Bruno Goncalves, Guangchen Ruan, Huina Mao

Abstract: Social networks tend to disproportionally favor connections between individuals with either similar or dissimilar characteristics. This propensity, referred to as assortative mixing or homophily, is expressed as the correlation between attribute values of nearest neighbour vertices in a graph. Recent results indicate that beyond demographic features such as age, sex and race, even psychological states such as "loneliness" can be assortative in a social network. In spite of the increasing societal importance of online social networks it is unknown whether assortative mixing of psychological states takes place in situations where social ties are mediated solely by online networking services in the absence of physical contact. Here, we show that general happiness or Subjective Well-Being (SWB) of Twitter users, as measured from a 6 month record of their individual tweets, is indeed assortative across the Twitter social network. To our knowledge this is the first result that shows assortative mixing in online networks at the level of SWB. Our results imply that online social networks may be equally subject to the social mechanisms that cause assortative mixing in real social networks and that such assortative mixing takes place at the level of SWB. Given the increasing prevalence of online social networks, their propensity to connect users with similar levels of SWB may be an important instrument in better understanding how both positive and negative sentiments spread through online social ties. Future research may focus on how event-specific mood states can propagate and influence user behavior in "real life".

Submitted 3 March, 2011; originally announced March 2011.

Comments: 17 pages, 9 figures

Journal ref: Artificial Life 17(3), 237-251 (2011)

arXiv:1003.5327 [pdf, other]

doi 10.1145/1810617.1810658

Agents, Bookmarks and Clicks: A topical model of Web traffic

Authors: Mark Meiss, Bruno Gonçalves, José J. Ramasco, Alessandro Flammini, Filippo Menczer

Abstract: Analysis of aggregate and individual Web traffic has shown that PageRank is a poor model of how people navigate the Web. Using the empirical traffic patterns generated by a thousand users, we characterize several properties of Web traffic that cannot be reproduced by Markovian models. We examine both aggregate statistics capturing collective behavior, such as page and link traffic, and individual statistics, such as entropy and session size. No model currently explains all of these empirical observations simultaneously. We show that all of these traffic patterns can be explained by an agent-based model that takes into account several realistic browsing behaviors. First, agents maintain individual lists of bookmarks (a non-Markovian memory mechanism) that are used as teleportation targets. Second, agents can retreat along visited links, a branching mechanism that also allows us to reproduce behaviors such as the use of a back button and tabbed browsing. Finally, agents are sustained by visiting novel pages of topical interest, with adjacent pages being more topically related to each other than distant ones. This modulates the probability that an agent continues to browse or starts a new session, allowing us to recreate heterogeneous session lengths. The resulting model is capable of reproducing the collective and individual behaviors we observe in the empirical data, reconciling the narrowly focused browsing patterns of individual users with the extreme heterogeneity of aggregate traffic measurements. This result allows us to identify a few salient features that are necessary and sufficient to interpret the browsing patterns observed in our data. In addition to the descriptive and explanatory power of such a model, our results may lead the way to more sophisticated, realistic, and effective ranking and crawling algorithms.

Submitted 27 March, 2010; originally announced March 2010.

Comments: 10 pages, 16 figures, 1 table - Long version of paper to appear in Proceedings of the 21st ACM conference on Hypertext and Hypermedia

Journal ref: Proceedings of the 21st ACM conference on Hypertext and hypermedia, 229 (2010)

arXiv:1003.5325 [pdf, other]

doi 10.1145/1557914.1557946

What's in a Session: Tracking Individual Behavior on the Web

Authors: Mark Meiss, John Duncan, Bruno Gonçalves, José J. Ramasco, Filippo Menczer

Abstract: We examine the properties of all HTTP requests generated by a thousand undergraduates over a span of two months. Preserving user identity in the data set allows us to discover novel properties of Web traffic that directly affect models of hypertext navigation. We find that the popularity of Web sites -- the number of users who contribute to their traffic -- lacks any intrinsic mean and may be unbounded. Further, many aspects of the browsing behavior of individual users can be approximated by log-normal distributions even though their aggregate behavior is scale-free. Finally, we show that users' click streams cannot be cleanly segmented into sessions using timeouts, affecting any attempt to model hypertext navigation using statistics of individual sessions. We propose a strictly logical definition of sessions based on browsing activity as revealed by referrer URLs; a user may have several active sessions in their click stream at any one time. We demonstrate that applying a timeout to these logical sessions affects their statistics to a lesser extent than a purely timeout-based mechanism.

Submitted 27 March, 2010; originally announced March 2010.

Comments: 10 pages, 13 figures, 1 table

Journal ref: Proceedings of the 20th ACM conference on Hypertext and hypermedia, 173-182 (2009)

arXiv:1002.0876 [pdf]

doi 10.3134/ehtj.09.011

Modeling vaccination campaigns and the Fall/Winter 2009 activity of the new A(H1N1) influenza in the Northern Hemisphere

Authors: Paolo Bajardi, Chiara Poletto, Duygu Balcan, Hao Hu, Bruno Goncalves, Jose J. Ramasco, Daniela Paolotti, Nicola Perra, Michele Tizzoni, Wouter Van den Broeck, Vittoria Colizza, Alessandro Vespignani

Abstract: The unfolding of pandemic influenza A(H1N1) for Fall 2009 in the Northern Hemisphere is still uncertain. Plans for vaccination campaigns and vaccine trials are underway, with the first batches expected to be available early October. Several studies point to the possibility of an anticipated pandemic peak that could undermine the effectiveness of vaccination strategies. Here we use a structured global epidemic and mobility metapopulation model to assess the effectiveness of massive vaccination campaigns for the Fall/Winter 2009. Mitigation effects are explored depending on the interplay between the predicted pandemic evolution and the expected delivery of vaccines. The model is calibrated using recent estimates on the transmissibility of the new A(H1N1) influenza. Results show that if additional intervention strategies were not used to delay the time of pandemic peak, vaccination may not be able to considerably reduce the cumulative number of cases, even when the mass vaccination campaign is started as early as mid-October. Prioritized vaccination would be crucial in slowing down the pandemic evolution and reducing its burden.

Submitted 3 February, 2010; originally announced February 2010.

Comments: Paper: 19 Pages, 3 Figures. Supplementary Information: 10 pages, 8 Tables

Journal ref: Emerging Health Threats 2, E11 (2009)

arXiv:0909.2417 [pdf]

doi 10.1186/1741-7015-7-45

Seasonal transmission potential and activity peaks of the new influenza A(H1N1): a Monte Carlo likelihood analysis based on human mobility

Authors: Duygu Balcan, Hao Hu, Bruno Goncalves, Paolo Bajardi, Chiara Poletto, Jose J Ramasco, Daniela Paolotti, Nicola Perra, Michele Tizzoni, Wouter Van den Broeck, Vittoria Colizza, Alessandro Vespignani

Abstract: On 11 June the World Health Organization officially raised the phase of pandemic alert (with regard to the new H1N1 influenza strain) to level 6. We use a global structured metapopulation model integrating mobility and transportation data worldwide. In order to estimate the transmission potential and the relevant model parameters, we used the data on the chronology of the 2009 novel influenza A(H1N1). The method is based on the maximum likelihood analysis of the arrival time distribution generated by the model in 12 countries seeded by Mexico by using 1M computationally simulated epidemics. An extended chronology including 93 countries worldwide seeded before 18 June was used to ascertain the seasonality effects. We found the best estimate R0 = 1.75 (95% CI 1.64 to 1.88) for the basic reproductive number. Correlation analysis allows the selection of the most probable seasonal behavior based on the observed pattern, leading to the identification of plausible scenarios for the future unfolding of the pandemic and the estimate of pandemic activity peaks in the different hemispheres. We provide estimates for the number of hospitalizations and the attack rate for the next wave as well as an extensive sensitivity analysis on the disease parameter values. We also studied the effect of systematic therapeutic use of antiviral drugs on the epidemic timeline. The analysis shows the potential for an early epidemic peak occurring in October/November in the Northern hemisphere, likely before large-scale vaccination campaigns could be carried out. We suggest that the planning of additional mitigation policies such as systematic antiviral treatments might be the key to delay the activity peak in order to restore the effectiveness of the vaccination programs.

Submitted 14 September, 2009; originally announced September 2009.

Comments: Paper: 29 Pages, 3 Figures and 5 Tables. Supplementary Information: 29 Pages, 5 Figures and 7 Tables. Print version: http://www.biomedcentral.com/1741-7015/7/45

Journal ref: BMC Medicine 2009, 7:45

arXiv:0907.3304 [pdf, other]

doi 10.1073/pnas.0906910106

Multiscale mobility networks and the large scale spreading of infectious diseases

Authors: Duygu Balcan, Vittoria Colizza, Bruno Goncalves, Hao Hu, Jose J. Ramasco, Alessandro Vespignani

Abstract: Among the realistic ingredients to be considered in the computational modeling of infectious diseases, human mobility represents a crucial challenge both on the theoretical side and in view of the limited availability of empirical data. In order to study the interplay between small-scale commuting flows and long-range airline traffic in shaping the spatio-temporal pattern of a global epidemic we i) analyze mobility data from 29 countries around the world and find a gravity model able to provide a global description of commuting patterns up to 300 kms; ii) integrate in a worldwide structured metapopulation epidemic model a time-scale separation technique for evaluating the force of infection due to multiscale mobility processes in the disease dynamics. Commuting flows are found, on average, to be one order of magnitude larger than airline flows. However, their introduction into the worldwide model shows that the large scale pattern of the simulated epidemic exhibits only small variations with respect to the baseline case where only airline traffic is considered. The presence of short range mobility increases however the synchronization of subpopulations in close proximity and affects the epidemic behavior at the periphery of the airline transportation infrastructure. The present approach outlines the possibility for the definition of layered computational approaches where different modeling assumptions and granularities can be used consistently in a unifying multi-scale framework.

Submitted 20 July, 2009; originally announced July 2009.

Comments: 10 pages, 4 figures, 1 table

Journal ref: PNAS 106, 21484 (2009)

arXiv:0901.3839 [pdf, other]

Remembering what we like: Toward an agent-based model of Web traffic

Authors: Bruno Goncalves, Mark R. Meiss, Jose J. Ramasco, Alessandro Flammini, Filippo Menczer

Abstract: Analysis of aggregate Web traffic has shown that PageRank is a poor model of how people actually navigate the Web. Using the empirical traffic patterns generated by a thousand users over the course of two months, we characterize the properties of Web traffic that cannot be reproduced by Markovian models, in which destinations are independent of past decisions. In particular, we show that the diversity of sites visited by individual users is smaller and more broadly distributed than predicted by the PageRank model; that link traffic is more broadly distributed than predicted; and that the time between consecutive visits to the same site by a user is less broadly distributed than predicted. To account for these discrepancies, we introduce a more realistic navigation model in which agents maintain individual lists of bookmarks that are used as teleportation targets. The model can also account for branching, a traffic property caused by browser features such as tabs and the back button. The model reproduces aggregate traffic patterns such as site popularity, while also generating more accurate predictions of diversity, link traffic, and return time distributions. This model for the first time allows us to capture the extreme heterogeneity of aggregate traffic measurements while explaining the more narrowly focused browsing patterns of individual users.

Submitted 24 January, 2009; originally announced January 2009.

Comments: 4 pages, 4 figures. Accepted in WSDM 2009 Late Breaking Results

Journal ref: WSDM 2009 Late Breaking Results

arXiv:0901.0498 [pdf, other]

doi 10.1007/978-3-642-02469-6_102

Towards the characterization of individual users through Web analytics

Authors: Bruno Goncalves, Jose J. Ramasco

Abstract: We perform an analysis of the way individual users navigate the Web. We focus primarily on the temporal patterns of their return to a given page. The return probability as a function of time as well as the distribution of time intervals between consecutive visits are measured and found to be independent of the level of activity of single users. The results indicate a rich variety of individual behaviors and seem to preclude the possibility of defining a characteristic frequency for each user in his/her visits to a single site.

Submitted 5 January, 2009; originally announced January 2009.

Comments: 8 pages, 4 figures. To appear in Proceedings of Complex'09

Journal ref: Complex Sciences, 2247-2254 (2009)

arXiv:0803.4018 [pdf, other]

doi 10.1103/PhysRevE.78.026123

Human dynamics revealed through Web analytics

Authors: Bruno Goncalves, Jose J. Ramasco

Abstract: When the World Wide Web was first conceived as a way to facilitate the sharing of scientific information at the CERN (European Center for Nuclear Research) few could have imagined the role it would come to play in the following decades. Since then, the increasing ubiquity of Internet access and the frequency with which people interact with it raise the possibility of using the Web to better observe, understand, and monitor several aspects of human social behavior. Web sites with large numbers of frequently returning users are ideal for this task. If these sites belong to companies or universities, their usage patterns can furnish information about the working habits of entire populations. In this work, we analyze the properly anonymized logs detailing the access history to Emory University's Web site. Emory is a medium size university located in Atlanta, Georgia. We find interesting structure in the activity patterns of the domain and study in a systematic way the main forces behind the dynamics of the traffic. In particular, we show that both linear preferential linking and priority based queuing are essential ingredients to understand the way users navigate the Web.

Submitted 21 May, 2008; v1 submitted 27 March, 2008; originally announced March 2008.

Comments: 7 pages, 8 figures

Journal ref: Physical Review E 78, 026123 (2008)

arXiv:cond-mat/0609776 [pdf, ps, other]

doi 10.1103/PhysRevE.76.066106

Transport on weighted Networks: when correlations are independent of degree

Authors: Jose J. Ramasco, Bruno Goncalves

Abstract: Most real-world networks are weighted graphs with the weight of the edges reflecting the relative importance of the connections. In this work, we study non degree dependent correlations between edge weights, generalizing thus the correlations beyond the degree dependent case. We propose a simple method to introduce weight-weight correlations in topologically uncorrelated graphs. This allows us to test different measures to discriminate between the different correlation types and to quantify their intensity. We also discuss here the effect of weight correlations on the transport properties of the networks, showing that positive correlations dramatically improve transport. Finally, we give two examples of real-world networks (social and transport graphs) in which weight-weight correlations are present.

Submitted 5 October, 2007; v1 submitted 29 September, 2006; originally announced September 2006.

Comments: 8 pages, 8 figures

Journal ref: Phys. Rev. E 76, 066106 (2007).

arXiv:cond-mat/0501361 [pdf, ps, other]

doi 10.1016/j.physa.2005.06.049

Fractal Power Law in Literary English

Authors: L. L. Goncalves, L. B. Goncalves

Abstract: We present in this paper a numerical investigation of literary texts by various well-known English writers, covering the first half of the twentieth century, based upon the results obtained through corpus analysis of the texts. A fractal power law is obtained for the lexical wealth defined as the ratio between the number of different words and the total number of words of a given text. By considering as a signature of each author the exponent and the amplitude of the power law, and the standard deviation of the lexical wealth, it is possible to discriminate works of different genres and writers and show that each writer has a very distinct signature, either considered among other literary writers or compared with writers of non-literary texts. It is also shown that, for a given author, the signature is able to discriminate between short stories and novels.

Submitted 3 June, 2005; v1 submitted 15 January, 2005; originally announced January 2005.

Comments: 27 pages, 10 tables, 15 figures. Revised version accepted in Physica A

Showing 1–46 of 46 results for author: Gonçalves, B