-
Nicer Than Humans: How do Large Language Models Behave in the Prisoner's Dilemma?
Authors:
Nicoló Fontana,
Francesco Pierri,
Luca Maria Aiello
Abstract:
The behavior of Large Language Models (LLMs) as artificial social agents is largely unexplored, and we still lack extensive evidence of how these agents react to simple social stimuli. Testing the behavior of AI agents in classic Game Theory experiments provides a promising theoretical framework for evaluating the norms and values of these agents in archetypal social situations. In this work, we investigate the cooperative behavior of Llama2 when playing the Iterated Prisoner's Dilemma against random adversaries displaying various levels of hostility. We introduce a systematic methodology to evaluate an LLM's comprehension of the game's rules and its capability to parse historical gameplay logs for decision-making. We conducted simulations of games lasting for 100 rounds, and analyzed the LLM's decisions in terms of dimensions defined in behavioral economics literature. We find that Llama2 tends not to initiate defection but it adopts a cautious approach towards cooperation, sharply shifting towards a behavior that is both forgiving and non-retaliatory only when the opponent reduces its rate of defection below 30%. In comparison to prior research on human participants, Llama2 exhibits a greater inclination towards cooperative behavior. Our systematic approach to the study of LLMs in game theoretical scenarios is a step towards using these simulations to inform practices of LLM auditing and alignment.
Submitted 19 June, 2024;
originally announced June 2024.
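The iterated setup described in the Prisoner's Dilemma abstract above can be illustrated with a small simulation. It plays 100 rounds against a random adversary whose hostility is its per-round probability of defecting. This is a minimal sketch under stated assumptions. The payoff values are the textbook ones (T=5, R=3, P=1, S=0), not values taken from the paper. The hypothetical `tit_for_tat` policy stands in for the LLM, which in the study is Llama2 reading the game rules and the gameplay history.

```python
import random

# Textbook Prisoner's Dilemma payoffs (T=5 > R=3 > P=1 > S=0).
# ASSUMPTION: the abstract does not state the payoff values used in the paper.
PAYOFFS = {
    ("C", "C"): (3, 3),  # mutual cooperation (R, R)
    ("C", "D"): (0, 5),  # agent is exploited (S, T)
    ("D", "C"): (5, 0),  # agent exploits (T, S)
    ("D", "D"): (1, 1),  # mutual defection (P, P)
}


def random_opponent(hostility, rng):
    """Defect with probability `hostility`, independently in every round."""
    return "D" if rng.random() < hostility else "C"


def tit_for_tat(history):
    """HYPOTHETICAL stand-in for the LLM: cooperate first, then copy the opponent's last move."""
    return "C" if not history else history[-1][1]


def play(hostility, rounds=100, seed=0):
    """Return (agent score, agent cooperation rate, move history)."""
    rng = random.Random(seed)
    history = []  # list of (agent_move, opponent_move) pairs
    score = 0
    for _ in range(rounds):
        agent_move = tit_for_tat(history)
        opponent_move = random_opponent(hostility, rng)
        score += PAYOFFS[(agent_move, opponent_move)][0]
        history.append((agent_move, opponent_move))
    coop_rate = sum(a == "C" for a, _ in history) / rounds
    return score, coop_rate, history
```

Sweeping `hostility` from 0 to 1 and recording `coop_rate` produces the kind of cooperation-versus-hostility curve the abstract summarizes. To test a real model, replace `tit_for_tat` with a function that maps `history` to "C" or "D".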
-
Urban highways are barriers to social ties
Authors:
Luca Maria Aiello,
Anastassia Vybornova,
Sándor Juhász,
Michael Szell,
Eszter Bokányi
Abstract:
Urban highways are common, especially in the US, making cities more car-centric. They promise the annihilation of distance but obstruct pedestrian mobility, thus playing a key role in limiting social interactions locally. Although this limiting role is widely acknowledged in urban studies, the quantitative relationship between urban highways and social ties has barely been tested. Here we define a Barrier Score that relates massive, geolocated online social network data to highways in the 50 largest US cities. At the unprecedented granularity of individual social ties, we show that urban highways are associated with decreased social connectivity. This barrier effect is especially strong for short distances and consistent with historical cases of highways that were built to purposefully disrupt or isolate Black neighborhoods. By combining spatial infrastructure with social tie data, our method adds a new dimension to demographic studies of social segregation. Our study can inform reparative planning for an evidence-based reduction of spatial inequality, and more generally, support a better integration of the social fabric in urban planning.
Submitted 18 April, 2024; v1 submitted 17 April, 2024;
originally announced April 2024.
-
Towards Human Awareness in Robot Task Planning with Large Language Models
Authors:
Yuchen Liu,
Luigi Palmieri,
Sebastian Koch,
Ilche Georgievski,
Marco Aiello
Abstract:
Recent breakthroughs in research on Large Language Models (LLMs) have triggered a transformation across several research domains. Notably, the integration of LLMs has greatly enhanced performance in robot Task And Motion Planning (TAMP). However, previous approaches often neglect dynamic environments, i.e., environments containing dynamic objects such as humans. In this paper, we propose a novel approach to address this gap by incorporating human awareness into LLM-based robot task planning. To obtain an effective representation of the dynamic environment, our approach integrates humans' information into a hierarchical scene graph. To ensure the plan's executability, we leverage LLMs to ground the environmental topology and actionable knowledge into formal planning language. Most importantly, we use LLMs to predict future human activities and plan tasks for the robot considering the predictions. Our contribution facilitates the integration of human awareness into LLM-driven robot task planning, and paves the way for proactive robot decision-making in dynamic environments.
Submitted 17 April, 2024;
originally announced April 2024.
-
DELTA: Decomposed Efficient Long-Term Robot Task Planning using Large Language Models
Authors:
Yuchen Liu,
Luigi Palmieri,
Sebastian Koch,
Ilche Georgievski,
Marco Aiello
Abstract:
Recent advancements in Large Language Models (LLMs) have sparked a revolution across various research fields. In particular, the integration of common-sense knowledge from LLMs into robot task and motion planning has been proven to be a game-changer, elevating performance in terms of explainability and downstream task efficiency to unprecedented heights. However, managing the vast knowledge encapsulated within these large models has posed challenges, often resulting in infeasible plans generated by LLM-based planning systems due to hallucinations or missing domain information. To overcome these challenges and obtain even greater planning feasibility and computational efficiency, we propose a novel LLM-driven task planning approach called DELTA. For achieving better grounding from environmental topology into actionable knowledge, DELTA leverages the power of scene graphs as environment representations within LLMs, enabling the fast generation of precise planning problem descriptions. For obtaining higher planning performance, we use LLMs to decompose the long-term task goals into an autoregressive sequence of sub-goals for an automated task planner to solve. Our contribution enables a more efficient and fully automatic task planning pipeline, achieving higher planning success rates and significantly shorter planning times compared to the state of the art.
Submitted 4 April, 2024;
originally announced April 2024.
-
The causal role of the Reddit collective action on the GameStop short squeeze
Authors:
Antonio Desiderio,
Luca Maria Aiello,
Giulio Cimini,
Laura Alessandretti
Abstract:
In early 2021, the stock prices of GameStop, AMC, Nokia, and BlackBerry experienced dramatic increases, triggered by short squeeze operations that have been largely attributed to Reddit's retail investors. These events showcased, for the first time, the potential of online social networks to catalyze financial collective action. How, when, and to what extent Reddit users played a causal role in driving up these prices, however, remains unclear. To address these questions, we employ causal inference techniques, leveraging data capturing activity on Reddit and Twitter, and trading volume with a high temporal resolution. We find that Reddit discussions foreshadowed trading volume before the GameStop short squeeze, with their predictive power being particularly strong on hourly time scales. This effect emerged abruptly and became prominent a few weeks before the event, but waned once the community of investors gained widespread visibility through Twitter. As the causal link unfolded, the collective investment of the Reddit community, quantified through each user's financial position on GameStop, closely mirrored the market capitalization of the stock. The evidence from our study suggests that Reddit users fueled the GameStop short squeeze, and thereby Reddit served as a coordination hub for a shared financial strategy. Towards the end of January, users talking about GameStop contributed to raising the popularity of BlackBerry, AMC and Nokia, which emerged as the most popular stocks as the community gained global recognition. Overall, our results shed light on the dynamics behind the first large-scale financial collective action driven by social media users.
Submitted 5 February, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
Narratives of Collective Action in YouTube's Discourse on Veganism
Authors:
Arianna Pera,
Luca Maria Aiello
Abstract:
Narratives can be powerful tools for inspiring action on pressing societal issues such as climate change. While social science theories offer frameworks for understanding the narratives that arise within collective movements, these are rarely applied to the vast data available from social media platforms, which play a significant role in shaping public opinion and mobilizing collective action. This gap in the empirical evaluation of online narratives limits our understanding of their relationship with public response. In this study, we focus on plant-based diets as a form of pro-environmental action and employ natural language processing to operationalize a theoretical framework of moral narratives specific to the vegan movement. We apply this framework to narratives found in YouTube videos promoting environmental initiatives such as Veganuary, Meatless March, and No Meat May. Our analysis reveals that several narrative types, as defined by the theory, are empirically present in the data. To identify narratives with the potential to elicit positive public engagement, we used text processing to estimate the proportion of comments supporting collective action across narrative types. Video narratives advocating social fight, whether through protest or through efforts to convert others to the cause, are associated with a stronger sense of collective action in the respective comments. These narrative types also demonstrate increased semantic coherence and alignment between the message and public response, markers typically associated with successful collective action. Our work offers new insights into the complex factors that influence the emergence of collective action, thereby informing the development of effective communication strategies within social movements.
Submitted 28 March, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.
-
The Persuasive Power of Large Language Models
Authors:
Simon Martin Breum,
Daniel Vædele Egdal,
Victor Gram Mortensen,
Anders Giovanni Møller,
Luca Maria Aiello
Abstract:
The increasing capability of Large Language Models to act as human-like social agents raises two important questions in the area of opinion dynamics. First, whether these agents can generate effective arguments that could be injected into the online discourse to steer public opinion. Second, whether artificial agents can interact with each other to reproduce dynamics of persuasion typical of human social systems, opening up opportunities for studying synthetic social systems as faithful proxies for opinion dynamics in human populations. To address these questions, we designed a synthetic persuasion dialogue scenario on the topic of climate change, where a 'convincer' agent generates a persuasive argument for a 'skeptic' agent, who subsequently assesses whether the argument changed its internal opinion state. Different types of arguments were generated to incorporate different linguistic dimensions underpinning psycho-linguistic theories of opinion change. We then asked human judges to evaluate the persuasiveness of machine-generated arguments. Arguments that included factual knowledge, markers of trust, expressions of support, and conveyed status were deemed most effective according to both humans and agents, with humans reporting a marked preference for knowledge-based arguments. Our experimental framework lays the groundwork for future in-silico studies of opinion dynamics, and our findings suggest that artificial agents have the potential to play an important role in collective processes of opinion formation in online social media.
Submitted 24 December, 2023;
originally announced December 2023.
-
Shifting Climates: Climate Change Communication from YouTube to TikTok
Authors:
Arianna Pera,
Luca Maria Aiello
Abstract:
Public discourse on critical issues such as climate change is progressively shifting to social media platforms that prioritize short-form video content. Content creators acting on those platforms play a pivotal role in shaping the discourse, yet the dynamics of communication and audience reactions across platforms remain underexplored. To improve our understanding of this transition, we studied the video content produced by 21 prominent YouTube creators who have expanded their influence to TikTok as information disseminators. Using dictionary-based tools and BERT-based embeddings, we analyzed the transcripts of nearly 7k climate-related videos across both platforms and the 574k comments they received. We found that, when publishing on TikTok, creators use more emotionally resonant, self-referential, and action-oriented language compared to YouTube. We also observed a strong semantic alignment between videos and comments, with creators who excel at diversifying their TikTok content from YouTube typically receiving responses that more closely align with their produced content. This suggests that tailored communication strategies hold greater promise in directing public discussion toward desired topics, which bears implications for the design of effective climate communication campaigns.
Submitted 20 February, 2024; v1 submitted 8 December, 2023;
originally announced December 2023.
-
The role of interface design on prompt-mediated creativity in Generative AI
Authors:
Maddalena Torricelli,
Mauro Martino,
Andrea Baronchelli,
Luca Maria Aiello
Abstract:
Generative AI for the creation of images is becoming a staple in the toolkit of digital artists and visual designers. The interaction with these systems is mediated by prompting, a process in which users write a short text to describe the desired image's content and style. The study of prompts offers an unprecedented opportunity to gain insight into the process of human creativity. Yet, our understanding of how people use them remains limited. We analyze more than 145,000 prompts from the logs of two Generative AI platforms (Stable Diffusion and Pick-a-Pic) to shed light on how people explore new concepts over time, and how their exploration might be influenced by different design choices in human-computer interfaces to Generative AI. We find that users exhibit a tendency towards exploration of new topics over exploitation of concepts visited previously. However, a comparative analysis of the two platforms, which differ both in scope and functionalities, reveals some stark differences. Features diverting user focus from prompting and providing instead shortcuts for quickly generating image variants are associated with a considerable reduction in both exploration of novel concepts and detail in the submitted prompts. These results carry direct implications for the design of human interfaces to Generative AI and raise new questions regarding how the process of prompting should be aided in ways that best support creativity.
Submitted 17 February, 2024; v1 submitted 30 November, 2023;
originally announced December 2023.
-
Measuring Behavior Change with Observational Studies: a Review
Authors:
Arianna Pera,
Gianmarco de Francisci Morales,
Luca Maria Aiello
Abstract:
Exploring behavioral change in the digital age is imperative for societal progress in the context of 21st-century challenges. We analyzed 148 articles (2000-2023) and built a map that categorizes behaviors and change detection methodologies, platforms of reference, and theoretical frameworks that characterize online behavior change. Our findings uncover a focus on sentiment shifts, an emphasis on API-restricted platforms, and limited theory integration. We call for methodologies able to capture a wider range of behavioral types, diverse data sources, and stronger theory-practice alignment in the study of online behavioral change.
Submitted 2 November, 2023; v1 submitted 30 October, 2023;
originally announced October 2023.
-
Model Predictive Control (MPC) of an Artificial Pancreas with Data-Driven Learning of Multi-Step-Ahead Blood Glucose Predictors
Authors:
Eleonora Maria Aiello,
Mehrad Jaloli,
Marzia Cescon
Abstract:
We present the design and in-silico evaluation of a closed-loop insulin delivery algorithm to treat type 1 diabetes (T1D), consisting of a data-driven multi-step-ahead blood glucose (BG) predictor integrated into a Linear Time-Varying (LTV) Model Predictive Control (MPC) framework. Instead of identifying an open-loop model of the glucoregulatory system from available data, we propose to directly fit the entire BG prediction over a predefined prediction horizon to be used in the MPC, as a nonlinear function of past input-output data and an affine function of future insulin control inputs. For the nonlinear part, a Long Short-Term Memory (LSTM) network is proposed, while for the affine component a linear regression model is chosen. To assess benefits and drawbacks compared to a traditional linear MPC based on an auto-regressive with exogenous input (ARX) model identified from data, we evaluated the proposed LSTM-MPC controller in three simulation scenarios: a nominal case with 3 meals per day, a random meal disturbance case where meals were generated with a recently published meal generator, and a case with a 25% decrease in insulin sensitivity. In all scenarios, no feedforward meal bolus was administered. For the more challenging random meal generation scenario, the mean ± standard deviation percent time in the range 70-180 [mg/dL] was 74.99 ± 7.09 vs. 54.15 ± 14.89, the mean ± standard deviation percent time in the tighter range 70-140 [mg/dL] was 47.78 ± 8.55 vs. 34.62 ± 9.04, while the mean ± standard deviation percent time in severe hypoglycemia, i.e., < 54 [mg/dL], was 1.00 ± 3.18 vs. 9.45 ± 11.71, for our proposed LSTM-MPC controller and the traditional ARX-MPC, respectively. Our approach provided accurate predictions of future glucose concentrations and good closed-loop performance of the overall MPC controller.
Submitted 22 July, 2023;
originally announced July 2023.
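The three outcome measures quoted for the random-meal scenario in the artificial pancreas abstract are percentages of time spent in a glucose band, reported as mean ± standard deviation. This is a minimal sketch of how such figures can be computed from simulated traces. It rests on assumptions the abstract does not specify. Readings are taken to be equally spaced, so that the fraction of readings equals the fraction of time. Band edges are inclusive, and the spread is the sample standard deviation. `glucose_metrics` and `summarize` are hypothetical helper names.

```python
from statistics import mean, stdev


def glucose_metrics(readings_mg_dl):
    """Percent of equally spaced BG readings in each band (band edges ASSUMED inclusive)."""
    n = len(readings_mg_dl)
    if n == 0:
        raise ValueError("need at least one reading")

    def pct(in_band):
        return 100.0 * sum(1 for g in readings_mg_dl if in_band(g)) / n

    return {
        "tir_70_180": pct(lambda g: 70 <= g <= 180),  # time in range 70-180 mg/dL
        "tir_70_140": pct(lambda g: 70 <= g <= 140),  # time in tighter range 70-140 mg/dL
        "below_54": pct(lambda g: g < 54),            # severe hypoglycemia, < 54 mg/dL
    }


def summarize(per_subject_metrics, key):
    """Mean and sample standard deviation of one metric across simulated subjects."""
    values = [m[key] for m in per_subject_metrics]
    return mean(values), stdev(values)
```

For example, `glucose_metrics([50, 100, 150, 200])` places half of the readings in 70-180, a quarter in 70-140, and a quarter below 54.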
-
Dream Content Discovery from Reddit with an Unsupervised Mixed-Method Approach
Authors:
Anubhab Das,
Sanja Šćepanović,
Luca Maria Aiello,
Remington Mallett,
Deirdre Barrett,
Daniele Quercia
Abstract:
Dreaming is a fundamental but not fully understood part of human experience that can shed light on our thought patterns. Traditional dream analysis practices, while popular and aided by over 130 unique scales and rating systems, have limitations. Mostly based on retrospective surveys or lab studies, they struggle to be applied on a large scale or to show the importance and connections between different dream themes. To overcome these issues, we developed a new, data-driven mixed-method approach for identifying topics in free-form dream reports through natural language processing. We tested this method on 44,213 dream reports from Reddit's r/Dreams subreddit, where we found 217 topics, grouped into 22 larger themes: the most extensive collection of dream topics to date. We validated our topics by comparing them to the widely used Hall and van de Castle scale. Going beyond traditional scales, our method can find unique patterns in different dream types (like nightmares or recurring dreams), understand topic importance and connections, and observe changes in collective dream experiences over time and around major events, like the COVID-19 pandemic and the recent Russo-Ukrainian war. We envision that the applications of our method will provide valuable insights into the intricate nature of dreaming.
Submitted 9 July, 2023;
originally announced July 2023.
-
The STOIC2021 COVID-19 AI challenge: applying reusable training methodologies to private data
Authors:
Luuk H. Boulogne,
Julian Lorenz,
Daniel Kienzle,
Robin Schon,
Katja Ludwig,
Rainer Lienhart,
Simon Jegou,
Guang Li,
Cong Chen,
Qi Wang,
Derik Shi,
Mayug Maniparambil,
Dominik Muller,
Silvan Mertes,
Niklas Schroter,
Fabio Hellmann,
Miriam Elia,
Ine Dirks,
Matias Nicolas Bossa,
Abel Diaz Berenguer,
Tanmoy Mukherjee,
Jef Vandemeulebroucke,
Hichem Sahli,
Nikos Deligiannis,
Panagiotis Gonidakis
, et al. (13 additional authors not shown)
Abstract:
Challenges drive the state of the art in automated medical image analysis. The quantity of public training data that they provide can limit the performance of their solutions. Public access to the training methodology for these solutions remains absent. This study implements the Type Three (T3) challenge format, which allows for training solutions on private data and guarantees reusable training methodologies. With T3, challenge organizers train a codebase provided by the participants on sequestered training data. T3 was implemented in the STOIC2021 challenge, with the goal of predicting from a computed tomography (CT) scan whether subjects had a severe COVID-19 infection, defined as intubation or death within one month. STOIC2021 consisted of a Qualification phase, where participants developed challenge solutions using 2000 publicly available CT scans, and a Final phase, where participants submitted their training methodologies with which solutions were trained on CT scans of 9724 subjects. The organizers successfully trained six of the eight Final phase submissions. The submitted codebases for training and running inference were released publicly. The winning solution obtained an area under the receiver operating characteristic curve for discerning between severe and non-severe COVID-19 of 0.815. The Final phase solutions of all finalists improved upon their Qualification phase solutions.
Submitted 25 June, 2023; v1 submitted 18 June, 2023;
originally announced June 2023.
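The winning STOIC2021 score of 0.815 is an area under the receiver operating characteristic (ROC) curve. This is a minimal sketch of how such a value can be computed, using the equivalent pairwise (Mann-Whitney) formulation. The AUC equals the probability that a randomly chosen severe case receives a higher predicted score than a randomly chosen non-severe case, with ties counted as one half. The function name and toy inputs are illustrative only and are not part of the challenge's evaluation code.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison: labels are 1 (severe) or 0 (non-severe)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0  # positive ranked above negative
            elif p == q:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))
```

This quadratic loop is chosen for readability. For test sets of thousands of scans, a rank-based computation of the same statistic is faster.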
-
Drivers of social influence in the Twitter migration to Mastodon
Authors:
Lucio La Cava,
Luca Maria Aiello,
Andrea Tagarelli
Abstract:
The migration of Twitter users to Mastodon following Elon Musk's acquisition presents a unique opportunity to study collective behavior and gain insights into the drivers of coordinated behavior in online media. We analyzed the social network and the public conversations of about 75,000 migrated users and observed that the temporal trace of their migrations is compatible with a phenomenon of social influence, as described by a compartmental epidemic model of information diffusion. Drawing from prior research on behavioral change, we delved into the factors that account for variations across different Twitter communities in the effectiveness of the spreading of the influence to migrate. Communities in which the influence process unfolded more rapidly exhibit lower density of social connections, higher levels of signaled commitment to migrating, and more emphasis on shared identity and exchange of factual knowledge in the community discussion. These factors account collectively for 57% of the variance in the observed data. Our results highlight the joint importance of network structure, commitment, and psycho-linguistic aspects of social interactions in describing grassroots collective action, and contribute to deepening our understanding of the mechanisms driving processes of behavior change of online groups.
Submitted 28 November, 2023; v1 submitted 30 May, 2023;
originally announced May 2023.
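The Mastodon abstract describes the migration trace as compatible with a compartmental epidemic model of information diffusion but does not name the model. This is a minimal sketch that assumes the simplest such model, a discrete-time Susceptible-Infected (SI) process in which "infected" users are those who have already migrated. It uses an Euler step of dI/dt = beta * S * I / n. `si_curve`, `n`, `i0`, and `beta` are hypothetical names. Fitting `beta` to a community's cumulative migration curve would be one way to compare how rapidly the influence process unfolded across communities.

```python
def si_curve(n, i0, beta, steps, dt=1.0):
    """Cumulative migrated users under an ASSUMED discrete-time SI model.

    n: community size, i0: initial migrants, beta: transmission rate per unit time.
    Each step applies an Euler update of dI/dt = beta * S * I / n with S = n - I.
    """
    if not 0 <= i0 <= n:
        raise ValueError("i0 must lie in [0, n]")
    s, i = float(n - i0), float(i0)
    trajectory = [i]
    for _ in range(steps):
        # New migrants this step, capped so susceptibles never go negative.
        new = min(beta * s * i / n * dt, s)
        s -= new
        i += new
        trajectory.append(i)
    return trajectory
```

With the same community size and seed migrants, a larger `beta` saturates sooner. This is the sense in which the influence process "unfolded more rapidly" in some communities.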
-
Service Composition in the ChatGPT Era
Authors:
Marco Aiello,
Ilche Georgievski
Abstract:
The paper speculates about how ChatGPT-like systems can support the field of automated service composition and identifies new research areas to explore in order to take advantage of such tools in the field of service-oriented composition.
Submitted 25 May, 2023;
originally announced May 2023.
-
The Parrot Dilemma: Human-Labeled vs. LLM-augmented Data in Classification Tasks
Authors:
Anders Giovanni Møller,
Jacob Aarup Dalsgaard,
Arianna Pera,
Luca Maria Aiello
Abstract:
In the realm of Computational Social Science (CSS), practitioners often navigate complex, low-resource domains and face the costly and time-intensive challenges of acquiring and annotating data. We aim to establish a set of guidelines to address such challenges, comparing the use of human-labeled data with synthetically generated data from GPT-4 and Llama-2 in ten distinct CSS classification tasks of varying complexity. Additionally, we examine the impact of training data sizes on performance. Our findings reveal that models trained on human-labeled data consistently exhibit superior or comparable performance compared to their synthetically augmented counterparts. Nevertheless, synthetic augmentation proves beneficial, particularly in improving performance on rare classes within multi-class tasks. Furthermore, we leverage GPT-4 and Llama-2 for zero-shot classification and find that, while they generally display strong performance, they often fall short when compared to specialized classifiers trained on moderately sized training sets.
Submitted 5 February, 2024; v1 submitted 26 April, 2023;
originally announced April 2023.
-
Multidimensional Tie Strength and Economic Development
Authors:
Luca Maria Aiello,
Sagar Joglekar,
Daniele Quercia
Abstract:
The strength of social relations has been shown to affect an individual's access to opportunities. To date, however, the correspondence between tie strength and population's economic prospects has not been quantified, largely because of the inability to operationalize strength based on Granovetter's classic theory. Our work departed from the premise that tie strength is a unidimensional construct (typically operationalized with frequency or volume of contact), and used instead a validated model of ten fundamental dimensions of social relationships grounded in the literature of social psychology. We built state-of-the-art NLP tools to infer the presence of these dimensions from textual communication, and analyzed a large conversation network of 630K geo-referenced Reddit users across the entire US connected by 12.8M social ties created over the span of 7 years. We found that unidimensional tie strength is only weakly correlated with economic opportunities (R²=0.30), while multidimensional constructs are highly correlated (R²=0.62). In particular, economic opportunities are associated with the combination of: i) knowledge ties, which bridge geographically distant groups, facilitating knowledge dissemination across communities; and ii) social support ties, which knit geographically close communities together, and represent dependable sources of social and emotional support. These results point to the importance of developing high-quality measures of tie strength in network theory.
Submitted 22 December, 2022;
originally announced December 2022.
-
The language of opinion change on social media under the lens of communicative action
Authors:
Corrado Monti,
Luca Maria Aiello,
Gianmarco De Francisci Morales,
Francesco Bonchi
Abstract:
Which messages are more effective at inducing a change of opinion in the listener? We approach this question within the frame of Habermas' theory of communicative action, which posits that the illocutionary intent of the message (its pragmatic meaning) is the key. Thanks to recent advances in natural language processing, we are able to operationalize this theory by extracting the latent social dimensions of a message, namely archetypes of social intent of language, that come from social exchange theory. We identify key ingredients to opinion change by looking at more than 46k posts and more than 3.5M comments on Reddit's r/ChangeMyView, a debate forum where people try to change each other's opinion and explicitly mark opinion-changing comments with a special flag called "delta". Comments that express no intent are about 77% less likely to change the mind of the recipient, compared to comments that convey at least one social dimension. Among the various social dimensions, the ones that are most likely to produce an opinion change are knowledge, similarity, and trust, which resonates with Habermas' theory of communicative action. We also find other new important dimensions, such as appeals to power or empathetic expressions of support. Finally, in line with theories of constructive conflict, yet contrary to the popular characterization of conflict as the bane of modern social media, our findings show that voicing conflict in the context of a structured public debate can promote integration, especially when it is used to counter another conflictive stance. By leveraging recent advances in natural language processing, our work provides an empirical framework for Habermas' theory, finds concrete examples of its effects in the wild, and suggests its possible extension with a more faceted understanding of intent interpreted as social dimensions of language.
Submitted 31 October, 2022;
originally announced October 2022.
-
Urban form and COVID-19 cases and deaths in Greater London: an urban morphometric approach
Authors:
Alessandro Venerandi,
Luca Maria Aiello,
Sergio Porta
Abstract:
The COVID-19 pandemic generated considerable debate in relation to urban density. This is an old debate, which originated in mid-19th-century England with the emergence of the public health and urban planning disciplines. While the two are popularly linked, evidence suggests that such a relationship cannot be generally assumed. Furthermore, urban density has been investigated in a spatially coarse manner (predominantly at city level) and never contextualised with other descriptors of urban form. In this work, we explore COVID-19 and urban form in Greater London, relating a comprehensive set of morphometric descriptors (including built-up density) to COVID-19 deaths and cases, while controlling for socioeconomic status, ethnicity, age, and comorbidity. We describe urban form at individual building level and then aggregate information for official neighbourhoods, allowing for a detailed intra-urban representation. Results show that: i) control variables explain significantly more variance of both COVID-19 cases and deaths than the morphometric descriptors; ii) of what the latter can explain, built-up density is indeed the most associated, though inversely. The typical London neighbourhood with high levels of COVID-19 infections and deaths resembles a suburb, featuring a low-density urban fabric dotted by larger free-standing buildings and framed by a poorly inter-connected street network.
Submitted 16 October, 2022;
originally announced October 2022.
-
Enough Hot Air: The Role of Immersion Cooling
Authors:
Kawsar Haghshenas,
Brian Setz,
Yannis Bloch,
Marco Aiello
Abstract:
Air cooling is the traditional solution to chill servers in data centers. However, the continuous increase in global data center energy consumption, combined with the increase in racks' power dissipation, calls for more efficient alternatives. Immersion cooling is one such alternative. In this paper, we quantitatively examine and compare air cooling and immersion cooling solutions. The examined characteristics include power usage effectiveness (PUE), computing and power density, cost, and maintenance overheads. A direct comparison shows that immersion cooling reduces energy consumption by about 50% and occupied space by about two-thirds. In addition, the higher heat capacity of the liquids used in immersion cooling, compared to air, allows for much higher rack power densities. Moreover, immersion cooling requires lower capital and operational expenditures. However, challenging maintenance procedures, together with an increased number of IT failures, are its main downsides. By selecting immersion cooling, cloud providers must trade off the decrease in energy and cost and the increase in power density against its higher maintenance and reliability concerns. Finally, we argue that retrofitting an air-cooled data center with immersion cooling will result in high costs and is generally not recommended.
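PUE is conventionally defined (for example, by The Green Grid and in ISO/IEC 30134-2) as total facility energy divided by the energy delivered to IT equipment, so lower is better and 1.0 is the ideal. A minimal sketch of the calculation, using hypothetical overhead figures that are not taken from the paper:

```python
def pue(total_facility_energy_kwh: float, it_equipment_energy_kwh: float) -> float:
    """Power usage effectiveness: total facility energy divided by IT energy.

    A PUE of 1.0 would mean every kilowatt-hour goes to IT equipment;
    cooling and power-distribution overheads push the ratio above 1.
    """
    if it_equipment_energy_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_energy_kwh / it_equipment_energy_kwh


# Hypothetical figures, not taken from the paper: the same 1,000 kWh IT load
# with a large cooling overhead (air) and a smaller one (immersion).
it_load = 1000.0
air_cooled_total = it_load + 600.0   # assumed overhead for air cooling
immersion_total = it_load + 100.0    # assumed overhead for immersion cooling

print(f"air-cooled PUE:       {pue(air_cooled_total, it_load):.2f}")
print(f"immersion-cooled PUE: {pue(immersion_total, it_load):.2f}")
```

Because the ratio is normalised by the IT load, it isolates cooling and power-distribution overheads, which is where immersion cooling is expected to help.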
Submitted 9 May, 2022;
originally announced May 2022.
-
Risk Awareness in HTN Planning
Authors:
Ebaa Alnazer,
Ilche Georgievski,
Marco Aiello
Abstract:
Real-world domains are characterised by uncertain situations in which acting and using resources require embracing risk. Performing actions in such domains always entails the cost of consuming some resource, such as time, money, or energy, where the knowledge about these costs can range from totally known to totally unknown and even unknowable probabilities of costs. Think of robotic domains, where actions and their costs are non-deterministic due to uncertain factors like obstacles. Choosing which action to perform considering its cost on the available resource requires taking a stance on risk. Thus, these domains call for not only planning under uncertainty but also planning while embracing risk. Taking Hierarchical Task Network (HTN) planning as a widely used planning technique in real-world applications, one can observe that existing approaches do not account for risk. That is, computing most probable or optimal plans using actions with single-valued costs is only enough to express risk neutrality. In this work, we postulate that HTN planning can become risk aware by considering expected utility theory, a representative concept of decision theory that enables choosing actions considering a probability distribution of their costs and a given risk attitude expressed using a utility function. In particular, we introduce a general framework for HTN planning that allows modelling risk and uncertainty using a probability distribution of action costs, upon which we define risk-aware HTN planning as an approach that accounts for different risk attitudes and allows computing plans that go beyond risk neutrality. In fact, we lay out that computing risk-aware plans requires finding plans with the highest expected utility. Finally, we argue that it is possible for HTN planning agents to solve specialised risk-aware HTN planning problems by adapting some existing HTN planning approaches.
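The abstract's central idea, choosing the plan with the highest expected utility given a probability distribution over costs and a utility function that encodes the risk attitude, can be illustrated outside HTN planning. The sketch below is a deliberate simplification and not the authors' framework: each candidate plan is collapsed into a hypothetical discrete cost distribution, and risk aversion is modelled with an exponential (constant absolute risk aversion) utility whose parameter `gamma` is chosen arbitrarily.

```python
import math
from typing import Callable, Dict, List, Tuple

# A plan's cost is modelled as a discrete distribution of (probability, cost) pairs.
CostDistribution = List[Tuple[float, float]]


def expected_utility(dist: CostDistribution, utility: Callable[[float], float]) -> float:
    """Expected utility of a plan under a given utility function over costs."""
    if not math.isclose(sum(p for p, _ in dist), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(p * utility(cost) for p, cost in dist)


def best_plan(plans: Dict[str, CostDistribution], utility: Callable[[float], float]) -> str:
    """Return the name of the plan with the highest expected utility."""
    return max(plans, key=lambda name: expected_utility(plans[name], utility))


def risk_neutral(cost: float) -> float:
    # Linear utility: only the expected cost matters.
    return -cost


def risk_averse(cost: float, gamma: float = 0.1) -> float:
    # Exponential utility, concave in wealth (= -cost): large costs are
    # penalised more than proportionally. gamma is an arbitrary choice.
    return -math.exp(gamma * cost)


# Hypothetical plans: A always costs 10; B costs 0 or 19 with equal chance
# (expected cost 9.5).
plans = {
    "A": [(1.0, 10.0)],
    "B": [(0.5, 0.0), (0.5, 19.0)],
}

print(best_plan(plans, risk_neutral))  # B: lower expected cost
print(best_plan(plans, risk_averse))   # A: avoids the chance of paying 19
```

With the linear utility the plans are ranked by expected cost alone, which corresponds to the risk-neutral setting the abstract describes; the concave utility flips the choice toward the plan with a certain cost.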
Submitted 22 April, 2022;
originally announced April 2022.
-
Heterogeneous rarity patterns drive price dynamics in NFT collections
Authors:
Amin Mekacher,
Alberto Bracci,
Matthieu Nadini,
Mauro Martino,
Laura Alessandretti,
Luca Maria Aiello,
Andrea Baronchelli
Abstract:
We quantify Non Fungible Token (NFT) rarity and investigate how it impacts market behaviour by analysing a dataset of 3.7M transactions collected between January 2018 and June 2022, involving 1.4M NFTs distributed across 410 collections. First, we consider the rarity of an NFT based on the set of human-readable attributes it possesses and show that most collections present heterogeneous rarity patterns, with few rare NFTs and a large number of more common ones. Then, we analyze market performance and show that, on average, rarer NFTs: (i) sell for higher prices, (ii) are traded less frequently, (iii) guarantee higher returns on investment (ROIs), and (iv) are less risky, i.e., less prone to yield negative returns. We anticipate that these findings will be of interest to researchers as well as NFT creators, collectors, and traders.
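The abstract does not spell out how rarity is computed from an NFT's attributes. As a hypothetical illustration only, one common approach scores each NFT by summing, over its attributes, the negative log of the fraction of NFTs in the collection that share that attribute value; the paper's actual definition may differ.

```python
import math
from collections import Counter
from typing import Dict, List

# Each NFT is described by its human-readable attributes: {trait_type: value}.
Collection = List[Dict[str, str]]


def rarity_scores(collection: Collection) -> List[float]:
    """Score each NFT by the summed surprisal of its attribute values.

    The frequency of a value is the share of NFTs in the collection that
    carry it; rare values contribute large -log(frequency) terms.
    """
    n = len(collection)
    counts = Counter((trait, value) for nft in collection for trait, value in nft.items())
    return [
        sum(-math.log(counts[(trait, value)] / n) for trait, value in nft.items())
        for nft in collection
    ]


# Hypothetical four-item collection: one NFT has a unique background.
collection = [
    {"background": "blue", "hat": "none"},
    {"background": "blue", "hat": "none"},
    {"background": "blue", "hat": "cap"},
    {"background": "gold", "hat": "cap"},
]

for nft, score in zip(collection, rarity_scores(collection)):
    print(nft, round(score, 3))
```

The unique gold background makes the last item score about 2.08 against about 0.98 for the others, the kind of skew (few rare items, many common ones) the abstract describes. NFTs that lack a trait are neither penalised nor rewarded here; how to treat absent traits is a design choice a real rarity measure would need to settle.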
Submitted 31 August, 2022; v1 submitted 21 April, 2022;
originally announced April 2022.
-
Epidemic Dreams: Dreaming about health during the COVID-19 pandemic
Authors:
Sanja Šćepanović,
Luca Maria Aiello,
Deirdre Barrett,
Daniele Quercia
Abstract:
The continuity hypothesis of dreams suggests that the content of dreams is continuous with the dreamer's waking experiences. Given the unprecedented nature of the experiences during COVID-19, we studied the continuity hypothesis in the context of the pandemic. We implemented a deep-learning algorithm that can extract mentions of medical conditions from text and applied it to two datasets collected during the pandemic: 2,888 dream reports (dreaming life experiences), and 57M tweets mentioning the pandemic (waking life experiences). The health expressions common to both sets were typical COVID-19 symptoms (e.g., cough, fever, and anxiety), suggesting that dreams reflected people's real-world experiences. The health expressions that distinguished the two sets reflected differences in thought processes: expressions in waking life reflected a linear and logical thought process and, as such, described realistic symptoms or related disorders (e.g., nasal pain, SARS, H1N1); those in dreaming life reflected a thought process closer to the visual and emotional spheres and, as such, described either conditions unrelated to the virus (e.g., maggots, deformities, snakebites), or conditions of surreal nature (e.g., teeth falling out, body crumbling into sand). Our results confirm that dream reports represent an understudied yet valuable source of people's health experiences in the real world.
Submitted 2 February, 2022;
originally announced February 2022.
-
From Reddit to Wall Street: The role of committed minorities in financial collective action
Authors:
Lorenzo Lucchini,
Luca Maria Aiello,
Laura Alessandretti,
Gianmarco De Francisci Morales,
Michele Starnini,
Andrea Baronchelli
Abstract:
In January 2021, retail investors coordinated on Reddit to target short selling activity by hedge funds on GameStop shares, causing a surge in the share price and triggering significant losses for the funds involved. Such an effective collective action was unprecedented in finance, and its dynamics remain unclear. Here, we analyse Reddit and financial data and rationalise the events based on recent findings describing how a small fraction of committed individuals may trigger behavioural cascades. First, we operationalise the concept of individual commitment in financial discussions. Second, we show that the increase of commitment within Reddit predated the initial surge in price. Third, we reveal that initial committed users occupied a central position in the network of Reddit conversations. Finally, we show that the social identity of the broader Reddit community grew as the collective action unfolded. These findings shed light on financial collective action, as several observers anticipate it will grow in importance.
Submitted 13 September, 2021; v1 submitted 15 July, 2021;
originally announced July 2021.
-
Cartographic Design of Cultural Maps
Authors:
Edyta Paulina Bogucka,
Marios Constantinides,
Luca Maria Aiello,
Daniele Quercia,
Wonyoung So,
Melanie Bancilhon
Abstract:
Throughout history, maps have been used as a tool to explore cities. They visualize a city's urban fabric through its streets, buildings, and points of interest. Beyond their navigational purpose, street names also reflect a city's culture through its commemorative practices. Therefore, cultural maps that unveil socio-cultural characteristics encoded in street names could potentially raise citizens' historical awareness. But designing effective cultural maps is challenging, not only due to data scarcity but also due to the lack of effective approaches to engage citizens with data exploration. To address these challenges, we collected a dataset of 5,000 streets across the cities of Paris, Vienna, London, and New York, and built their cultural maps grounded in cartographic storytelling techniques. Through data exploration scenarios, we demonstrated how cultural maps engage users and allow them to discover distinct patterns in the ways these cities are gender-biased, celebrate various professions, and embrace foreign cultures.
Submitted 8 June, 2021;
originally announced June 2021.
-
Streetonomics: Quantifying Culture Using Street Names
Authors:
Melanie Bancilhon,
Marios Constantinides,
Edyta Paulina Bogucka,
Luca Maria Aiello,
Daniele Quercia
Abstract:
Quantifying a society's value system is important because it suggests what people deeply care about -- it reflects who they actually are and, more importantly, who they would like to be. This cultural quantification has typically been done by studying literary production. However, a society's value system might well be implicitly quantified based on the decisions that people took in the past and that were mediated by what they care about. It turns out that one class of these decisions is visible in ordinary settings: it is visible in street names. We studied the names of 4,932 honorific streets in the cities of Paris, Vienna, London and New York. We chose these four cities because they were important centers of cultural influence for the Western world in the 20th century. We found that street names greatly reflect the extent to which a society is gender biased, which professions are considered elite ones, and the extent to which a city is influenced by the rest of the world. This way of quantifying a society's value system promises to inform new methodologies in Digital Humanities; makes it possible for municipalities to reflect on their past to inform their future; and informs the design of everyday educational tools that promote historical awareness in a playful way.
Submitted 18 June, 2021; v1 submitted 8 June, 2021;
originally announced June 2021.
-
Mapping the NFT revolution: market trends, trade networks and visual features
Authors:
Matthieu Nadini,
Laura Alessandretti,
Flavio Di Giacinto,
Mauro Martino,
Luca Maria Aiello,
Andrea Baronchelli
Abstract:
Non Fungible Tokens (NFTs) are digital assets that represent objects like art, collectibles, and in-game items. They are traded online, often with cryptocurrency, and are generally encoded within smart contracts on a blockchain. Public attention towards NFTs exploded in 2021, when their market experienced record sales, but little is known about the overall structure and evolution of the market. Here, we analyse data concerning 6.1 million trades of 4.7 million NFTs between June 23, 2017 and April 27, 2021, obtained primarily from Ethereum and WAX blockchains. First, we characterize statistical properties of the market. Second, we build the network of interactions and show that traders typically specialize in NFTs associated with similar objects and form tight clusters with other traders that exchange the same kind of objects. Third, we cluster objects associated with NFTs according to their visual features and show that collections contain visually homogeneous objects. Finally, we investigate the predictability of NFT sales using simple machine learning algorithms and find that sale history and, secondarily, visual features are good predictors for price. We anticipate that these findings will stimulate further research on NFT production, adoption, and trading in different contexts.
Submitted 20 September, 2021; v1 submitted 1 June, 2021;
originally announced June 2021.
-
The Healthy States of America: Creating a Health Taxonomy with Social Media
Authors:
Sanja Scepanovic,
Luca Maria Aiello,
Ke Zhou,
Sagar Joglekar,
Daniele Quercia
Abstract:
Since the uptake of social media, researchers have mined online discussions to track the outbreak and evolution of specific diseases or chronic conditions such as influenza or depression. To broaden the set of diseases under study, we developed a Deep Learning tool for Natural Language Processing that extracts mentions of virtually any medical condition or disease from unstructured social media text. With that tool at hand, we processed Reddit and Twitter posts, analyzed the clusters of the two resulting co-occurrence networks of conditions, and discovered that they correspond to well-defined categories of medical conditions. This resulted in the creation of the first comprehensive taxonomy of medical conditions automatically derived from online discussions. We validated the structure of our taxonomy against the official International Statistical Classification of Diseases and Related Health Problems (ICD-11), finding that our clusters match 20 of the 22 official categories. Based on the mentions of our taxonomy's sub-categories in Reddit posts geo-referenced in the U.S., we were then able to compute disease-specific health scores. As opposed to counts of disease mentions or counts with no knowledge of our taxonomy's structure, we found that our disease-specific health scores are causally linked with the officially reported prevalence of 18 conditions.
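The step of turning per-post condition mentions into a weighted co-occurrence network can be sketched with the standard library. This sketch assumes the mentions have already been extracted (the deep-learning tool is not reproduced) and that an edge's weight is simply the number of posts mentioning both conditions; the abstract specifies neither the weighting nor the clustering method, and the example posts are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def cooccurrence_edges(posts: Iterable[List[str]]) -> Counter:
    """Count how often each pair of conditions is mentioned in the same post.

    Each post is given as the list of conditions extracted from it. Repeated
    mentions within a post are counted once, and pairs are stored in sorted
    order so that (a, b) and (b, a) map to the same undirected edge.
    """
    edges: Counter = Counter()
    for conditions in posts:
        for a, b in combinations(sorted(set(conditions)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical extracted mentions, not data from the paper.
posts = [
    ["anxiety", "insomnia"],
    ["anxiety", "depression", "insomnia"],
    ["asthma", "allergy"],
    ["depression", "anxiety", "anxiety"],
]

for (a, b), weight in cooccurrence_edges(posts).most_common():
    print(f"{a} -- {b}: {weight}")
```

A community-detection algorithm run on the resulting weighted graph would then group conditions into candidate categories, which could be compared against ICD-11 as the abstract describes.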
Submitted 1 March, 2021;
originally announced March 2021.
-
HeartBees: Visualizing Crowd Affects
Authors:
Chao Ying Qin,
Marios Constantinides,
Luca Maria Aiello,
Daniele Quercia
Abstract:
Affective sharing within groups strengthens coordination and empathy, leads to better health outcomes, and increases productivity and performance. Existing tools for affective sharing face one main challenge: creating a representation of collective emotional states that is relatable and universally accessible. To overcome this challenge, we propose HeartBees, a bio-feedback system for visualizing collective emotional states, which maps a multi-dimensional emotion model into a metaphorical visualization of flocks of birds. Grounded in the Affective Computing literature and physiological sensing, we mapped physiological indicators that could be obtained from wearable devices into a multi-dimensional emotion model, which, in turn, HeartBees can make use of. We evaluated our nature-inspired interactive system with 353 online participants, whose responses showed good consensus in the way they subjectively perceived the visualizations. Last, we discuss practical applications of HeartBees.
Submitted 14 October, 2020;
originally announced October 2020.
-
How Epidemic Psychology Works on Twitter: Evolution of responses to the COVID-19 pandemic in the U.S
Authors:
Luca Maria Aiello,
Daniele Quercia,
Ke Zhou,
Marios Constantinides,
Sanja Šćepanović,
Sagar Joglekar
Abstract:
Disruptions resulting from an epidemic might often appear to amount to chaos but, in reality, can be understood in a systematic way through the lens of "epidemic psychology". According to Philip Strong, the founder of the sociological study of epidemic infectious diseases, not only is an epidemic biological; there is also the potential for three psycho-social epidemics: of fear, moralization, and action. This work empirically tests Strong's model at scale by studying the use of language of 122M tweets related to the COVID-19 pandemic posted in the U.S. during the whole year of 2020. On Twitter, we identified three distinct phases. Each of them is characterized by different regimes of the three psycho-social epidemics. In the refusal phase, users refused to accept reality despite the increasing number of deaths in other countries. In the anger phase (which started after the announcement of the first death in the country), users' fear translated into anger about the looming feeling that things were about to change. Finally, in the acceptance phase, which began after the authorities imposed physical-distancing measures, users settled into a "new normal" for their daily activities. Overall, the refusal to accept reality gradually died off as the year went on, while acceptance increasingly took hold. During 2020, as cases surged in waves, so did anger, re-emerging cyclically at each wave. Our real-time operationalization of Strong's model is designed to make it possible to embed epidemic psychology into real-time models (e.g., epidemiological and mobility models).
Submitted 20 July, 2021; v1 submitted 26 July, 2020;
originally announced July 2020.
-
Ten Social Dimensions of Conversations and Relationships
Authors:
Minje Choi,
Luca Maria Aiello,
Krisztian Zsolt Varga,
Daniele Quercia
Abstract:
Decades of social science research identified ten fundamental dimensions that provide the conceptual building blocks to describe the nature of human relationships. Yet, it is not clear to what extent these concepts are expressed in everyday language and what role they have in shaping observable dynamics of social interactions. After annotating conversational text through crowdsourcing, we trained NLP tools to detect the presence of these types of interaction from conversations, and applied them to 160M messages written by geo-referenced Reddit users, 290k emails from the Enron corpus and 300k lines of dialogue from movie scripts. We show that social dimensions can be predicted purely from conversations with an AUC up to 0.98, and that the combination of the predicted dimensions suggests both the types of relationships people entertain (conflict vs. support) and the types of real-world communities (wealthy vs. deprived) they shape.
Submitted 27 January, 2020;
originally announced January 2020.
-
IPPO: A Privacy-Aware Architecture for Decentralized Data-sharing
Authors:
Maurizio Aiello,
Enrico Cambiaso,
Roberto Canonico,
Leonardo Maccari,
Marco Mellia,
Antonio Pescapè,
Ivan Vaccari
Abstract:
Online trackers personalize ad campaigns, exponentially increasing their efficacy compared to traditional channels. The downside of this is that thousands of mostly unknown systems own our profiles and violate our privacy without our awareness. IPPO turns the tables and gives users back control over their data, through anonymised data publishing via a Blockchain-based Decentralized Data Marketplace. We also propose a service based on machine learning and big data analytics to automatically identify web trackers and build Privacy Labels (PLs), based on the nutrition labels concept. This paper describes the motivation, the vision, the architecture and the research challenges related to IPPO.
Submitted 17 January, 2020;
originally announced January 2020.
-
FaceLift: A transparent deep learning framework to beautify urban scenes
Authors:
Sagar Joglekar,
Daniele Quercia,
Miriam Redi,
Luca Maria Aiello,
Tobias Kauer,
Nishanth Sastry
Abstract:
In the area of computer vision, deep learning techniques have recently been used to predict whether urban scenes are likely to be considered beautiful: it turns out that these techniques are able to make accurate predictions. Yet they fall short when it comes to generating actionable insights for urban design. To support urban interventions, one needs to go beyond predicting beauty, and tackle the challenge of recreating beauty. Unfortunately, deep learning techniques have not been designed with that challenge in mind. Given their "black-box nature", these models cannot be directly used to explain why a particular urban scene is deemed to be beautiful. To partly address that, we propose a deep learning framework called FaceLift, which is able to both beautify existing urban scenes (Google Street View images) and explain which urban elements make those transformed scenes beautiful. To quantitatively evaluate our framework, we cannot resort to any existing metric (as the research problem at hand has never been tackled before) and need to formulate new ones. These new metrics should ideally capture the presence/absence of elements that make urban spaces great. Upon a review of the urban planning literature, we identify five main metrics: walkability, green spaces, openness, landmarks and visual complexity. We find that, across all five metrics, the beautified scenes meet the expectations set by the literature on what great spaces tend to be made of. This result is further confirmed by a 20-participant expert survey in which FaceLift was found to be effective in promoting citizen participation. All this suggests that, in the future, as our framework's components are further researched and become better and more sophisticated, it is not hard to imagine technologies that will be able to accurately and efficiently support architects and planners in the design of spaces we intuitively love.
Submitted 16 January, 2020;
originally announced January 2020.
-
The Language of Dialogue Is Complex
Authors:
Alexander Robertson,
Luca Maria Aiello,
Daniele Quercia
Abstract:
Integrative Complexity (IC) is a psychometric measure of a person's ability to recognize multiple perspectives and connect them, thus identifying paths for conflict resolution. IC has been linked to a wide variety of political, social, and personal outcomes, but evaluating it is a time-consuming process requiring skilled professionals to manually score texts, a fact which accounts for the limited exploration of IC at scale on social media. We combine natural language processing and machine learning to train an IC classification model that achieves state-of-the-art performance on unseen data and more closely adheres to the established structure of the IC coding process than previous automated approaches. When applied to the content of 400k+ comments from online fora about depression and knowledge exchange, our model was capable of replicating key findings of prior work, thus providing the first example of using IC tools for large-scale social media analytics.
Submitted 5 June, 2019;
originally announced June 2019.
-
Large-scale and high-resolution analysis of food purchases and health outcomes
Authors:
Luca Maria Aiello,
Rossano Schifanella,
Daniele Quercia,
Lucia Del Prete
Abstract:
To complement traditional dietary surveys, which are costly and of limited scale, researchers have resorted to digital data to infer the impact of eating habits on people's health. However, online studies are limited in resolution: they are carried out at regional level and do not capture precisely the composition of the food consumed. We study the association between food consumption (derived from the loyalty cards of the main grocery retailer in London) and health outcomes (derived from publicly available medical prescription records). The scale and granularity of our analysis are unprecedented: we analyze 1.6B food item purchases and 1.1B medical prescriptions for the entire city of London over the course of one year. By studying food consumption down to the level of nutrients, we show that nutrient diversity and amount of calories are the strongest predictors of the prevalence of three diseases related to what is called the "metabolic syndrome": hypertension, high cholesterol, and diabetes. This syndrome is a cluster of symptoms generally associated with obesity, is common across the rich world, and affects one in four adults in the UK. Our linear regression models achieve an R² of 0.6 when estimating the prevalence of diabetes in nearly 1000 census areas in London, and a classifier can identify (un)healthy areas with up to 91% accuracy. Interestingly, healthy areas are not necessarily well-off (income matters less than one would expect) and have distinctive features: they tend to systematically eat fewer carbohydrates and less sugar, diversify nutrients, and avoid large quantities. More generally, our study shows that analytics of digital records of grocery purchases can be used as a cheap and scalable tool for health surveillance and, building on these records, different stakeholders, from governments to insurance companies to food companies, could implement effective prevention strategies.
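One common way to quantify "nutrient diversity" is the Shannon entropy of the nutrient composition of what an area buys. The abstract does not spell out the exact definition, so the sketch below is an assumption: entropy in nats over grams per nutrient, with the function name `nutrient_diversity` and the example baskets being hypothetical.

```python
import math


def nutrient_diversity(grams: dict[str, float]) -> float:
    """Shannon entropy (in nats) of each nutrient's share of a basket.

    0.0 means everything comes from a single nutrient; the maximum, log(k),
    is reached when k nutrients are present in equal amounts.
    """
    total = sum(grams.values())
    if total <= 0:
        raise ValueError("basket must contain a positive amount of nutrients")
    entropy = 0.0
    for amount in grams.values():
        if amount > 0:  # 0 * log(0) is taken as 0
            share = amount / total
            entropy -= share * math.log(share)
    return entropy


# Hypothetical grams of each nutrient purchased in two areas.
balanced = {"carbohydrate": 250.0, "fat": 250.0, "protein": 250.0, "fibre": 250.0}
skewed = {"carbohydrate": 850.0, "fat": 100.0, "protein": 40.0, "fibre": 10.0}
```

Under this assumed definition, the balanced basket scores higher than the skewed one, which is dominated by carbohydrates.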
Submitted 30 April, 2019;
originally announced May 2019.
-
Coloring in the Links: Capturing Social Ties as They are Perceived
Authors:
Sebastian Deri,
Jeremie Rappaz,
Luca Maria Aiello,
Daniele Quercia
Abstract:
The richness that characterizes relationships is often absent when they are modeled using computational methods in network science. Typically, relationships are represented simply as links, perhaps with weights. The lack of finer granularity is due in part to the fact that, aside from linkage and strength, no fundamental or immediately obvious dimensions exist along which to categorize relationships. Here we propose a set of dimensions that capture major components of many relationships -- derived both from relevant academic literature and people's everyday descriptions of their relationships. We first review prominent findings in sociology and social psychology, highlighting dimensions that have been widely used to categorize social relationships. Next, we examine the validity of these dimensions empirically in two crowd-sourced experiments. Ultimately, we arrive at a set of ten major dimensions that can be used to categorize relationships: similarity, trust, romance, social support, identity, respect, knowledge exchange, power, fun, and conflict. These ten dimensions, while not dispositive, offer higher resolution than existing models. Indeed, we show that one can more accurately predict missing links in a social graph by using these dimensions than by using a state-of-the-art link embeddedness method. We also describe tinghy.org, an online platform we built to collect data about how social media users perceive their online relationships, allowing us to examine these dimensions at scale. Overall, by proposing a new way of modeling social graphs, our work aims to contribute both to theory in network science and practice in the design of social-networking applications.
Submitted 12 February, 2019;
originally announced February 2019.
-
Anticipating cryptocurrency prices using machine learning
Authors:
Laura Alessandretti,
Abeer ElBahrawy,
Luca Maria Aiello,
Andrea Baronchelli
Abstract:
Machine learning and AI-assisted trading have attracted growing interest over the past few years. Here, we use this approach to test the hypothesis that the inefficiency of the cryptocurrency market can be exploited to generate abnormal profits. We analyse daily data for 1,681 cryptocurrencies for the period between Nov. 2015 and Apr. 2018. We show that simple trading strategies assisted by state-of-the-art machine learning algorithms outperform standard benchmarks. Our results show that nontrivial, but ultimately simple, algorithmic mechanisms can help anticipate the short-term evolution of the cryptocurrency market.
Submitted 9 November, 2018; v1 submitted 22 May, 2018;
originally announced May 2018.
-
Hearts and Politics: Metrics for Tracking Biorhythm Changes during Brexit and Trump
Authors:
Luca Maria Aiello,
Daniele Quercia,
Eva Roitmann
Abstract:
Our internal experience of time reflects what is going on in the world around us. Our body's natural rhythms get disrupted by a variety of external factors, including exposure to collective events. We collect readings of steps, sleep, and heart rates from 11K users of health tracking devices in London and San Francisco. We introduce measures to quantify changes not only in the volume of these three bio-signals (as previous research has done) but also in their synchronicity and periodicity, and we empirically assess how strong those variations are, compared to random expectation, during four major events: Christmas, New Year's Eve, Brexit, and the US presidential election of 2016 (Donald Trump's election). While Christmas and New Year's Eve are associated with short-term effects, Brexit and Trump's election are associated with longer-term disruptions. Our results promise to inform the design of new ways of monitoring population health at scale.
Submitted 18 April, 2018;
originally announced April 2018.
-
The New Urban Success: How Culture Pays
Authors:
Desislava Hristova,
Luca Maria Aiello,
Daniele Quercia
Abstract:
Urban economists have put forward the idea that cities that are culturally interesting tend to attract "the creative class" and, as a result, end up being economically successful. Yet it is still unclear how economic and cultural dynamics mutually influence each other. By contrast, this interplay has been extensively studied in the case of individuals. Over decades, the French sociologist Pierre Bourdieu showed that people's success and their positions in society mainly depend on how much they can spend (their economic capital) and what their interests are (their cultural capital). For the first time, we adapt Bourdieu's framework to the city context. We operationalize a neighborhood's cultural capital in terms of the cultural interests expressed by pictures geo-referenced in that neighborhood. This is made possible by mining what users of the photo-sharing site Flickr have posted in the cities of London and New York over 5 years. In so doing, we are able to show that economic capital alone does not explain urban development. The combination of cultural capital and economic capital, instead, is more indicative of neighborhood growth in terms of house prices and improvements of socio-economic conditions. Culture pays, but only up to a point, as it comes with one of the most vexing urban challenges: that of gentrification.
Submitted 10 April, 2018;
originally announced April 2018.
-
Beautiful and damned. Combined effect of content quality and social ties on user engagement
Authors:
Luca M. Aiello,
Rossano Schifanella,
Miriam Redi,
Stacey Svetlichnaya,
Frank Liu,
Simon Osindero
Abstract:
User participation in online communities is driven by the intertwinement of the social network structure with the crowd-generated content that flows along its links. These aspects are rarely explored jointly and at scale. By looking at how users generate and access pictures of varying beauty on Flickr, we investigate how the production of quality impacts the dynamics of online social systems. We develop a deep learning computer vision model to score images according to their aesthetic value and we validate its output through crowdsourcing. By applying it to over 15B Flickr photos, we study for the first time how image beauty is distributed over a large-scale social system. Beautiful images are evenly distributed in the network, although only a small core of people get social recognition for them. To study the impact of exposure to quality on user engagement, we set up matching experiments aimed at detecting causality from observational data. Exposure to beauty is double-edged: following people who produce high-quality content increases one's probability of uploading better photos; however, an excessive imbalance between the quality generated by a user and the user's neighbors leads to a decline in engagement. Our analysis has practical implications for improving link recommender systems.
Submitted 1 November, 2017;
originally announced November 2017.
-
Personalized advice for enhancing well-being using automated impulse response analysis --- AIRA
Authors:
F. J. Blaauw,
L. van der Krieke,
A. C. Emerencia,
M. Aiello,
P. de Jonge
Abstract:
Attention to personalized mental health care is growing. Research data specific to the individual, such as time series sensor data or data from intensive longitudinal studies, is relevant from a research perspective, as analyses of these data can reveal the heterogeneity among participants and provide more precise and individualized results than group-based methods. However, using these data for self-management and to help individuals improve their mental health has proven to be challenging.
The present work describes a novel approach to automatically generate personalized advice for the improvement of the well-being of individuals by using time series data from intensive longitudinal studies: Automated Impulse Response Analysis (AIRA). AIRA analyzes vector autoregression models of well-being by generating impulse response functions. These impulse response functions are used in simulations to determine which variables in the model have the largest influence on the other variables and thus on the well-being of the participant. The effects found can be used to support self-management.
We demonstrate the practical usefulness of AIRA by analyzing longitudinal self-reported data about psychological variables. To evaluate its effectiveness and efficacy, we ran its algorithms on two data sets (N = 4 and N = 5) and discuss the results. Furthermore, we compare AIRA's output to the results of a previously published study and show that the results are comparable. By automating impulse response function analysis, AIRA fulfills the need for accurate individualized models of health outcomes at a low resource cost, with the potential for upscaling.
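The core idea, turning a fitted vector autoregression into impulse response functions and ranking variables by how strongly they move well-being, can be sketched for a VAR(1) model. This is a minimal illustration, not AIRA's implementation: the coefficient matrix, the variable names, Cholesky-orthogonalized shocks, and summing responses over a fixed horizon are all assumptions made for the example.

```python
import numpy as np


def impulse_responses(A: np.ndarray, sigma: np.ndarray, horizon: int) -> np.ndarray:
    """Orthogonalized impulse responses of a VAR(1) model y_t = A @ y_{t-1} + e_t.

    Entry [h, i, j] is the response of variable i, h steps after a
    one-standard-deviation shock to variable j. Shocks are orthogonalized with
    a Cholesky factor of the residual covariance, so variable order matters.
    """
    k = A.shape[0]
    P = np.linalg.cholesky(sigma)  # lower-triangular, P @ P.T == sigma
    responses = np.empty((horizon + 1, k, k))
    A_power = np.eye(k)  # A^0
    for h in range(horizon + 1):
        responses[h] = A_power @ P
        A_power = A_power @ A  # advance to A^(h+1)
    return responses


def cumulative_effect_on(target: int, A: np.ndarray, sigma: np.ndarray, horizon: int) -> np.ndarray:
    """Summed response of variable `target` to a shock in each variable."""
    return impulse_responses(A, sigma, horizon)[:, target, :].sum(axis=0)


# Hypothetical fitted VAR(1) for three daily diary variables.
NAMES = ["sleep", "activity", "wellbeing"]
A = np.array([
    [0.5, 0.0, 0.0],
    [0.0, 0.4, 0.0],
    [0.3, 0.2, 0.5],
])
SIGMA = np.eye(3)

effects = cumulative_effect_on(NAMES.index("wellbeing"), A, SIGMA, horizon=10)
# Rank the other variables by their cumulative influence on well-being.
ranking = sorted(
    (name for name in NAMES if name != "wellbeing"),
    key=lambda name: effects[NAMES.index(name)],
    reverse=True,
)
```

In this toy model a shock to sleep has a larger cumulative effect on well-being than a shock to activity, so sleep would be suggested first; a real analysis would estimate `A` and `SIGMA` from a participant's time series.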
Submitted 15 June, 2017;
originally announced June 2017.
-
Evolution of Ego-networks in Social Media with Link Recommendations
Authors:
Luca Maria Aiello,
Nicola Barbieri
Abstract:
Ego-networks are fundamental structures in social graphs, yet the process of their evolution is still largely unexplored. In an online context, a key question is how link recommender systems may skew the growth of these networks, possibly restraining diversity. To shed light on this matter, we analyze the complete temporal evolution of 170M ego-networks extracted from Flickr and Tumblr, comparing links that are created spontaneously with those that have been algorithmically recommended. We find that the evolution of ego-networks is bursty, community-driven, and characterized by successive phases of explosive diameter increase, slight shrinking, and stabilization. Recommendations favor popular and well-connected nodes, limiting the diameter expansion. With a matching experiment aimed at detecting causal relationships from observational data, we find that the bias introduced by the recommendations fosters global diversity in the process of neighbor selection. Last, with two link prediction experiments, we show how insights from our analysis can be used to improve the effectiveness of social recommender systems.
Submitted 5 February, 2017;
originally announced February 2017.
-
iPhone's Digital Marketplace: Characterizing the Big Spenders
Authors:
Farshad Kooti,
Mihajlo Grbovic,
Luca Maria Aiello,
Eric Bax,
Kristina Lerman
Abstract:
With mobile shopping surging in popularity, people are spending ever more money on digital purchases through their mobile devices and phones. However, few large-scale studies of mobile shopping exist. In this paper we analyze a large data set consisting of more than 776M digital purchases made on Apple mobile devices that include songs, apps, and in-app purchases. We find that 61% of all the spending is on in-app purchases and that the top 1% of users are responsible for 59% of all the spending. These big spenders are more likely to be male and older, and less likely to be from the US. We study how they adopt and abandon individual apps, and find that, after an initial phase of increased daily spending, users gradually lose interest: the delay between their purchases increases and the spending decreases with a sharp drop toward the end. Finally, we model the in-app purchasing behavior in multiple steps: 1) we model the time between purchases; 2) we train a classifier to predict whether the user will make a purchase from a new app or continue purchasing from the existing app; and 3) based on the outcome of the previous step, we attempt to predict the exact app, new or existing, from which the next purchase will come. The results yield new insights into spending habits in the mobile digital marketplace.
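The concentration figure quoted above (the share of all spending due to the top 1% of users) is straightforward to compute from per-user totals. A minimal sketch, assuming spending has already been aggregated into one total per user; the function name `top_share` and the toy data are hypothetical.

```python
import math


def top_share(spending: list[float], fraction: float = 0.01) -> float:
    """Share of total spending attributable to the top `fraction` of users."""
    if not spending:
        raise ValueError("spending must be non-empty")
    ranked = sorted(spending, reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))  # at least one user
    total = sum(ranked)
    return sum(ranked[:k]) / total if total > 0 else 0.0


# Hypothetical per-user totals: one heavy spender among 100 users.
example = [100.0] + [1.0] * 99
```

Here `top_share(example)` gives 100 / 199, roughly 0.50: a single user out of 100 accounts for about half of all spending.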
Submitted 25 January, 2017;
originally announced January 2017.
-
Pornography consumption in Social Media
Authors:
Mauro Coletto,
Luca Maria Aiello,
Claudio Lucchese,
Fabrizio Silvestri
Abstract:
The structure of a social network is fundamentally related to the interests of its members. People assort spontaneously based on the topics that are relevant to them, forming social groups that revolve around different subjects. Online social media are also favorable ecosystems for the formation of topical communities centered on matters that are not commonly taken up by the general public because of the embarrassment, discomfort, or shock they may cause. Those are communities that depict or discuss what are usually referred to as deviant behaviors, conducts that are commonly considered inappropriate because they are somehow violative of society's norms or moral standards that are shared among the majority of the members of society. Pornography consumption, drug use, excessive drinking, illegal hunting, eating disorders, or any self-harming or addictive practice are all examples of deviant behaviors.
Submitted 20 January, 2017; v1 submitted 24 December, 2016;
originally announced December 2016.
-
On the Behaviour of Deviant Communities in Online Social Networks
Authors:
Mauro Coletto,
Luca Maria Aiello,
Claudio Lucchese,
Fabrizio Silvestri
Abstract:
On-line social networks are complex ensembles of inter-linked communities that interact on different topics. Some communities are characterized by what are usually referred to as deviant behaviors, conducts that are commonly considered inappropriate with respect to society's norms or moral standards. Eating disorders, drug use, and adult content consumption are just a few examples. We refer to such communities as deviant networks. It is commonly believed that such deviant networks are niche, isolated social groups, whose activity is well separated from mainstream social-media life. In line with this assumption, research studies have mostly considered them in isolation. In this work we focused on adult content consumption networks, which are present in many on-line social media and in the Web in general. We found that a few small and densely connected communities are responsible for most of the content production. Unlike previous work, we studied how such communities interact with the whole social network. We found that the produced content flows to the rest of the network mostly directly or through bridge-communities, reaching at least 450 times more users. We also show that a large fraction of the users can be inadvertently exposed to such content through indirect content resharing. We also discuss a demographic analysis of the producer and consumer networks. Finally, we show that it is possible to easily identify a few core users that can be targeted to radically uproot the diffusion process. We aim to lay the basis for studying deviant communities in context.
Submitted 26 October, 2016;
originally announced October 2016.
-
The Emotional and Chromatic Layers of Urban Smells
Authors:
Daniele Quercia,
Luca Maria Aiello,
Rossano Schifanella
Abstract:
People are able to detect up to 1 trillion odors. Yet, city planning is concerned only with a few bad odors, mainly because odors are currently captured only through complaints made by urban dwellers. To capture both good and bad odors, we resort to a methodology that has been recently proposed and relies on tagging information of geo-referenced pictures. In doing so for the cities of London and Barcelona, this work makes three new contributions. We study 1) how the urban smellscape changes in time and space; 2) which emotions people share at places with specific smells; and 3) what is the color of a smell, if it exists. Without social media data, insights about those three aspects have been difficult to produce in the past, further delaying the creation of urban restorative experiences.
Submitted 21 May, 2016;
originally announced May 2016.
-
Chatty Maps: Constructing sound maps of urban areas from social media data
Authors:
Luca Maria Aiello,
Rossano Schifanella,
Daniele Quercia,
Francesco Aletta
Abstract:
Urban sound has a huge influence over how we perceive places. Yet, city planning is concerned mainly with noise, simply because annoying sounds come to the attention of city officials in the form of complaints, while general urban sounds do not, as they cannot be easily captured at city scale. To capture both unpleasant and pleasant sounds, we applied a new methodology that relies on tagging information of geo-referenced pictures to the cities of London and Barcelona. To begin with, we compiled the first urban sound dictionary and compared it to the one produced by collating insights from the literature: ours was experimentally more valid (when correlated with official noise pollution levels) and offered a wider geographic coverage. From picture tags, we then studied the relationship between soundscapes and emotions. We learned that streets with music sounds were associated with strong emotions of joy or sadness, while those with human sounds were associated with joy or surprise. Finally, we studied the relationship between soundscapes and people's perceptions and, in so doing, we were able to map which areas are chaotic, monotonous, calm, and exciting. Those insights promise to inform the creation of restorative experiences in our increasingly urbanized world.
Submitted 24 March, 2016;
originally announced March 2016.
-
Portrait of an Online Shopper: Understanding and Predicting Consumer Behavior
Authors:
Farshad Kooti,
Kristina Lerman,
Luca Maria Aiello,
Mihajlo Grbovic,
Nemanja Djuric,
Vladan Radosavljevic
Abstract:
Consumer spending accounts for a large fraction of US economic activity. Increasingly, consumer activity is moving to the web, where digital traces of shopping and purchases provide valuable data about consumer behavior. We analyze these data extracted from emails and combine them with demographic information to characterize, model, and predict consumer behavior. Breaking down purchasing by age and gender, we find that the amount of money spent on online purchases grows sharply with age, peaking in the late 30s. Men are more frequent online purchasers and spend more money than women. Linking online shopping to income, we find that shoppers from more affluent areas purchase more expensive items and buy them more frequently, resulting in significantly more money spent on online purchases. We also look at the dynamics of purchasing behavior and observe daily and weekly cycles, similar to other online activities.
More specifically, we observe temporal patterns in purchasing behavior suggesting shoppers have finite budgets: the more expensive an item, the longer the shopper waits since the last purchase to buy it. We also observe that shoppers who email each other purchase more similar items than socially unconnected shoppers, and this effect is particularly evident among women. Finally, we build a model to predict when shoppers will make a purchase and how much they will spend on it. We find that temporal features improve prediction accuracy over competitive baselines. A better understanding of consumer behavior can help improve marketing efforts and make online shopping more pleasant and efficient.
Submitted 15 December, 2015;
originally announced December 2015.
-
Smelly Maps: The Digital Life of Urban Smellscapes
Authors:
Daniele Quercia,
Rossano Schifanella,
Luca Maria Aiello,
Kate McLean
Abstract:
Smell has a huge influence over how we perceive places. Despite its importance, smell has been crucially overlooked by urban planners and scientists alike, not least because it is difficult to record and analyze at scale. One of the authors of this paper has ventured out in the urban world and conducted smellwalks in a variety of cities: participants were exposed to a range of different smellscapes and asked to record their experiences. As a result, smell-related words have been collected and classified, creating the first dictionary for urban smell. Here we explore the possibility of using social media data to reliably map the smells of entire cities. To this end, for both Barcelona and London, we collect geo-referenced picture tags from Flickr and Instagram, and geo-referenced tweets from Twitter. We match those tags and tweets with the words in the smell dictionary. We find that smell-related words are best classified in ten categories. We also find that specific categories (e.g., industry, transport, cleaning) correlate with governmental air quality indicators, adding validity to our study.
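The matching step, mapping picture tags or tweet words onto dictionary categories and summarizing an area by the share of each smell category, can be sketched as below. This is an illustration under stated assumptions: the dictionary entries and the "nature" category label are hypothetical stand-ins for the real smell dictionary, and exact case-insensitive word matching is assumed.

```python
from collections import Counter

# Hypothetical excerpt of a smell dictionary: word -> smell category.
SMELL_DICTIONARY = {
    "exhaust": "transport", "diesel": "transport",
    "bleach": "cleaning", "detergent": "cleaning",
    "smoke": "industry",
    "flowers": "nature", "grass": "nature",
}


def smell_profile(tags: list[str], dictionary: dict[str, str]) -> dict[str, float]:
    """Fraction of matched smell words falling in each category.

    Tags are matched case-insensitively; tags not in the dictionary are ignored.
    """
    counts = Counter(dictionary[tag.lower()] for tag in tags if tag.lower() in dictionary)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()} if total else {}


# Hypothetical tags attached to pictures taken in one street segment.
street_tags = ["Exhaust", "grass", "bus", "diesel"]
```

For `street_tags`, three of the four tags match, giving two thirds transport and one third nature; per-area profiles like this could then be compared against air quality indicators.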
Submitted 26 May, 2015;
originally announced May 2015.
-
Local Ranking Problem on the BrowseGraph
Authors:
Michele Trevisiol,
Luca Maria Aiello,
Paolo Boldi,
Roi Blanco
Abstract:
The "Local Ranking Problem" (LRP) concerns the computation of a centrality-like rank on a local graph, where the scores of the nodes could differ significantly from the ones computed on the global graph. Previous work has studied LRP on the hyperlink graph but never on the BrowseGraph, namely a graph where nodes are webpages and edges are browsing transitions. Recently, this graph has received more and more attention in many different tasks such as ranking, prediction, and recommendation. However, a web server observes only the browsing traffic performed on its own pages (the local BrowseGraph) and, as a consequence, the local computation can lead to estimation errors, which hinders the growing number of applications of this graph in the state of the art. Also, although the divergence between the local and global ranks has been measured, the possibility of estimating such divergence using only local knowledge has been largely overlooked. These aspects are of great interest for online service providers who want to: (i) gauge their ability to correctly assess the importance of their resources based only on their local knowledge, and (ii) take into account real user browsing fluxes that better capture actual user interest than the static hyperlink network. We study the LRP on a BrowseGraph from a large news provider, considering as subgraphs the aggregations of browsing traces of users coming from different domains. We show that the distance between rankings can be accurately predicted based only on structural information of the local graph, achieving an average rank correlation as high as 0.8.
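The basic comparison behind LRP, ranking pages on a local BrowseGraph and measuring how well that ranking agrees with the one from the global graph, can be sketched in plain Python. This is an illustrative sketch, not the authors' pipeline: PageRank as the centrality, Kendall's tau-a as the rank correlation, and the toy transitions are all assumptions made for the example.

```python
from itertools import combinations


def pagerank(transitions, alpha=0.85, iterations=100):
    """PageRank by power iteration on a list of (from_page, to_page) transitions.

    Repeated transitions act as edge weights. Dangling pages (no outgoing
    transitions) spread their score uniformly over all pages.
    """
    nodes = sorted({page for edge in transitions for page in edge})
    out_links = {page: [] for page in nodes}
    for src, dst in transitions:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {page: 1.0 / n for page in nodes}
    for _ in range(iterations):
        new_rank = {page: (1.0 - alpha) / n for page in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = alpha * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                for page in nodes:
                    new_rank[page] += alpha * rank[src] / n
        rank = new_rank
    return rank


def kendall_tau(x, y):
    """Kendall tau-a over the keys shared by two score dicts (tied pairs count as neither)."""
    keys = sorted(set(x) & set(y))
    if len(keys) < 2:
        raise ValueError("need at least two shared items to compare rankings")
    concordant = discordant = 0
    for a, b in combinations(keys, 2):
        sign = (x[a] - x[b]) * (y[a] - y[b])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(keys) * (len(keys) - 1) / 2
    return (concordant - discordant) / n_pairs


# Hypothetical browsing transitions on a news site.
global_transitions = [
    ("home", "politics"), ("home", "sport"), ("politics", "home"),
    ("sport", "home"), ("sport", "politics"), ("politics", "weather"),
    ("weather", "home"), ("home", "politics"),
]
# Hypothetical local view: transitions made by visitors referred from one domain.
local_transitions = [("home", "sport"), ("sport", "politics"), ("politics", "home"), ("sport", "home")]

tau = kendall_tau(pagerank(local_transitions), pagerank(global_transitions))
```

A `tau` close to 1 means the local view ranks the shared pages almost as the global graph does; the paper's contribution is predicting this agreement from local structure alone, which the sketch does not attempt.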
Submitted 23 May, 2015;
originally announced May 2015.