-
Social network modeling and applications, a tutorial
Authors:
Lisette Espín-Noboa,
Tiago Peixoto,
Fariba Karimi
Abstract:
Social networks have been widely studied over the last century from multiple disciplines to understand societal issues such as inequality in employment rates, managerial performance, and epidemic spread. Today, these and many more issues can be studied at global scale thanks to the digital footprints that we generate when browsing the Web or using social media platforms. Unfortunately, scientists often struggle to access such data, primarily because it is proprietary, and even when it is shared with privacy guarantees, such data is either not representative or too big. In this tutorial, we will discuss recent advances and future directions in network modeling. In particular, we focus on how to exploit synthetic networks to study real-world problems such as data privacy, spreading dynamics, algorithmic bias, and ranking inequalities. We start by reviewing different types of generative models for social networks including node-attributed and scale-free networks. Then, we showcase how to perform a network selection analysis to characterize the mechanisms of edge formation of any given real-world network.
Submitted 19 June, 2023;
originally announced June 2023.
-
Interpreting wealth distribution via poverty map inference using multimodal data
Authors:
Lisette Espín-Noboa,
János Kertész,
Márton Karsai
Abstract:
Poverty maps are essential tools for governments and NGOs to track socioeconomic changes and adequately allocate infrastructure and services in places in need. Sensor and online crowd-sourced data combined with machine learning methods have provided a recent breakthrough in poverty map inference. However, these methods do not capture local wealth fluctuations, and are not optimized to produce accountable results that guarantee accurate predictions to all sub-populations. Here, we propose a pipeline of machine learning models to infer the mean and standard deviation of wealth across multiple geographically clustered populated places, and illustrate their performance in Sierra Leone and Uganda. These models leverage seven independent and freely available feature sources based on satellite images, and metadata collected via online crowd-sourcing and social media. Our models show that combined metadata features are the best predictors of wealth in rural areas, outperforming image-based models, which are the best for predicting the highest wealth quintiles. Our results recover the local mean and variation of wealth, and correctly capture the positive yet non-monotonic correlation between them. We further demonstrate the capabilities and limitations of model transfer across countries and the effects of data recency and other biases. Our methodology provides open tools to build towards more transparent and interpretable models to help governments and NGOs make informed decisions based on data availability, urbanization level, and poverty thresholds.
Submitted 6 April, 2023; v1 submitted 17 February, 2023;
originally announced February 2023.
-
Link recommendations: Their impact on network structure and minorities
Authors:
Antonio Ferrara,
Lisette Espín-Noboa,
Fariba Karimi,
Claudia Wagner
Abstract:
Network-based people recommendation algorithms are widely employed on the Web to suggest new connections in social media or professional platforms. While such recommendations bring people together, the feedback loop between the algorithms and the changes in network structure may exacerbate social biases. These biases include rich-get-richer effects, filter bubbles, and polarization. However, social networks are diverse complex systems and recommendations may affect them differently, depending on their structural properties. In this work, we explore five people recommendation algorithms by systematically applying them over time to different synthetic networks. In particular, we measure to what extent these recommendations change the structure of bi-populated networks and show how these changes affect the minority group. Our systematic experimentation helps to better understand when link recommendation algorithms are beneficial or harmful to minority groups in social networks. In particular, our findings suggest that, while all algorithms tend to close triangles and increase cohesion, all algorithms except Node2Vec are prone to favor and suggest nodes with high in-degree. Furthermore, we found that, especially when both classes are heterophilic, recommendation algorithms can reduce the visibility of minorities.
Submitted 12 May, 2022;
originally announced May 2022.
-
Inequality and Inequity in Network-based Ranking and Recommendation Algorithms
Authors:
Lisette Espín-Noboa,
Claudia Wagner,
Markus Strohmaier,
Fariba Karimi
Abstract:
Though algorithms promise many benefits including efficiency, objectivity and accuracy, they may also introduce or amplify biases. Here we study two well-known algorithms, namely PageRank and Who-to-Follow (WTF), and show to what extent their ranks produce inequality and inequity when applied to directed social networks. To this end, we propose a directed network model with preferential attachment and homophily (DPAH) and demonstrate the influence of network structure on the rank distributions of these algorithms. Our main findings suggest that (i) inequality is positively correlated with inequity, (ii) inequality is driven by the interplay between preferential attachment, homophily, node activity and edge density, and (iii) inequity is driven by the interplay between homophily and minority size. In particular, these two algorithms reduce, replicate and amplify the representation of minorities in top ranks when majorities are homophilic, neutral and heterophilic, respectively. Moreover, when this representation is reduced, minorities may improve their visibility in the rank by connecting strategically in the network, for instance by increasing their out-degree or homophily when majorities are also homophilic. These findings shed light on the social and algorithmic mechanisms that hinder equality and equity in network-based ranking and recommendation algorithms.
Submitted 22 July, 2022; v1 submitted 30 September, 2021;
originally announced October 2021.
-
HopRank: How Semantic Structure Influences Teleportation in PageRank (A Case Study on BioPortal)
Authors:
Lisette Espín-Noboa,
Florian Lemmerich,
Simon Walk,
Markus Strohmaier,
Mark A. Musen
Abstract:
This paper introduces HopRank, an algorithm for modeling human navigation on semantic networks. HopRank leverages the assumption that users know or can see the whole structure of the network. Therefore, besides following links, they also follow nodes at certain distances (i.e., k-hop neighborhoods), and not at random as suggested by PageRank, which assumes only links are known or visible. We observe such preference towards k-hop neighborhoods on BioPortal, one of the leading repositories of biomedical ontologies on the Web. In general, users navigate within the vicinity of a concept, but they also "jump" to distant concepts, though less frequently. We fit our model on 11 ontologies using the transition matrix of clickstreams, and show that semantic structure can influence teleportation in PageRank. This suggests that users, to some extent, utilize knowledge about the underlying structure of ontologies, and leverage it to reach certain pieces of information. Our results help the development and improvement of user interfaces for ontology exploration.
Submitted 15 March, 2019; v1 submitted 13 March, 2019;
originally announced March 2019.
-
Towards Quantifying Sampling Bias in Network Inference
Authors:
Lisette Espín-Noboa,
Claudia Wagner,
Fariba Karimi,
Kristina Lerman
Abstract:
Relational inference leverages relationships between entities and links in a network to infer information about the network from a small sample. This method is often used when global information about the network is not available or difficult to obtain. However, how reliable is inference from a small labelled sample? How should the network be sampled, and what effect does it have on inference error? How does the structure of the network impact the sampling strategy? We address these questions by systematically examining how network sampling strategy and sample size affect accuracy of relational inference in networks. To this end, we generate a family of synthetic networks where nodes have a binary attribute and a tunable level of homophily. As expected, we find that in heterophilic networks, we can obtain good accuracy when only small samples of the network are initially labelled, regardless of the sampling strategy. Surprisingly, this is not the case for homophilic networks, and sampling strategies that work well in heterophilic networks lead to large inference errors. These findings suggest that the impact of network structure on relational classification is more complex than previously thought.
Submitted 6 March, 2018;
originally announced March 2018.
-
Characterizing Information Diets of Social Media Users
Authors:
Juhi Kulshrestha,
Muhammad Bilal Zafar,
Lisette Espin-Noboa,
Krishna P. Gummadi,
Saptarshi Ghosh
Abstract:
With the widespread adoption of social media sites like Twitter and Facebook, there has been a shift in the way information is produced and consumed. Earlier, the only producers of information were traditional news organizations, which broadcast the same carefully-edited information to all consumers over mass media channels. Now, in online social media, any user can be a producer of information, and every user selects which other users she connects to, thereby choosing the information she consumes. Moreover, the personalized recommendations that most social media sites provide also contribute towards the information consumed by individual users. In this work, we define the concept of an information diet, which is the topical distribution of a given set of information items (e.g., tweets), to characterize the information produced and consumed by various types of users on the popular social media platform Twitter. At a high level, we find that (i) popular users mostly produce very specialized diets focusing on only a few topics; in fact, news organizations (e.g., NYTimes) produce much more focused diets on social media as compared to their mass media diets, (ii) most users' consumption diets are primarily focused towards one or two topics of their interest, and (iii) the personalized recommendations provided by Twitter help to mitigate some of the topical imbalances in the users' consumption diets, by adding information on diverse topics apart from the users' primary topics of interest.
Submitted 5 April, 2017;
originally announced April 2017.
-
How Users Explore Ontologies on the Web: A Study of NCBO's BioPortal Usage Logs
Authors:
Simon Walk,
Lisette Espín-Noboa,
Denis Helic,
Markus Strohmaier,
Mark Musen
Abstract:
Ontologies in the biomedical domain are numerous, highly specialized and very expensive to develop. Thus, a crucial prerequisite for ontology adoption and reuse is effective support for exploring and finding existing ontologies. Towards that goal, the National Center for Biomedical Ontology (NCBO) has developed BioPortal, an online repository designed to support users in exploring and finding more than 500 existing biomedical ontologies. In 2016, BioPortal represents one of the largest portals for exploration of semantic biomedical vocabularies and terminologies, which is used by many researchers and practitioners. While usage of this portal is high, we know very little about how exactly users search and explore ontologies and what kind of usage patterns or user groups exist in the first place. Deeper insights into user behavior on such portals can provide valuable information to devise strategies for better support of users in exploring and finding existing ontologies, and thereby enable better ontology reuse. To that end, we study and group users according to their browsing behavior on BioPortal using data mining techniques. Additionally, we use the obtained groups to characterize and compare exploration strategies across ontologies. In particular, we were able to identify seven distinct browsing-behavior types, which all make use of different functionality provided by BioPortal. For example, Search Explorers make extensive use of the search functionality while Ontology Tree Explorers mainly rely on the class hierarchy to explore ontologies. Further, we show that specific characteristics of ontologies influence the way users explore and interact with the website. Our results may guide the development of more user-oriented systems for ontology exploration on the Web.
Submitted 31 October, 2016; v1 submitted 28 October, 2016;
originally announced October 2016.
-
Discovering and Characterizing Mobility Patterns in Urban Spaces: A Study of Manhattan Taxi Data
Authors:
Lisette Espín-Noboa,
Florian Lemmerich,
Philipp Singer,
Markus Strohmaier
Abstract:
Nowadays, human movement in urban spaces can be traced digitally in many cases. It can be observed that movement patterns are not constant, but vary across time and space. In this work, we characterize such spatio-temporal patterns with an innovative combination of two separate approaches that have been utilized for studying human mobility in the past. First, by using non-negative tensor factorization (NTF), we are able to cluster human behavior based on spatio-temporal dimensions. Second, for understanding these clusters, we propose to use HypTrails, a Bayesian approach for expressing and comparing hypotheses about human trails. To formalize hypotheses we utilize data that is publicly available on the Web, namely Foursquare data and census data provided by an open data platform. By applying this combination of approaches to taxi data in Manhattan, we can discover and explain different patterns in human mobility that cannot be identified in a collective analysis. As one example, we can find a group of taxi rides that end at locations with a high number of party venues (according to Foursquare) on weekend nights. Overall, our work demonstrates that human mobility is not one-dimensional but rather contains different facets both in time and space which we explain by utilizing online data. The findings of this paper argue for a more fine-grained analysis of human mobility in order to make more informed decisions for, e.g., enhancing urban structures, tailored traffic control and location-based recommender systems.
Submitted 9 February, 2016; v1 submitted 20 January, 2016;
originally announced January 2016.