-
Social Catalysts: Characterizing People Who Spark Conversations Among Others
Authors:
Martin Saveski,
Farshad Kooti,
Sylvia Morelli Vitousek,
Carlos Diuk,
Bryce Bartlett,
Lada Adamic
Abstract:
People assume different and important roles within social networks. Some roles have received extensive study: that of influencers who are well-connected, and that of brokers who bridge unconnected parts of the network. However, very little work has explored another potentially important role, that of creating opportunities for people to interact and facilitating conversation between them. These individuals bring people together and act as social catalysts. In this paper, we test for the presence of social catalysts on the online social network Facebook. We first identify posts that have spurred conversations between the poster's friends and summarize the characteristics of such posts. We then aggregate the number of catalyzed comments at the poster level, as a measure of the individual's "catalystness." The top 1% of such individuals account for 31% of catalyzed interactions, although their network characteristics do not differ markedly from others who post as frequently and have a similar number of friends. By collecting survey data, we also validate the behavioral measure of catalystness: a person is more likely to be nominated as a social catalyst by their friends if their posts prompt discussions between other people more frequently. The measure, along with other conversation-related features, is one of the most predictive of a person being nominated as a catalyst. Although influencers and brokers may have gotten more attention for their network positions, our findings provide converging evidence that another important role exists and is recognized in online social networks.
Submitted 13 August, 2021; v1 submitted 10 July, 2021;
originally announced July 2021.
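The Social Catalysts abstract aggregates catalyzed comments at the poster level and then reports how concentrated they are (top 1% of posters, 31% of catalyzed interactions). A minimal sketch of that aggregation, assuming each post already carries a count of catalyzed comments; the step that decides which comments count as catalyzed is not shown, and the names `catalystness` and `top_share` are hypothetical.

```python
from collections import Counter


def catalystness(posts):
    """Sum catalyzed-comment counts per poster.

    posts: iterable of (poster_id, n_catalyzed_comments), one pair per post.
    The per-post counts are assumed to be computed upstream (hypothetical input).
    """
    totals = Counter()
    for poster, n in posts:
        totals[poster] += n
    return totals


def top_share(totals, frac=0.01):
    """Fraction of all catalyzed comments produced by the top `frac` of posters."""
    values = sorted(totals.values(), reverse=True)
    total = sum(values)
    if total == 0:
        return 0.0
    k = max(1, int(round(len(values) * frac)))  # at least one poster in the top group
    return sum(values[:k]) / total
```

With 99 posters at 1 catalyzed comment each and one poster at 99, `top_share` returns 0.5: the single top poster (1% of 100) accounts for half of all catalyzed comments.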
-
Causal Network Motifs: Identifying Heterogeneous Spillover Effects in A/B Tests
Authors:
Yuan Yuan,
Kristen M. Altenburger,
Farshad Kooti
Abstract:
Randomized experiments, or "A/B" tests, remain the gold standard for evaluating the causal effect of a policy intervention or product change. However, experimental settings, such as social networks, where users are interacting and influencing one another, may violate conventional assumptions of no interference for credible causal inference. Existing solutions to the network setting include accounting for the fraction or count of treated neighbors in a user's network, yet most current methods do not account for the local network structure beyond simply counting the number of neighbors. Our study provides an approach that accounts for both the local structure in a user's social network via motifs as well as the treatment assignment conditions of neighbors. We propose a two-part approach. We first introduce and employ "causal network motifs", which are network motifs that characterize the assignment conditions in local ego networks; and then we propose a tree-based algorithm for identifying different network interference conditions and estimating their average potential outcomes. Our approach can account for social network theories, such as structural diversity and echo chambers, and also can help specify network interference conditions that are suitable to each experiment. We test our method on a synthetic network setting and on a real-world experiment on a large-scale network, which highlight how accounting for local structures can better account for different interference patterns in networks.
Submitted 15 February, 2021; v1 submitted 19 October, 2020;
originally announced October 2020.
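The Causal Network Motifs abstract describes motifs that combine local ego-network structure with the treatment assignments of neighbors. The sketch below is a simplified illustration, not the authors' motif set or their tree-based estimator: for one ego it counts dyads by whether the neighbor is treated, and neighbor pairs by whether they form an open or closed triad with the ego and how many of the two are treated. The function names and the dyad/triad-only scope are assumptions.

```python
from collections import Counter
from itertools import combinations


def causal_motif_counts(adj, treated, ego):
    """Count treatment-labelled dyads and triads around `ego`.

    adj: dict node -> set of neighbors (undirected graph).
    treated: set of treated nodes.
    Keys: ("dyad", is_treated) and ("open"|"closed", n_treated_in_pair).
    """
    nbrs = adj[ego]
    counts = Counter()
    for v in nbrs:
        counts[("dyad", v in treated)] += 1
    for u, v in combinations(sorted(nbrs), 2):
        kind = "closed" if v in adj[u] else "open"  # closed: the two neighbors are linked
        counts[(kind, (u in treated) + (v in treated))] += 1
    return counts


def motif_fractions(counts):
    """Normalize each count by the total for its motif kind (dyad/open/closed)."""
    totals = Counter()
    for (kind, _), n in counts.items():
        totals[kind] += n
    return {key: n / totals[key[0]] for key, n in counts.items()}
```

Using fractions per motif kind, rather than raw counts, keeps egos with very different degrees comparable, which is one reason exposure conditions are often expressed as proportions.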
-
iPhone's Digital Marketplace: Characterizing the Big Spenders
Authors:
Farshad Kooti,
Mihajlo Grbovic,
Luca Maria Aiello,
Eric Bax,
Kristina Lerman
Abstract:
With mobile shopping surging in popularity, people are spending ever more money on digital purchases through their mobile devices and phones. However, few large-scale studies of mobile shopping exist. In this paper we analyze a large data set consisting of more than 776M digital purchases made on Apple mobile devices that include songs, apps, and in-app purchases. We find that 61% of all the spending is on in-app purchases and that the top 1% of users are responsible for 59% of all the spending. These big spenders are more likely to be male and older, and less likely to be from the US. We study how they adopt and abandon individual apps, and find that, after an initial phase of increased daily spending, users gradually lose interest: the delay between their purchases increases and the spending decreases with a sharp drop toward the end. Finally, we model the in-app purchasing behavior in multiple steps: 1) we model the time between purchases; 2) we train a classifier to predict whether the user will make a purchase from a new app or continue purchasing from the existing app; and 3) based on the outcome of the previous step, we attempt to predict the exact app, new or existing, from which the next purchase will come. The results yield new insights into spending habits in the mobile digital marketplace.
Submitted 25 January, 2017;
originally announced January 2017.
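Steps 1 and 2 of the in-app purchase model in the iPhone abstract need two targets: the delay between consecutive purchases within an app, and whether each purchase comes from an app the user has not bought from before. A minimal sketch of building those targets, assuming purchases arrive as hypothetical `(user, app, timestamp)` tuples; the actual models the authors trained are not reproduced.

```python
from collections import defaultdict


def inter_purchase_delays(purchases):
    """Map (user, app) -> list of delays between consecutive purchases.

    purchases: iterable of (user, app, timestamp) tuples (hypothetical layout).
    """
    by_pair = defaultdict(list)
    for user, app, t in purchases:
        by_pair[(user, app)].append(t)
    delays = {}
    for key, ts in by_pair.items():
        ts.sort()
        delays[key] = [b - a for a, b in zip(ts, ts[1:])]
    return delays


def new_vs_existing_labels(purchases):
    """Label each purchase True if it is the user's first purchase from that app."""
    seen = defaultdict(set)
    labels = []
    for user, app, t in sorted(purchases, key=lambda p: p[2]):
        labels.append((user, app, t, app not in seen[user]))
        seen[user].add(app)
    return labels
```

A growing sequence in `inter_purchase_delays` for a (user, app) pair is the "gradually lose interest" pattern the abstract describes.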
-
Ensemble Validation: Selectivity has a Price, but Variety is Free
Authors:
Eric Bax,
Farshad Kooti
Abstract:
Suppose some classifiers are selected from a set of hypothesis classifiers to form an equally-weighted ensemble that selects a member classifier at random for each input example. Then the ensemble has an error bound consisting of the average error bound for the member classifiers, a term for selectivity that varies from zero (if all hypothesis classifiers are selected) to a standard uniform error bound (if only a single classifier is selected), and small constants. There is no penalty for using a richer hypothesis set if the same fraction of the hypothesis classifiers are selected for the ensemble.
Submitted 28 March, 2019; v1 submitted 4 October, 2016;
originally announced October 2016.
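The Ensemble Validation abstract considers an ensemble that picks one member at random for each input. By linearity of expectation, its expected error on any data set equals the average of the members' error rates, which is why the bound starts from the average member bound. The sketch below checks only that identity (exactly and by simulation); the selectivity term and constants of the paper's bound are not reproduced, and all function names are hypothetical.

```python
import random


def random_member_predict(members, x, rng):
    """Equally weighted ensemble: pick one member uniformly at random per input."""
    return rng.choice(members)(x)


def expected_ensemble_error(members, data):
    """Exact expected error rate of the random-member ensemble over `data`."""
    total = 0.0
    for x, y in data:
        total += sum(m(x) != y for m in members) / len(members)
    return total / len(data)


def average_member_error(members, data):
    """Mean of the individual members' error rates on `data`."""
    rates = [sum(m(x) != y for x, y in data) / len(data) for m in members]
    return sum(rates) / len(rates)


def simulated_error(members, data, trials=10000, seed=0):
    """Monte Carlo estimate of the ensemble's error rate."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        for x, y in data:
            wrong += random_member_predict(members, x, rng) != y
    return wrong / (trials * len(data))


members = [lambda x: x > 0, lambda x: x > 1, lambda x: True]
data = [(-1, False), (0.5, True), (2, True)]
```

For these three toy threshold classifiers, both `expected_ensemble_error` and `average_member_error` equal 2/9, and the simulation lands close to that value.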
-
Evidence of Online Performance Deterioration in User Sessions on Reddit
Authors:
Philipp Singer,
Emilio Ferrara,
Farshad Kooti,
Markus Strohmaier,
Kristina Lerman
Abstract:
This article presents evidence of performance deterioration in online user sessions quantified by studying a massive dataset containing over 55 million comments posted on Reddit in April 2015. After segmenting the sessions (i.e., periods of activity without a prolonged break) depending on their intensity (i.e., how many posts users produced during sessions), we observe a general decrease in the quality of comments produced by users over the course of sessions. We propose mixed-effects models that capture the impact of session intensity on comments, including their length, quality, and the responses they generate from the community. Our findings suggest performance deterioration: Sessions of increasing intensity are associated with the production of shorter, progressively less complex comments, which receive declining quality scores (as rated by other users), and are less and less engaging (i.e., they attract fewer responses). Our contribution evokes a connection between cognitive and attention dynamics and the usage of online social peer production platforms, specifically the effects of deterioration of user performance.
Submitted 26 August, 2016; v1 submitted 23 April, 2016;
originally announced April 2016.
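The Reddit abstract defines a session as a period of activity without a prolonged break and measures its intensity as the number of posts in it. A minimal sessionization sketch over one user's comment timestamps; the 3600-second break threshold is an assumption for illustration, not necessarily the value used in the paper, and the function names are hypothetical.

```python
def split_sessions(timestamps, max_gap=3600):
    """Split Unix timestamps (seconds) into sessions.

    A new session starts whenever the gap to the previous comment exceeds
    `max_gap` (assumed threshold).
    """
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= max_gap:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions


def positions_with_intensity(timestamps, max_gap=3600):
    """Return (timestamp, position_in_session, session_intensity) per comment."""
    out = []
    for session in split_sessions(timestamps, max_gap):
        for i, t in enumerate(session):
            out.append((t, i + 1, len(session)))
    return out
```

The (position, intensity) pairs are the kind of covariates a mixed-effects model of comment length or score could use, with the user as a random effect.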
-
The DARPA Twitter Bot Challenge
Authors:
V. S. Subrahmanian,
Amos Azaria,
Skylar Durst,
Vadim Kagan,
Aram Galstyan,
Kristina Lerman,
Linhong Zhu,
Emilio Ferrara,
Alessandro Flammini,
Filippo Menczer,
Andrew Stevens,
Alexander Dekhtyar,
Shuyang Gao,
Tad Hogg,
Farshad Kooti,
Yan Liu,
Onur Varol,
Prashant Shiralkar,
Vinod Vydiswaran,
Qiaozhu Mei,
Tim Hwang
Abstract:
A number of organizations ranging from terrorist groups such as ISIS to politicians and nation states reportedly conduct explicit campaigns to influence opinion on social media, posing a risk to democratic processes. There is thus a growing need to identify and eliminate "influence bots" - realistic, automated identities that illicitly shape discussion on sites like Twitter and Facebook - before they get too influential. Spurred by such events, DARPA held a 4-week competition in February/March 2015 in which multiple teams supported by the DARPA Social Media in Strategic Communications program competed to identify a set of previously identified "influence bots" serving as ground truth on a specific topic within Twitter. Past work regarding influence bots often has difficulty supporting claims about accuracy, since there is limited ground truth (though some exceptions do exist [3,7]). However, with the exception of [3], no past work has looked specifically at identifying influence bots on a specific topic. This paper describes the DARPA Challenge and the methods used by the three top-ranked teams.
Submitted 21 April, 2016; v1 submitted 19 January, 2016;
originally announced January 2016.
-
Portrait of an Online Shopper: Understanding and Predicting Consumer Behavior
Authors:
Farshad Kooti,
Kristina Lerman,
Luca Maria Aiello,
Mihajlo Grbovic,
Nemanja Djuric,
Vladan Radosavljevic
Abstract:
Consumer spending accounts for a large fraction of US economic activity. Increasingly, consumer activity is moving to the web, where digital traces of shopping and purchases provide valuable data about consumer behavior. We analyze these data extracted from emails and combine them with demographic information to characterize, model, and predict consumer behavior. Breaking down purchasing by age and gender, we find that the amount of money spent on online purchases grows sharply with age, peaking in late 30s. Men are more frequent online purchasers and spend more money when compared to women. Linking online shopping to income, we find that shoppers from more affluent areas purchase more expensive items and buy them more frequently, resulting in significantly more money spent on online purchases. We also look at dynamics of purchasing behavior and observe daily and weekly cycles in purchasing behavior, similar to other online activities.
More specifically, we observe temporal patterns in purchasing behavior suggesting shoppers have finite budgets: the more expensive an item, the longer the shopper waits since the last purchase to buy it. We also observe that shoppers who email each other purchase more similar items than socially unconnected shoppers, and this effect is particularly evident among women. Finally, we build a model to predict when shoppers will make a purchase and how much they will spend on it. We find that temporal features improve prediction accuracy over competitive baselines. A better understanding of consumer behavior can help improve marketing efforts and make online shopping more pleasant and efficient.
Submitted 15 December, 2015;
originally announced December 2015.
-
Evolution of Conversations in the Age of Email Overload
Authors:
Farshad Kooti,
Luca Maria Aiello,
Mihajlo Grbovic,
Kristina Lerman,
Amin Mantrach
Abstract:
Email is a ubiquitous communications tool in the workplace and plays an important role in social interactions. Previous studies of email were largely based on surveys and limited to relatively small populations of email users within organizations. In this paper, we report results of a large-scale study of more than 2 million users exchanging 16 billion emails over several months. We quantitatively characterize the replying behavior in conversations within pairs of users. In particular, we study the time it takes the user to reply to a received message and the length of the reply sent. We consider a variety of factors that affect the reply time and length, such as the stage of the conversation, user demographics, and use of portable devices. In addition, we study how increasing load affects emailing behavior. We find that as users receive more email messages in a day, they reply to a smaller fraction of them, using shorter replies. However, their responsiveness remains intact, and they may even reply to emails faster. Finally, we predict the time to reply, length of reply, and whether the reply ends a conversation. We demonstrate considerable improvement over the baseline in all three prediction tasks, showing the significant role that the factors we uncover play in determining replying behavior. We rank these factors based on their predictive power. Our findings have important implications for understanding human behavior and designing better email management applications for tasks like ranking unread emails.
Submitted 2 April, 2015;
originally announced April 2015.
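The email study measures reply time within pairs of users and relates behavior to daily load (messages received per day). A minimal sketch of both quantities, assuming messages are hypothetical `(sender, recipient, timestamp)` tuples with timestamps in seconds. Measuring the reply time from the most recent unanswered message in the opposite direction is an assumed convention; the paper may define it differently.

```python
from collections import Counter


def reply_times(messages):
    """Return (replier, original_sender, delay) for each reply in a pair.

    A message s->r counts as a reply if r->s has an unanswered message;
    the delay is measured from the most recent such message (assumption).
    """
    pending = {}  # (sender, recipient) -> timestamp of latest unanswered message
    out = []
    for s, r, t in sorted(messages, key=lambda m: m[2]):
        back = (r, s)
        if back in pending:
            out.append((s, r, t - pending.pop(back)))
        pending[(s, r)] = t
    return out


def daily_load(messages, day_seconds=86400):
    """Count messages received per (recipient, day index)."""
    load = Counter()
    for _, r, t in messages:
        load[(r, t // day_seconds)] += 1
    return load


msgs = [("a", "b", 0), ("b", "a", 30), ("a", "b", 100), ("a", "b", 150), ("b", "a", 200)]
```

For `msgs`, `reply_times` yields `[("b", "a", 30), ("a", "b", 70), ("b", "a", 50)]`; joining those delays with `daily_load` for the replier's day is one way to look at how load relates to responsiveness.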
-
The Social Name-Letter Effect on Online Social Networks
Authors:
Farshad Kooti,
Gabriel Magno,
Ingmar Weber
Abstract:
The Name-Letter Effect states that people have a preference for brands, places, and even jobs that start with the same letter as their own first name. So Sam might like Snickers and live in Seattle. We use social network data from Twitter and Google+ to replicate this effect in a new environment. We find limited to no support for the Name-Letter Effect on social networks. We do, however, find a very robust Same-Name Effect where, say, Michaels would be more likely to link to other Michaels than Johns. This effect persists when accounting for gender, nationality, race, and age. The fundamentals behind these effects have implications beyond psychology as understanding how a positive self-image is transferred to other entities is important in domains ranging from studying homophily to personalized advertising and to link formation in social networks.
Submitted 20 November, 2014;
originally announced November 2014.
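The Same-Name Effect in the Name-Letter abstract says that, for example, Michaels link to other Michaels more than chance would predict. A minimal sketch of one simple way to quantify that: the observed share of same-name edges divided by the share expected if edge endpoints were paired at random (the sum of squared name shares among endpoints). This baseline is an assumption for illustration; unlike the paper, it does not control for gender, nationality, race, or age, and `same_name_ratio` is a hypothetical name.

```python
from collections import Counter


def same_name_ratio(edges, first_name):
    """Observed / expected share of edges joining two users with the same first name.

    edges: list of (u, v) pairs; first_name: dict user -> first name.
    Expected share assumes endpoints are paired at random (no covariate controls).
    A ratio above 1 means more same-name links than chance.
    """
    if not edges:
        raise ValueError("need at least one edge")
    observed = sum(first_name[u] == first_name[v] for u, v in edges) / len(edges)
    endpoint_names = Counter()
    for u, v in edges:
        endpoint_names[first_name[u]] += 1
        endpoint_names[first_name[v]] += 1
    n = sum(endpoint_names.values())
    expected = sum((c / n) ** 2 for c in endpoint_names.values())
    return observed / expected


names = {1: "Michael", 2: "Michael", 3: "John", 4: "John"}
edges = [(1, 2), (3, 4), (1, 3)]
```

In this toy graph, two of three edges join same-name users while random pairing would give one half, so the ratio is 4/3.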
-
Network Weirdness: Exploring the Origins of Network Paradoxes
Authors:
Farshad Kooti,
Nathan O. Hodas,
Kristina Lerman
Abstract:
Social networks have many counter-intuitive properties, including the "friendship paradox" that states, on average, your friends have more friends than you do. Recently, a variety of other paradoxes were demonstrated in online social networks. This paper explores the origins of these network paradoxes. Specifically, we ask whether they arise from mathematical properties of the networks or whether they have a behavioral origin. We show that sampling from heavy-tailed distributions always gives rise to a paradox in the mean, but not the median. We propose a strong form of network paradoxes, based on utilizing the median, and validate it empirically using data from two online social networks. Specifically, we show that for any user, the majority of the user's friends and followers have more friends, followers, etc. than the user, and that this cannot be explained by statistical properties of sampling. Next, we explore the behavioral origins of the paradoxes by using the shuffle test to remove correlations between node degrees and attributes. We find that paradoxes for the mean persist in the shuffled network, but not for the median. We demonstrate that strong paradoxes arise due to the assortativity of user attributes, including degree, and correlation between degree and attribute.
Submitted 27 March, 2014;
originally announced March 2014.
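The Network Weirdness abstract distinguishes a weak paradox (neighbors' mean attribute exceeds the node's own) from a strong paradox (neighbors' median exceeds it), and tests behavioral origins by shuffling. A minimal sketch: `paradox_fractions` reports the share of nodes for which each form holds, and `shuffle_attr` implements one simple shuffle test that permutes attribute values across nodes while keeping the graph fixed, breaking any degree-attribute correlation. Function names are hypothetical, and the paper's exact shuffle procedure may differ.

```python
import random
from statistics import mean, median


def paradox_fractions(adj, attr):
    """Return (weak, strong): fractions of nodes whose neighbors' mean / median
    attribute exceeds their own. Isolated nodes are skipped.

    adj: dict node -> list of neighbors; attr: dict node -> numeric attribute.
    """
    nodes = [v for v in adj if adj[v]]
    weak = strong = 0
    for v in nodes:
        vals = [attr[u] for u in adj[v]]
        weak += mean(vals) > attr[v]
        strong += median(vals) > attr[v]
    return weak / len(nodes), strong / len(nodes)


def shuffle_attr(attr, seed=0):
    """Randomly reassign attribute values to nodes (graph left unchanged)."""
    nodes = list(attr)
    values = list(attr.values())
    random.Random(seed).shuffle(values)
    return dict(zip(nodes, values))


adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
attr = {0: 5, 1: 1, 2: 1, 3: 100}
```

Node 0 illustrates the mean/median gap: its neighbors' mean (34) exceeds its own value of 5, pulled up by one outlier, but their median (1) does not. Over the whole toy graph the weak form holds for 3/4 of nodes and the strong form for 1/2.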
-
Friendship Paradox Redux: Your Friends Are More Interesting Than You
Authors:
Nathan O. Hodas,
Farshad Kooti,
Kristina Lerman
Abstract:
Feld's friendship paradox states that "your friends have more friends than you, on average." This paradox arises because extremely popular people, despite being rare, are overrepresented when averaging over friends. Using a sample of the Twitter firehose, we confirm that the friendship paradox holds for >98% of Twitter users. Because of the directed nature of the follower graph on Twitter, we are further able to confirm more detailed forms of the friendship paradox: everyone you follow or who follows you has more friends and followers than you. This is likely caused by a correlation we demonstrate between Twitter activity, number of friends, and number of followers. In addition, we discover two new paradoxes: the virality paradox that states "your friends receive more viral content than you, on average," and the activity paradox, which states "your friends are more active than you, on average." The latter paradox is important in regulating online communication. It may result in users having difficulty maintaining optimal incoming information rates, because following additional users causes the volume of incoming tweets to increase super-linearly. While users may compensate for increased information flow by increasing their own activity, users become information overloaded when they receive more information than they are able or willing to process. We compare the average size of cascades that are sent and received by overloaded and underloaded users. We show that overloaded users post and receive larger cascades and that they are poor detectors of small cascades.
Submitted 11 April, 2013;
originally announced April 2013.
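The activity paradox in the Friendship Paradox Redux abstract states that the accounts you follow are, on average, more active than you, which drives up the volume of incoming tweets. A minimal sketch on a directed follow graph, assuming every tweet from a followee reaches the follower, so incoming volume is the sum of followees' posting rates. The input layout and the name `activity_paradox` are hypothetical.

```python
from statistics import mean


def activity_paradox(follows, tweets_per_day):
    """Return (fraction of users whose followees are more active on average,
    dict user -> incoming tweets per day).

    follows: dict user -> set of users they follow.
    tweets_per_day: dict user -> posting rate.
    Incoming volume assumes every followee tweet reaches the user.
    """
    incoming = {}
    hits = counted = 0
    for user, followees in follows.items():
        if not followees:
            continue  # users who follow nobody have no incoming stream
        rates = [tweets_per_day[f] for f in followees]
        incoming[user] = sum(rates)
        counted += 1
        hits += mean(rates) > tweets_per_day[user]
    return (hits / counted if counted else 0.0), incoming


follows = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
rates = {"a": 1, "b": 2, "c": 10}
```

Here both users who follow someone see followees that are more active than themselves, so the fraction is 1.0, and their incoming volumes are 12 and 10 tweets per day.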