-
Link Prediction Accuracy on Real-World Networks Under Non-Uniform Missing Edge Patterns
Authors:
Xie He,
Amir Ghasemian,
Eun Lee,
Alice Schwarze,
Aaron Clauset,
Peter J. Mucha
Abstract:
Real-world network datasets are typically obtained in ways that fail to capture all edges. The patterns of missing data are often non-uniform as they reflect biases and other shortcomings of different data collection methods. Nevertheless, uniform missing data is a common assumption made when no additional information is available about the underlying missing-edge pattern, and link prediction methods are frequently tested against uniformly missing edges. To investigate the impact of different missing-edge patterns on link prediction accuracy, we employ 9 link prediction algorithms from 4 different families to analyze 20 different missing-edge patterns that we categorize into 5 groups. Our comparative simulation study, spanning 250 real-world network datasets from 6 different domains, provides a detailed picture of the significant variations in the performance of different link prediction algorithms in these different settings. With this study, we aim to provide a guide for future researchers to help them select a link prediction algorithm that is well suited to their sampled network data, considering the data collection process and application domain.
Submitted 30 April, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
Causally estimating the effect of YouTube's recommender system using counterfactual bots
Authors:
Homa Hosseinmardi,
Amir Ghasemian,
Miguel Rivera-Lanas,
Manoel Horta Ribeiro,
Robert West,
Duncan J. Watts
Abstract:
In recent years, critics of online platforms have raised concerns about the ability of recommendation algorithms to amplify problematic content, with potentially radicalizing consequences. However, attempts to evaluate the effect of recommenders have suffered from a lack of appropriate counterfactuals -- what a user would have viewed in the absence of algorithmic recommendations -- and hence cannot disentangle the effects of the algorithm from a user's intentions. Here we propose a method that we call "counterfactual bots" to causally estimate the role of algorithmic recommendations on the consumption of highly partisan content. By comparing bots that replicate real users' consumption patterns with "counterfactual" bots that follow rule-based trajectories, we show that, on average, relying exclusively on the recommender results in less partisan consumption, where the effect is most pronounced for heavy partisan consumers. Following a similar method, we also show that if partisan consumers switch to moderate content, YouTube's sidebar recommender "forgets" their partisan preference within roughly 30 videos regardless of their prior history, while homepage recommendations shift more gradually towards moderate content. Overall, our findings indicate that, at least since the algorithm changes that YouTube implemented in 2019, individual consumption patterns mostly reflect individual preferences, where algorithmic recommendations play, if anything, a moderating role.
Submitted 1 December, 2023; v1 submitted 20 August, 2023;
originally announced August 2023.
-
The Enmity Paradox
Authors:
Amir Ghasemian,
Nicholas A. Christakis
Abstract:
The "friendship paradox" of social networks states that, on average, "your friends have more friends than you do." Here, we theoretically and empirically explore a related and overlooked paradox we refer to as the "enmity paradox." We use empirical data from 24,687 people living in 176 villages in rural Honduras. We show that, for a real negative undirected network (created by symmetrizing antagonistic interactions), the paradox exists as it does in the positive world. Specifically, a person's enemies have more enemies, on average, than a person does. Furthermore, in a mixed world of positive and negative ties, we study the conditions for the existence of the paradox, both theoretically and empirically, finding that, for instance, a person's friends typically have more enemies than a person does. We also confirm the "generalized" enmity paradox for nontopological attributes in real data, analogous to the generalized friendship paradox (e.g., the claim that a person's enemies are richer, on average, than a person is). As a consequence, the naturally occurring variance in the degree distribution of both friendship and antagonism in social networks can skew people's perceptions of the social world.
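The degree-based paradox described in this abstract can be illustrated with a minimal sketch. It is not the authors' code or data: the function name `degree_paradox` and the toy star network are hypothetical, chosen only to show why the average degree of a neighbor exceeds the average degree whenever degrees vary. The same arithmetic applies whether the ties are friendships or (symmetrized) antagonistic ties.

```python
from collections import defaultdict


def degree_paradox(edges):
    """Return (mean degree, mean degree of a node reached by following a random edge end).

    `edges` is a list of undirected (u, v) pairs. Hypothetical helper for illustration.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    total_degree = sum(degree.values())
    mean_degree = total_degree / len(degree)
    # A node with degree k is reached through k edge ends, so it is weighted by k:
    # the neighbor average is sum(k^2) / sum(k), which is at least the plain mean.
    mean_neighbor_degree = sum(k * k for k in degree.values()) / total_degree
    return mean_degree, mean_neighbor_degree


# Assumed toy network: a star with hub 0 and four leaves.
toy_edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
mean_k, mean_neighbor_k = degree_paradox(toy_edges)
print(mean_k, mean_neighbor_k)
```

In the toy star, the hub appears in every leaf's neighborhood, which pulls the neighbor average above the plain average; in a network where every node has the same degree the two quantities coincide.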
Submitted 20 April, 2023;
originally announced April 2023.
-
Examining the consumption of radical content on YouTube
Authors:
Homa Hosseinmardi,
Amir Ghasemian,
Aaron Clauset,
Markus Mobius,
David M. Rothschild,
Duncan J. Watts
Abstract:
Although it is under-studied relative to other social media platforms, YouTube is arguably the largest and most engaging online media consumption platform in the world. Recently, YouTube's scale has fueled concerns that YouTube users are being radicalized via a combination of biased recommendations and ostensibly apolitical anti-woke channels, both of which have been claimed to direct attention to radical political content. Here we test this hypothesis using a representative panel of more than 300,000 Americans and their individual-level browsing behavior, on and off YouTube, from January 2016 through December 2019. Using a labeled set of political news channels, we find that news consumption on YouTube is dominated by mainstream and largely centrist sources. Consumers of far-right content, while more engaged than average, represent a small and stable percentage of news consumers. However, consumption of anti-woke content, defined in terms of its opposition to progressive intellectual and political agendas, grew steadily in popularity and is correlated with consumption of far-right content off-platform. We find no evidence that engagement with far-right content is caused by YouTube recommendations systematically, nor do we find clear evidence that anti-woke channels serve as a gateway to the far right. Rather, consumption of political content on YouTube appears to reflect individual preferences that extend across the web as a whole.
Submitted 14 February, 2022; v1 submitted 25 November, 2020;
originally announced November 2020.
-
Stacking Models for Nearly Optimal Link Prediction in Complex Networks
Authors:
Amir Ghasemian,
Homa Hosseinmardi,
Aram Galstyan,
Edoardo M. Airoldi,
Aaron Clauset
Abstract:
Most real-world networks are incompletely observed. Algorithms that can accurately predict which links are missing can dramatically speed up the collection of network data and improve the validity of network models. Many algorithms now exist for predicting missing links, given a partially observed network, but it has remained unknown whether a single best predictor exists, how link predictability varies across methods and networks from different domains, and how close to optimality current methods are. We answer these questions by systematically evaluating 203 individual link predictor algorithms, representing three popular families of methods, applied to a large corpus of 548 structurally diverse networks from six scientific domains. We first show that individual algorithms exhibit a broad diversity of prediction errors, such that no one predictor or family is best, or worst, across all realistic inputs. We then exploit this diversity via meta-learning to construct a series of "stacked" models that combine predictors into a single algorithm. Applied to a broad range of synthetic networks, for which we may analytically calculate optimal performance, these stacked models achieve optimal or nearly optimal levels of accuracy. Applied to real-world networks, stacked models are also superior, but their accuracy varies strongly by domain, suggesting that link prediction may be fundamentally easier in social networks than in biological or technological networks. These results indicate that the state of the art for link prediction comes from combining individual algorithms, which achieves nearly optimal predictions. We close with a brief discussion of limitations and opportunities for further improvement of these results.
Submitted 17 September, 2019;
originally announced September 2019.
-
A Unified Approach to Mitigate Voltage Jump Effects in Near Optimal Switching Surface Control of DC-DC Converters
Authors:
Amir Ghasemian,
Asghar Taheri
Abstract:
The Equivalent Series Resistance (ESR) of the output capacitor may cause jumps in the output voltage Vo that are not commonly modeled for second order DC-DC converters, i.e., converters with two second order switched subsystems. These jump discontinuities in Vo lead to performance issues in Switching Surface (SS) controllers. In this paper, these ESR effects are modeled using switched systems with state jumps, called Jump-Flow Switched (JFS) systems. Furthermore, it is shown that approximating the capacitor voltage (Vc) with Vo can cause undesired limit cycles, oscillations, chattering, or instability issues. To resolve these issues, a non-jumping normal switched system is defined for JFS systems that is equivalent to the internal continuous dynamics. Also, the challenges of designing SS controllers for this equivalent switched system are studied, and the Constrained Near Optimal (CNO) SS is designed for the equivalent switched system of buck, boost, and buck-boost converters. To eliminate the required estimations, a general class of switching methods is defined that also avoids chattering and eliminates the conventional hysteresis blocks. The proposed controller is implemented using analog op-amp circuits. Experimental results show fast and robust responses of the controller board with buck, boost, and buck-boost converters.
Submitted 1 July, 2019; v1 submitted 24 March, 2019;
originally announced March 2019.
-
Tensor Embedding: A Supervised Framework for Human Behavioral Data Mining and Prediction
Authors:
Homa Hosseinmardi,
Amir Ghasemian,
Shrikanth Narayanan,
Kristina Lerman,
Emilio Ferrara
Abstract:
Today's densely instrumented world offers tremendous opportunities for continuous acquisition and analysis of multimodal sensor data providing temporal characterization of an individual's behaviors. Is it possible to efficiently couple such rich sensor data with predictive modeling techniques to provide contextual and insightful assessments of individual performance and wellbeing? Prediction of different aspects of human behavior from these noisy, incomplete, and heterogeneous bio-behavioral temporal data is a challenging problem, beyond unsupervised discovery of latent structures. We propose a Supervised Tensor Embedding (STE) algorithm for high-dimensional multimodal data with joint decomposition of input and target variables. Furthermore, we show that feature selection helps to reduce the contamination in the prediction and increase the performance. The efficiency of the methods was tested on two different real-world datasets.
Submitted 31 August, 2018;
originally announced August 2018.
-
Evaluating Overfit and Underfit in Models of Network Community Structure
Authors:
Amir Ghasemian,
Homa Hosseinmardi,
Aaron Clauset
Abstract:
A common data mining task on networks is community detection, which seeks an unsupervised decomposition of a network into structural groups based on statistical regularities in the network's connectivity. Although many methods exist, the No Free Lunch theorem for community detection implies that each makes some kind of tradeoff, and no algorithm can be optimal on all inputs. Thus, different algorithms will over or underfit on different inputs, finding more, fewer, or just different communities than is optimal, and evaluation methods that use a metadata partition as a ground truth will produce misleading conclusions about general accuracy. Here, we present a broad evaluation of over and underfitting in community detection, comparing the behavior of 16 state-of-the-art community detection algorithms on a novel and structurally diverse corpus of 406 real-world networks. We find that (i) algorithms vary widely both in the number of communities they find and in their corresponding composition, given the same input, (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks, and (iii) these differences induce wide variation in accuracy on link prediction and link description tasks. We introduce a new diagnostic for evaluating overfitting and underfitting in practice, and use it to roughly divide community detection methods into general and specialized learning algorithms. Across methods and inputs, Bayesian techniques based on the stochastic block model and a minimum description length approach to regularization represent the best general learning approach, but can be outperformed under specific circumstances. These results introduce both a theoretically principled approach to evaluate over and underfitting in models of network community structure and a realistic benchmark by which new methods may be evaluated and compared.
Submitted 16 April, 2019; v1 submitted 28 February, 2018;
originally announced February 2018.
-
Detectability thresholds and optimal algorithms for community structure in dynamic networks
Authors:
Amir Ghasemian,
Pan Zhang,
Aaron Clauset,
Cristopher Moore,
Leto Peel
Abstract:
We study the fundamental limits on learning latent community structure in dynamic networks. Specifically, we study dynamic stochastic block models where nodes change their community membership over time, but where edges are generated independently at each time step. In this setting (which is a special case of several existing models), we are able to derive the detectability threshold exactly, as a function of the rate of change and the strength of the communities. Below this threshold, we claim that no algorithm can identify the communities better than chance. We then give two algorithms that are optimal in the sense that they succeed all the way down to this limit. The first uses belief propagation (BP), which gives asymptotically optimal accuracy, and the second is a fast spectral clustering algorithm, based on linearizing the BP equations. We verify our analytic and algorithmic results via numerical simulation, and close with a brief discussion of extensions and open questions.
Submitted 19 June, 2015;
originally announced June 2015.
-
A First Step to Convolutive Sparse Representation
Authors:
Hamed Firouzi,
Massoud Babaie-Zadeh,
Aria Ghasemian,
Christian Jutten
Abstract:
In this paper, an extension of the sparse decomposition problem is considered and an algorithm for solving it is presented. In this extension, it is known that one of the shifted versions of a signal s (not necessarily the original signal itself) has a sparse representation on an overcomplete dictionary, and we are looking for the sparsest representation among the representations of all the shifted versions of s. The proposed algorithm then simultaneously finds the amount of the required shift and the sparse representation. Experimental results emphasize the performance of our algorithm.
Submitted 20 September, 2008;
originally announced September 2008.