-
Recommendation Fairness in Social Networks Over Time
Authors:
Meng Cao,
Hussain Hussain,
Sandipan Sikdar,
Denis Helic,
Markus Strohmaier,
Roman Kern
Abstract:
In social recommender systems, it is crucial that the recommendation models provide equitable visibility for different demographic groups, such as gender or race. Most existing research has addressed this problem by only studying individual static snapshots of networks that typically change over time. To address this gap, we study the evolution of recommendation fairness over time and its relation to dynamic network properties. We examine three real-world dynamic networks by evaluating the fairness of six recommendation algorithms and analyzing the association between fairness and network properties over time. We further study how interventions on network properties influence fairness by examining counterfactual scenarios with alternative evolution outcomes and differing network properties. Our results on empirical datasets suggest that recommendation fairness improves over time, regardless of the recommendation method. We also find that two network properties, minority ratio and homophily ratio, exhibit stable correlations with fairness over time. Our counterfactual study further suggests that an extreme homophily ratio potentially contributes to unfair recommendations even with a balanced minority ratio. Our work provides insights into the evolution of fairness within dynamic networks in social science. We believe that our findings will help system operators and policymakers to better comprehend the implications of temporal changes and interventions targeting fairness in social networks.
Submitted 7 May, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Protection from Evil and Good: The Differential Effects of Page Protection on Wikipedia Article Quality
Authors:
Thorsten Ruprechter,
Manoel Horta Ribeiro,
Robert West,
Denis Helic
Abstract:
Wikipedia, the Web's largest encyclopedia, frequently faces content disputes or malicious users seeking to subvert its integrity. Administrators can mitigate such disruptions by enforcing "page protection" that selectively limits contributions to specific articles to help prevent the degradation of content. However, this practice contradicts one of Wikipedia's fundamental principles (that it is open to all contributors) and may hinder further improvement of the encyclopedia. In this paper, we examine the effect of page protection on article quality to better understand whether and when page protections are warranted. Using decade-long data on page protections from the English Wikipedia, we conduct a quasi-experimental study analyzing pages that received "requests for page protection", i.e., written appeals submitted by Wikipedia editors to administrators to impose page protections. We match pages that indeed received page protection with similar pages that did not and quantify the causal effect of the interventions on a well-established measure of article quality. Our findings indicate that the effect of page protection on article quality depends on the characteristics of the page prior to the intervention: high-quality articles are affected positively as opposed to low-quality articles that are impacted negatively. Subsequent analysis suggests that high-quality articles degrade when left unprotected, whereas low-quality articles improve. Overall, with our study, we outline page protections on Wikipedia and inform best practices on whether and when to protect an article.
Submitted 19 October, 2023;
originally announced October 2023.
-
Adversarial Inter-Group Link Injection Degrades the Fairness of Graph Neural Networks
Authors:
Hussain Hussain,
Meng Cao,
Sandipan Sikdar,
Denis Helic,
Elisabeth Lex,
Markus Strohmaier,
Roman Kern
Abstract:
We present evidence for the existence and effectiveness of adversarial attacks on graph neural networks (GNNs) that aim to degrade fairness. These attacks can disadvantage a particular subgroup of nodes in GNN-based node classification, where nodes of the underlying network have sensitive attributes, such as race or gender. We conduct qualitative and experimental analyses explaining how adversarial link injection impairs the fairness of GNN predictions. For example, an attacker can compromise the fairness of GNN-based node classification by injecting adversarial links between nodes belonging to opposite subgroups and opposite class labels. Our experiments on empirical datasets demonstrate that adversarial fairness attacks can significantly degrade the fairness of GNN predictions (attacks are effective) with a low perturbation rate (attacks are efficient) and without a significant drop in accuracy (attacks are deceptive). This work demonstrates the vulnerability of GNN models to adversarial fairness attacks. We hope our findings raise awareness about this issue in our community and lay a foundation for the future development of GNN models that are more robust to such attacks.
Submitted 16 December, 2022; v1 submitted 13 September, 2022;
originally announced September 2022.
-
Structack: Structure-based Adversarial Attacks on Graph Neural Networks
Authors:
Hussain Hussain,
Tomislav Duricic,
Elisabeth Lex,
Denis Helic,
Markus Strohmaier,
Roman Kern
Abstract:
Recent work has shown that graph neural networks (GNNs) are vulnerable to adversarial attacks on graph data. Common attack approaches are typically informed, i.e. they have access to information about node attributes such as labels and feature vectors. In this work, we study adversarial attacks that are uninformed, where an attacker only has access to the graph structure, but no information about node attributes. Here the attacker aims to exploit structural knowledge and assumptions, which GNN models make about graph data. In particular, literature has shown that structural node centrality and similarity have a strong influence on learning with GNNs. Therefore, we study the impact of centrality and similarity on adversarial attacks on GNNs. We demonstrate that attackers can exploit this information to decrease the performance of GNNs by focusing on injecting links between nodes of low similarity and, surprisingly, low centrality. We show that structure-based uninformed attacks can approach the performance of informed attacks, while being computationally more efficient. With our paper, we present a new attack strategy on GNNs that we refer to as Structack. Structack can successfully manipulate the performance of GNNs with very limited information while operating under tight computational constraints. Our work contributes towards building more robust machine learning approaches on graphs.
Submitted 28 July, 2021; v1 submitted 23 July, 2021;
originally announced July 2021.
-
RFID-based Article-to-Fixture Predictions in Real-World Fashion Stores
Authors:
Matthias Wölbitsch,
Thomas Hasler,
Patrick Kasper,
Denis Helic,
Simon Walk
Abstract:
In recent years, Radio Frequency Identification (RFID) technology has been applied to improve numerous processes, such as inventory management in retail stores. However, automatic localization of RFID-tagged goods in stores is still a challenging problem. To address this issue, we equip fixtures (e.g., shelves) with reference tags and use data we collect during RFID-based stocktakes to map articles to fixtures. Knowing the location of goods enables the implementation of several practical applications, such as automated Money Mapping (i.e., a heat map of sales across fixtures). Specifically, we conduct controlled lab experiments and a case study in two fashion retail stores to evaluate our article-to-fixture prediction approaches. The approaches are based on calculating distances between read event time series using DTW, and clustering of read events using DBSCAN. We find that read events collected during RFID-based stocktakes can be used to assign articles to fixtures with an accuracy of more than 90%. Additionally, we conduct a pilot to investigate the challenges related to the integration of such a localization system in the day-to-day business of retail stores. Hence, in this paper we present an exploratory venture into novel and practical RFID-based applications in fashion retail stores, beyond the scope of stock management.
Submitted 21 May, 2021;
originally announced May 2021.
-
Surfacing Estimation Uncertainty in the Decay Parameters of Hawkes Processes with Exponential Kernels
Authors:
Tiago Santos,
Florian Lemmerich,
Denis Helic
Abstract:
As a tool for capturing irregular temporal dependencies (rather than resorting to binning temporal observations to construct time series), Hawkes processes with exponential decay have seen widespread adoption across many application domains, such as predicting the occurrence time of the next earthquake or stock market spike. However, practical applications of Hawkes processes face a noteworthy challenge: There is substantial and often unquantified variance in decay parameter estimations, especially in the case of a small number of observations or when the dynamics behind the observed data suddenly change. We empirically study the cause of these practical challenges and we develop an approach to surface and thereby mitigate them. In particular, our inspections of the Hawkes process likelihood function uncover the properties of the uncertainty when fitting the decay parameter. We thus propose to explicitly capture this uncertainty within a Bayesian framework. With a series of experiments with synthetic and real-world data from domains such as "classical" earthquake modeling or the manifestation of collective emotions on Twitter, we demonstrate that our proposed approach helps to quantify uncertainty and thereby to understand and fit Hawkes processes in practice.
Submitted 2 April, 2021;
originally announced April 2021.
-
Limiting Tags Fosters Efficiency
Authors:
Tiago Santos,
Keith Burghardt,
Kristina Lerman,
Denis Helic
Abstract:
Tagging facilitates information retrieval in social media and other online communities by allowing users to organize and describe online content. Researchers found that the efficiency of tagging systems steadily decreases over time, because tags become less precise in identifying specific documents, i.e., they lose their descriptiveness. However, previous works did not answer how or even whether community managers can improve the efficiency of tags. In this work, we use information-theoretic measures to track the descriptive and retrieval efficiency of tags on Stack Overflow, a question-answering system that strictly limits the number of tags users can specify per question. We observe that tagging efficiency stabilizes over time, while tag content and descriptiveness both increase. To explain this observation, we hypothesize that limiting the number of tags fosters novelty and diversity in tag usage, two properties which are both beneficial for tagging efficiency. To provide qualitative evidence supporting our hypothesis, we present a statistical model of tagging that demonstrates how novelty and diversity lead to greater tag efficiency in the long run. Our work offers insights into policies to improve information organization and retrieval in online communities.
Submitted 2 April, 2021;
originally announced April 2021.
-
Volunteer contributions to Wikipedia increased during COVID-19 mobility restrictions
Authors:
Thorsten Ruprechter,
Manoel Horta Ribeiro,
Tiago Santos,
Florian Lemmerich,
Markus Strohmaier,
Robert West,
Denis Helic
Abstract:
Wikipedia, the largest encyclopedia ever created, is a global initiative driven by volunteer contributions. When the COVID-19 pandemic broke out and mobility restrictions ensued across the globe, it was unclear whether Wikipedia volunteers would become less active in the face of the pandemic, or whether they would rise to meet the increased demand for high-quality information despite the added stress inflicted by this crisis. Analyzing 223 million edits contributed from 2018 to 2020 across twelve Wikipedia language editions, we find that Wikipedia's global volunteer community responded remarkably to the pandemic, substantially increasing both productivity and the number of newcomers who joined the community. For example, contributions to the English Wikipedia increased by over 20% compared to the expectation derived from pre-pandemic data. Our work sheds light on the response of a global volunteer population to the COVID-19 crisis, providing valuable insights into the behavior of critical online communities under stress.
Submitted 2 November, 2021; v1 submitted 19 February, 2021;
originally announced February 2021.
-
Synwalk -- Community Detection via Random Walk Modelling
Authors:
Christian Toth,
Denis Helic,
Bernhard C. Geiger
Abstract:
Complex systems, abstractly represented as networks, are ubiquitous in everyday life. Analyzing and understanding these systems requires, among others, tools for community detection. As no single best community detection algorithm can exist, robustness across a wide variety of problem settings is desirable. In this work, we present Synwalk, a random walk-based community detection method. Synwalk builds upon a solid theoretical basis and detects communities by synthesizing the random walk induced by the given network from a class of candidate random walks. We thoroughly validate the effectiveness of our approach on synthetic and empirical networks, respectively, and compare Synwalk's performance with the performance of Infomap and Walktrap. Our results indicate that Synwalk performs robustly on networks with varying mixing parameters and degree distributions. We outperform Infomap on networks with high mixing parameter, and Infomap and Walktrap on networks with many small communities and low average degree. Our work has a potential to inspire further development of community detection via synthesis of random walks and we provide concrete ideas for future research.
Submitted 21 January, 2021;
originally announced January 2021.
-
On the Impact of Communities on Semi-supervised Classification Using Graph Neural Networks
Authors:
Hussain Hussain,
Tomislav Duricic,
Elisabeth Lex,
Roman Kern,
Denis Helic
Abstract:
Graph Neural Networks (GNNs) are effective in many applications. Still, there is a limited understanding of the effect of common graph structures on the learning process of GNNs. In this work, we systematically study the impact of community structure on the performance of GNNs in semi-supervised node classification on graphs. Following an ablation study on six datasets, we measure the performance of GNNs on the original graphs, and the change in performance in the presence and the absence of community structure. Our results suggest that communities typically have a major impact on the learning process and classification performance. For example, in cases where the majority of nodes from one community share a single classification label, breaking up community structure results in a significant performance drop. On the other hand, for cases where labels show low correlation with communities, we find that the graph structure is rather irrelevant to the learning process, and a feature-only baseline becomes hard to beat. With our work, we provide deeper insights into the abilities and limitations of GNNs, including a set of general guidelines for model selection based on the graph structure.
Submitted 5 March, 2021; v1 submitted 30 October, 2020;
originally announced October 2020.
-
Empirical Comparison of Graph Embeddings for Trust-Based Collaborative Filtering
Authors:
Tomislav Duricic,
Hussain Hussain,
Emanuel Lacic,
Dominik Kowald,
Denis Helic,
Elisabeth Lex
Abstract:
In this work, we study the utility of graph embeddings to generate latent user representations for trust-based collaborative filtering. In a cold-start setting, on three publicly available datasets, we evaluate approaches from four method families: (i) factorization-based, (ii) random walk-based, (iii) deep learning-based, and (iv) the Large-scale Information Network Embedding (LINE) approach. We find that across the four families, random-walk-based approaches consistently achieve the best accuracy. Besides, they result in highly novel and diverse recommendations. Furthermore, our results show that the use of graph embeddings in trust-based collaborative filtering significantly improves user coverage.
Submitted 1 February, 2021; v1 submitted 30 March, 2020;
originally announced March 2020.
-
Characterizing the Global Crowd Workforce: A Cross-Country Comparison of Crowdworker Demographics
Authors:
Lisa Posch,
Arnim Bleier,
Fabian Flöck,
Clemens M. Lechner,
Katharina Kinder-Kurlanda,
Denis Helic,
Markus Strohmaier
Abstract:
Since its emergence roughly a decade ago, microtask crowdsourcing has been attracting a heterogeneous set of workers from all over the globe. This paper sets out to explore the characteristics of the international crowd workforce and offers a cross-national comparison of crowdworker populations from ten countries. We provide an analysis and comparison of demographic characteristics and shed light on the significance of microtask income for workers situated in different national contexts. With over 11,000 individual responses, this study is the first large-scale country-level analysis of the characteristics of workers on the platform Appen (formerly CrowdFlower and Figure Eight), one of the two platforms dominating the microtask market. We find large differences between the characteristics of the crowd workforces of different countries, both regarding demography and regarding the importance of microtask income for workers. Furthermore, we find that the composition of the workforce in the ten countries was largely stable across samples taken at different points in time.
Submitted 3 November, 2022; v1 submitted 14 December, 2018;
originally announced December 2018.
-
Activity Archetypes in Question-and-Answer (Q&A) Websites - A Study of 50 Stack Exchange Instances
Authors:
Tiago Santos,
Simon Walk,
Roman Kern,
Markus Strohmaier,
Denis Helic
Abstract:
Millions of users on the Internet discuss a variety of topics on Question-and-Answer (Q&A) instances. However, not all instances and topics receive the same amount of attention, as some thrive and achieve self-sustaining levels of activity, while others fail to attract users and either never grow beyond being a small niche community or become inactive. Hence, it is imperative to not only better understand but also to distill deciding factors and rules that define and govern sustainable Q&A instances. We aim to empower community managers with quantitative methods for them to better understand, control and foster their communities, and thus contribute to making the Web a more efficient place to exchange information. To that end, we extract, model and cluster user activity-based time series from 50 randomly selected Q&A instances from the Stack Exchange network to characterize user behavior. We find four distinct types of user activity temporal patterns, which vary primarily according to the users' activity frequency. Finally, by breaking down total activity in our 50 Q&A instances by the previously identified user activity profiles, we classify those 50 Q&A instances into three different activity profiles. Our parsimonious categorization of Q&A instances aligns with the stage of development and maturity of the underlying communities, and can potentially help operators of such instances: We not only quantitatively assess progress of Q&A instances, but we also derive practical implications for optimizing Q&A community building efforts, as we e.g. recommend which user types to focus on at different developmental stages of a Q&A community.
Submitted 10 April, 2019; v1 submitted 15 September, 2017;
originally announced September 2017.
-
How Users Explore Ontologies on the Web: A Study of NCBO's BioPortal Usage Logs
Authors:
Simon Walk,
Lisette Espín-Noboa,
Denis Helic,
Markus Strohmaier,
Mark Musen
Abstract:
Ontologies in the biomedical domain are numerous, highly specialized and very expensive to develop. Thus, a crucial prerequisite for ontology adoption and reuse is effective support for exploring and finding existing ontologies. Towards that goal, the National Center for Biomedical Ontology (NCBO) has developed BioPortal, an online repository designed to support users in exploring and finding more than 500 existing biomedical ontologies. In 2016, BioPortal represents one of the largest portals for exploration of semantic biomedical vocabularies and terminologies, which is used by many researchers and practitioners. While usage of this portal is high, we know very little about how exactly users search and explore ontologies and what kind of usage patterns or user groups exist in the first place. Deeper insights into user behavior on such portals can provide valuable information to devise strategies for a better support of users in exploring and finding existing ontologies, and thereby enable better ontology reuse. To that end, we study and group users according to their browsing behavior on BioPortal using data mining techniques. Additionally, we use the obtained groups to characterize and compare exploration strategies across ontologies. In particular, we were able to identify seven distinct browsing-behavior types, which all make use of different functionality provided by BioPortal. For example, Search Explorers make extensive use of the search functionality while Ontology Tree Explorers mainly rely on the class hierarchy to explore ontologies. Further, we show that specific characteristics of ontologies influence the way users explore and interact with the website. Our results may guide the development of more user-oriented systems for ontology exploration on the Web.
Submitted 31 October, 2016; v1 submitted 28 October, 2016;
originally announced October 2016.
-
Assessing the Navigational Effects of Click Biases and Link Insertion on the Web
Authors:
Florian Geigl,
Kristina Lerman,
Simon Walk,
Markus Strohmaier,
Denis Helic
Abstract:
Websites have an inherent interest in steering user navigation in order to, for example, increase sales of specific products or categories, or to guide users towards specific information. In general, website administrators can use the following two strategies to influence their visitors' navigation behavior. First, they can introduce click biases to reinforce specific links on their website by changing their visual appearance, for example, by locating them on the top of the page. Second, they can utilize link insertion to generate new paths for users to navigate over. In this paper, we present a novel approach for measuring the potential effects of these two strategies on user navigation. Our results suggest that, depending on the pages for which we want to increase user visits, optimal link modification strategies vary. Moreover, simple topological measures can be used as proxies for assessing the impact of the intended changes on the navigation of users, even before these changes are implemented.
Submitted 20 March, 2016;
originally announced March 2016.
-
Improving Reachability and Navigability in Recommender Systems
Authors:
Daniel Lamprecht,
Markus Strohmaier,
Denis Helic
Abstract:
In this paper, we investigate recommender systems from a network perspective and investigate recommendation networks, where nodes are items (e.g., movies) and edges are constructed from top-N recommendations (e.g., related movies). In particular, we focus on evaluating the reachability and navigability of recommendation networks and investigate the following questions: (i) How well do recommendation networks support navigation and exploratory search? (ii) What is the influence of parameters, in particular different recommendation algorithms and the number of recommendations shown, on reachability and navigability? and (iii) How can reachability and navigability be improved in these networks? We tackle these questions by first evaluating the reachability of recommendation networks by investigating their structural properties. Second, we evaluate navigability by simulating three different models of information seeking scenarios. We find that with standard algorithms, recommender systems are not well suited to navigation and exploration and propose methods to modify recommendations to improve this. Our work extends from one-click-based evaluations of recommender systems towards multi-click analysis (i.e., sequences of dependent clicks) and presents a general, comprehensive approach to evaluating navigability of arbitrary recommendation networks.
Submitted 29 July, 2015;
originally announced July 2015.
-
Random Surfers on a Web Encyclopedia
Authors:
Florian Geigl,
Daniel Lamprecht,
Rainer Hofmann-Wellenhof,
Simon Walk,
Markus Strohmaier,
Denis Helic
Abstract:
The random surfer model is a frequently used model for simulating user navigation behavior on the Web. Various algorithms, such as PageRank, are based on the assumption that the model represents a good approximation of users browsing a website. However, the way users browse the Web has been drastically altered over the last decade due to the rise of search engines. Hence, new adaptations of the established random surfer model might be required to better capture and simulate this change in navigation behavior. In this article, we compare the classical uniform random surfer to empirical navigation and page access data in a Web encyclopedia. Our high-level contributions are (i) a comparison of stationary distributions of different types of the random surfer to quantify the similarities and differences between those models, and (ii) new insights into the impact of search engines on traditional user navigation. Our results suggest that the behavior of the random surfer is similar to that of users, as long as users do not use search engines. We also find that classical website navigation structures, such as navigation hierarchies or breadcrumbs, now exert only limited influence on user navigation. Rather, new kinds of navigational tools (e.g., recommendation systems) might be needed to better reflect the changes in the browsing behavior of existing users.
Submitted 4 August, 2015; v1 submitted 16 July, 2015;
originally announced July 2015.
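The stationary distribution that the abstract above compares against empirical page access data can be illustrated with a short sketch. This is a minimal power-iteration example on a hypothetical four-page link graph, not the authors' implementation; the function name `stationary_distribution`, the `damping` parameter, and the toy graph are assumptions made for illustration only.

```python
def stationary_distribution(links, damping=1.0, tol=1e-12, max_iter=10_000):
    """Power iteration for a random surfer on a directed link graph.

    links maps every page to the list of pages it links to.
    damping=1.0 is the plain uniform random surfer; damping < 1 adds
    PageRank-style teleportation. Pages without out-links are treated
    as linking to every page (an assumption of this sketch).
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Teleportation mass is spread uniformly over all pages.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links[p] if links[p] else pages
            share = damping * rank[p] / len(targets)
            for q in targets:
                new[q] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: a triangle A-B-C with a pendant page D on C.
toy_links = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
pi = stationary_distribution(toy_links)
for page in sorted(pi):
    print(page, round(pi[page], 4))
```

Because the toy graph is undirected, connected, and not bipartite, the uniform surfer's stationary probability of each page is proportional to its number of links, which makes the result easy to check by hand. Comparing such a vector with normalized empirical page-view counts is one way to quantify how far real users deviate from the model.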
-
Activity Dynamics in Collaboration Networks
Authors:
Simon Walk,
Denis Helic,
Florian Geigl,
Markus Strohmaier
Abstract:
Many online collaboration networks struggle to gain user activity and become self-sustaining due to the ramp-up problem or dwindling activity within the system. Prominent examples include online encyclopedias such as (Semantic) MediaWikis, Question and Answering portals such as StackOverflow, and many others. Only a small fraction of these systems manage to reach self-sustaining activity, a level of activity that prevents the system from reverting to a non-active state. In this paper, we model and analyze activity dynamics in synthetic and empirical collaboration networks. Our approach is based on two opposing and well-studied principles: (i) without incentives, users tend to lose interest in contributing and thus systems become inactive, and (ii) people are susceptible to actions taken by their peers (social or peer influence). With the activity dynamics model that we introduce in this paper, we can represent typical situations of such collaboration networks. For example, activity in a collaborative network, without external impulses or investments, will vanish over time, eventually rendering the system inactive. However, by appropriately manipulating the activity dynamics and/or the underlying collaboration networks, we can jump-start a previously inactive system and advance it towards an active state. To be able to do so, we first describe our model and its underlying mechanisms. We then provide illustrative examples of empirical datasets and characterize the barrier that has to be breached by a system before it can become self-sustaining in terms of critical mass and activity dynamics. Additionally, we expand on this empirical illustration and introduce a new metric p, the Activity Momentum, to assess the activity robustness of collaboration networks.
Submitted 1 February, 2016; v1 submitted 7 May, 2015;
originally announced May 2015.
-
HypTrails: A Bayesian Approach for Comparing Hypotheses About Human Trails on the Web
Authors:
Philipp Singer,
Denis Helic,
Andreas Hotho,
Markus Strohmaier
Abstract:
When users interact with the Web today, they leave sequential digital trails on a massive scale. Examples of such human trails include Web navigation, sequences of online restaurant reviews, or online music playlists. Understanding the factors that drive the production of these trails can be useful for, e.g., improving underlying network structures, predicting user clicks, or enhancing recommendations. In this work, we present a general approach called HypTrails for comparing a set of hypotheses about human trails on the Web, where hypotheses represent beliefs about transitions between states. Our approach utilizes Markov chain models with Bayesian inference. The main idea is to incorporate hypotheses as informative Dirichlet priors and to leverage the sensitivity of Bayes factors to the prior for comparing hypotheses with each other. For eliciting Dirichlet priors from hypotheses, we present an adaptation of the so-called (trial) roulette method. We demonstrate the general mechanics and applicability of HypTrails by performing experiments with (i) synthetic trails for which we control the mechanisms that have produced them and (ii) empirical trails stemming from different domains, including website navigation, business reviews, and online music listening. Our work expands the repertoire of methods available for studying human trails on the Web.
Submitted 26 March, 2015; v1 submitted 11 November, 2014;
originally announced November 2014.
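The core idea in the abstract above, encoding each hypothesis as a Dirichlet prior over Markov chain transitions and comparing hypotheses by their marginal likelihood, can be sketched in a few lines. This is a simplified illustration and not the HypTrails implementation: the helper `prior_from_hypothesis` replaces the (trial) roulette elicitation with a plain proportional pseudo-count scheme, and all function names, the `strength` parameter, and the toy counts are assumptions.

```python
from math import lgamma


def log_evidence(counts, alpha):
    """Log marginal likelihood of transition counts under a first-order
    Markov chain with an independent Dirichlet prior on each row.

    counts[i][j]: observed transitions from state i to state j.
    alpha[i][j]:  Dirichlet pseudo-counts for the same transition.
    """
    total = 0.0
    for n_row, a_row in zip(counts, alpha):
        total += lgamma(sum(a_row)) - lgamma(sum(a_row) + sum(n_row))
        for n, a in zip(n_row, a_row):
            total += lgamma(a + n) - lgamma(a)
    return total


def prior_from_hypothesis(hypothesis, strength):
    """Assumed stand-in for prior elicitation: a uniform base count of 1
    plus `strength` pseudo-counts distributed according to the
    hypothesised transition probabilities in each row."""
    return [[1.0 + strength * p for p in row] for row in hypothesis]


# Hypothetical two-state trail data: users almost always switch state.
counts = [[0, 9], [9, 0]]
h_switch = [[0.0, 1.0], [1.0, 0.0]]  # belief: users alternate states
h_stay = [[1.0, 0.0], [0.0, 1.0]]    # belief: users stay where they are

log_bf = (
    log_evidence(counts, prior_from_hypothesis(h_switch, strength=5))
    - log_evidence(counts, prior_from_hypothesis(h_stay, strength=5))
)
print("log Bayes factor (switch vs. stay):", log_bf)
```

A positive log Bayes factor favours the first hypothesis. Repeating the comparison for increasing values of `strength` shows how the ranking of hypotheses behaves as more belief is placed in each prior.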
-
How to Apply Markov Chains for Modeling Sequential Edit Patterns in Collaborative Ontology-Engineering Projects
Authors:
Simon Walk,
Philipp Singer,
Markus Strohmaier,
Denis Helic,
Natalya F. Noy,
Mark Musen
Abstract:
With the growing popularity of large-scale collaborative ontology-engineering projects, such as the creation of the 11th revision of the International Classification of Diseases, we need new methods and insights to help project and community managers cope with the constantly growing complexity of such projects. In this paper, we present a novel application of Markov chains to model sequential usage patterns that can be found in the change-logs of collaborative ontology-engineering projects. We provide a detailed presentation of the analysis process, describing all the steps that are necessary to apply and determine the best-fitting Markov chain model. Among other things, the model and results allow us to identify structural properties and regularities as well as predict future actions based on usage sequences. We are specifically interested in determining the appropriate Markov chain orders, which specify how many previous actions future ones depend on. To demonstrate the practical usefulness of the extracted Markov chains, we conduct sequential pattern analyses on a large-scale collaborative ontology-engineering dataset, the International Classification of Diseases in its 11th revision. To further expand on the usefulness of the presented analysis, we show that the collected sequential patterns provide potentially actionable information for user-interface designers, ontology-engineering tool developers, and project managers to monitor, coordinate, and dynamically adapt to the natural development processes that occur when collaboratively engineering an ontology. We hope that the presented work will spur a new line of ontology-development tools, evaluation techniques, and new insights, further taking the interactive nature of the collaborative ontology-engineering process into consideration.
Submitted 16 February, 2016; v1 submitted 5 March, 2014;
originally announced March 2014.
-
Detecting Memory and Structure in Human Navigation Patterns Using Markov Chain Models of Varying Order
Authors:
Philipp Singer,
Denis Helic,
Behnam Taraghi,
Markus Strohmaier
Abstract:
One of the most frequently used models for understanding human navigation on the Web is the Markov chain model, where Web pages are represented as states and hyperlinks as probabilities of navigating from one page to another. Predominantly, human navigation on the Web has been thought to satisfy the memoryless Markov property, stating that the next page a user visits only depends on her current page and not on previously visited ones. This idea has found its way into numerous applications such as Google's PageRank algorithm and others. Recently, new studies suggested that human navigation may better be modeled using higher order Markov chain models, i.e., the next page depends on a longer history of past clicks. Yet, this finding is preliminary and does not account for the higher complexity of higher order Markov chain models, which is why the memoryless model is still widely used. In this work, we thoroughly present a diverse array of advanced inference methods for determining the appropriate Markov chain order. We highlight strengths and weaknesses of each method and apply them to investigate memory and structure of human navigation on the Web. Our experiments reveal that the complexity of higher order models grows faster than their utility, and thus we confirm that the memoryless model represents a quite practical model for human navigation on a page level. However, when we expand our analysis to a topical level, where we abstract away from specific page transitions to transitions between topics, we find that the memoryless assumption is violated and specific regularities can be observed. We report results from experiments with two types of navigational datasets (goal-oriented vs. free form) and observe interesting structural differences that make a strong argument for more contextual studies of human navigation in future work.
Submitted 4 June, 2014; v1 submitted 4 February, 2014;
originally announced February 2014.
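The trade-off described in the abstract above, where higher order models fit better but their complexity grows quickly, can be made concrete with a small order-selection sketch. It uses the Bayesian information criterion (BIC) as one common penalized-likelihood criterion; the abstract does not say which inference methods the paper covers, so the choice of BIC, the function names, and the toy sequence are all assumptions of this example.

```python
from collections import Counter
from math import log


def log_likelihood(seq, order, skip):
    """Maximum-likelihood log-likelihood of seq[skip:] under a Markov
    chain of the given order, with parameters fitted on the same data.
    `skip` keeps the scored positions identical across orders."""
    context_counts = Counter()
    transition_counts = Counter()
    for i in range(skip, len(seq)):
        context = tuple(seq[i - order:i])  # empty tuple for order 0
        transition_counts[(context, seq[i])] += 1
        context_counts[context] += 1
    return sum(
        c * log(c / context_counts[context])
        for (context, _), c in transition_counts.items()
    )


def bic_by_order(seq, max_order):
    """BIC for every order from 0 to max_order; lower is better."""
    n_states = len(set(seq))
    n_obs = len(seq) - max_order
    scores = {}
    for k in range(max_order + 1):
        ll = log_likelihood(seq, k, skip=max_order)
        # A k-th order chain has n_states**k rows with n_states - 1
        # free parameters each.
        n_params = (n_states ** k) * (n_states - 1)
        scores[k] = -2.0 * ll + n_params * log(n_obs)
    return scores


# Hypothetical navigation trail that strictly alternates two pages.
trail = list("AB" * 100)
scores = bic_by_order(trail, max_order=2)
best = min(scores, key=scores.get)
print(scores, "selected order:", best)
```

On the alternating trail, the first- and second-order models fit equally well, so the penalty term decides in favour of the simpler first-order chain. Because the number of parameters grows exponentially with the order, longer memory has to buy a substantial likelihood gain before it is selected.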