Milgram's experiment in the knowledge space: Individual navigation strategies

Manran Zhu [1,2], János Kertész

[1] Department of Network and Data Science, Central European University, Quellenstraße 51, 1100 Vienna, Austria

[2] Center for Collective Learning, CIAS, Corvinus University of Budapest, Fővám tér 8, 1093 Budapest, Hungary

Abstract

The data deluge characteristic of our times has led to information overload, which makes it hard to find our way through the digital landscape. Addressing this issue requires an in-depth understanding of how we navigate through the abundance of information. Previous research has discovered multiple patterns in how individuals navigate in the geographic, social, and information spaces, yet individual differences in strategies for navigation in the knowledge space have remained largely unexplored. To bridge the gap, we conducted an online experiment where participants played a navigation game on Wikipedia and completed questionnaires about their personal information. Utilizing a graph embedding trained on the English Wikipedia, our study identified distinctive strategies that participants adopt: when the target is a famous person, participants typically use the geographical and occupational information of the target to navigate, reminiscent of hub-driven and proximity-driven approaches, respectively. We discovered that many participants playing the same game exhibit a "wisdom of the crowd" effect: the set of strategies provides a good estimate of the information landscape around the target, indicating that the individual differences complement each other.

Keywords: Navigation, Online experiment, Wikipedia, Graph embedding

1 Introduction

Navigating from one place to another is a crucial ability for animals, enabling them to locate essential resources such as food, mates, and habitats [1, 2]. Seeking resources occurs not only in the physical space but also in more abstract spaces, such as when we look for the right person for assistance in the social space [3, 4], or when searching for an answer to a question online in the knowledge space [5]. With the accumulation of massive information online in the past decades, information overload has become a significant challenge for our generation, making efficient way-finding in the information space crucial [6]. To tackle this challenge, we first need to understand how we navigate the information space.

Thoughts on navigation in the social space can be traced to a 1929 thought experiment by the Hungarian author Frigyes Karinthy [7]. Witnessing the burgeoning advancements in communication technologies that seemed to shrink the distances between people, Karinthy suggested that one could choose any one of the 1.5 billion people living on Earth at that time and attempt to connect with them solely through a chain of personal acquaintances. He speculated that it would take no more than five intermediaries to make the connection. Four decades later, Stanley Milgram independently implemented the thought experiment [8]: he asked individuals in Omaha, Nebraska, and Wichita, Kansas, to try to send a letter to a person in Boston, Massachusetts, by handing it off through a chain of friends. Despite significant geographical and social separation, many of the letters reached their destination, within about six steps on average. This experiment was repeated in 2003 [9] in an email-based version involving roughly 100,000 participants and 18 targets across 13 countries, which essentially replicated Milgram's findings. These experiments collectively underscore the profound ability of humans to navigate through social networks to connect with a target person.

Previous research has found that our efficient navigation ability in social space is linked to the structure of the social network. Watts et al. [10] observed that the way we are connected socially is highly structured: we all possess different identities and belong to groups characterized by specific social attributes. These group structures naturally form hierarchies, akin to the departmental organization in universities or companies, where individuals belong to groups, which in turn belong to larger groups. They demonstrated that social networks created from this hierarchical structure through simple linking rules are navigable: utilizing a greedy decentralized search algorithm, where one always chooses as the next step the neighbor that is closest to the target, one can navigate to any target person in a small number of steps. Independently, Kleinberg [11] proved that for a network formed from a tree graph under certain linking conditions, a greedy decentralized search algorithm could reach any target in polylogarithmic time. This theory was later empirically confirmed by Adamic et al. [12], who demonstrated that, given the email logs at HP Labs, a greedy decentralized search could effectively leverage the organizational hierarchy to find short paths to the target. In fact, different hierarchies can be utilized: in the context of social navigation, individuals typically rely on either geographical or occupational hierarchies to facilitate their navigation [9, 13].

Extending social navigation to the realm of knowledge spaces, attention has been directed towards online information-seeking behaviors, especially on the Wikipedia [14] platform, noted for its wide topic range and significant user interaction. Recently, Wikipedia navigation games like Wikispeedia [15] and the Wiki Game [16] have gained popularity online. In these games, players are challenged to move from one Wikipedia article (source page) to another (target page) along a chain of hyperlinks in the visited Wikipedia articles. Digital traces left by game players present an excellent opportunity to examine human navigation behaviors in the knowledge space. Researchers have uncovered intriguing patterns in the navigation game. West et al. [17] discovered that players typically leverage the degree of the articles on the Wikipedia network and their textual similarity to the target page to guide their wayfinding on Wikipedia. In the early game phase, players navigate more strongly according to the degree, whereas textual similarity becomes more important in the homing-in phase. Looking at the navigation decisions players made at each step, studies have found that, compared to the greedy decentralized search algorithm, human players are more biased and stochastic: their current navigation decisions are biased by their previous decisions [18], and they sometimes randomly select their next moves, particularly in the early stages of the game [19].

While previous research has shed light on how we navigate social and knowledge networks, a comprehensive understanding of how we navigate differently remains elusive. Milgram's experiment highlighted that people rely on the geographical and occupational information of the target for social navigation, but what are the reasons for these preferences, and what are their implications? Studies have shown that factors such as age, gender, and place of origin can significantly influence navigational performance in physical spaces [20, 21, 22, 23, 24]. Do similar disparities exist in navigating knowledge spaces? Previous studies could not address these questions, for several reasons. Experiments on social navigation recorded the navigation trajectories, but the underlying social network is usually hard to measure. Studies on the Wikipedia navigation games benefit from the well-defined structure of the Wikipedia network, but there are too few records of different players playing the same game, since the source and target pages are typically randomly assigned. Additionally, the absence of demographic information on participants hampers the exploration of how individual traits impact navigation patterns.

To overcome these limitations, we conducted an online experiment in which we recruited 802 participants from the US to play nine rounds of the Wikipedia navigation game and then complete a survey that included questions about their demographic information and other factors potentially relevant to their navigation behavior. Building on the insights gained from previous studies on social navigation, we tailored our experiment to focus on social navigation within the information space: the source and target pages for the navigation games in our study were selected to be well-known individuals from various professions, genders, and historical periods (see Table 2 for the source and target pages of each game). In our previous research [25], we discovered that people exhibit different navigation performances, which could be partially attributed to individual characteristics, such as age and the ability to speak a foreign language. In this study, using a graph embedding trained on the English Wikipedia network, we aim to identify the diverse navigation strategies employed by participants and examine how these strategies are influenced by the information landscape and individual characteristics.

2 Results

2.1 Navigation paths in the knowledge space

To understand the underlying knowledge that users rely on to navigate towards a target within the Wikipedia network, our study focused on analyzing Wikipedia pages from two main perspectives: their semantic relationships and their positions within the knowledge hierarchy. For semantic relationships, which refer to how the meaning or context of articles relates to one another, we trained a 64-dimensional embedding using the DeepWalk algorithm [26]. Each article is represented as a point (vector) in a space where articles with similar topics are placed closer together. This method allows us to measure how "close" two articles $a_{i}$ and $a_{j}$ are by the cosine similarity of their respective vectors $\vec{v}_{i}$ and $\vec{v}_{j}$ in the embedding space, shifted by one so that the closeness score ranges from 0 to 2:

$$c(a_{i},a_{j})=1+\cos(\vec{v}_{i},\vec{v}_{j}) \tag{1}$$
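As a minimal sketch of Eq. 1, the closeness score can be computed from two embedding vectors in plain Python. The function name `closeness` and the three-dimensional toy vectors are illustrative assumptions; the embedding in this study has 64 dimensions.

```python
import math


def closeness(v_i, v_j):
    """Closeness c = 1 + cos(v_i, v_j) as in Eq. 1; the value lies in [0, 2]."""
    dot = sum(x * y for x, y in zip(v_i, v_j))
    norm_i = math.sqrt(sum(x * x for x in v_i))
    norm_j = math.sqrt(sum(y * y for y in v_j))
    return 1.0 + dot / (norm_i * norm_j)


# Hypothetical toy vectors, chosen only to show the range of the score.
same_direction = closeness([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])  # parallel vectors
orthogonal = closeness([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])      # unrelated vectors
opposite = closeness([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0])       # antiparallel vectors
```

Parallel vectors give a closeness near 2, orthogonal vectors give 1, and antiparallel vectors give 0. Since the cosine distance of Methods 3.2 is one minus the cosine similarity, the two measures sum to 2 for any pair of articles.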

On the hierarchical side, we looked at how Wikipedia articles are ranked based on their connections to other articles. We used a specific scoring method, referred to as a hierarchical score [27], to evaluate each article’s position in the network. This score $h(a_{i})$ is calculated by looking at how many links article $a_{i}$ receives (in-degree, $k_{in}(i)$ ) and how many links it sends out (out-degree, $k_{out}(i)$ ) (Equation 2). An article is considered higher in the hierarchy if it has a lot of incoming and outgoing links:

$$h(a_{i})=\frac{k_{in}^{3/2}(i)+k_{out}^{3/2}(i)}{k_{in}(i)+k_{out}(i)} \tag{2}$$
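The hierarchical score of Eq. 2 can be sketched as follows. The function name `hierarchical_score` and the handling of pages without any links (returning 0) are assumptions made for this illustration; the text does not specify that case.

```python
def hierarchical_score(k_in, k_out):
    """Hierarchical score h of Eq. 2 from the in-degree and out-degree of an article."""
    total = k_in + k_out
    if total == 0:
        # Assumption: an isolated page gets score 0 (not specified in the text).
        return 0.0
    return (k_in ** 1.5 + k_out ** 1.5) / total


# Toy degrees: the score grows roughly like the square root of the degree.
small_page = hierarchical_score(4, 4)      # (8 + 8) / 8
large_page = hierarchical_score(100, 100)  # (1000 + 1000) / 200
```

With equal in-degree and out-degree $k$, the score reduces to $\sqrt{k}$, which is why highly connected articles rank higher.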

Principal Component Analysis (PCA) [28] was employed to reduce the original 64-dimensional space into a two-dimensional representation to simplify the categorization of the navigational trajectories of the participants within the knowledge domain and to help visualize them. Fig. 1 illustrates the articles visited and the paths taken by participants in the game that started from the source page "Barack Obama" and ended at the target page "Vincent van Gogh" (see Fig. 4 and Fig. 5 for visualizations of all nine games). Three predominant navigational strategies can be observed in the figure. A subset of participants leveraged the occupational details of the target page, specifically its association with artists, navigating directly to art-related pages from those closely linked to "Barack Obama" (depicted as the orange trajectory). Another group utilized geographic information pertaining to the target, in this case "Netherlands" and "France", initially moving to pages associated with these countries before proceeding directly to the target page (indicated by the blue trajectories). A third strategy involved a combination of both occupational and geographic information, where participants first navigated to pages related to European countries or cities, subsequently reaching the target through art-related pages (represented by the green paths).

Figure 1: The figures show the visited Wikipedia articles (figure a) and the successful navigation paths (figure b) in the game with the source page "Barack Obama" and the target page "Vincent van Gogh". Articles belonging to the Geography, Occupation and Source groups (see Methods 3.3 for the classification method) are shown as blue, orange and green dots in figure a, and paths categorized as geographical, occupational and mixed are shown as blue, orange and green lines in figure b. Note that the set of mixed paths overlaps with the sets of geographical paths and occupational paths. The geographical/occupational paths shown here are the paths in the respective sets excluding the mixed paths.

2.2 Navigation strategies

The vector representations of articles allowed a quantitative analysis of the navigation routes employed by participants. To do that, we first clustered the articles visited during the games into three clusters based on their pairwise distances utilizing the KMeans clustering algorithm [29] (for details see Methods 3.3). We found that across all nine games in our experiment, the three clusters discovered this way consistently correspond to three groups of articles, each serving distinct navigational purposes:

• Occupation Group $\mathcal{O}$: This cluster comprises articles closely related semantically to the target page (see Fig. 6 for the distribution of the visited articles' proximity to the target page for each of the nine games in the Geography, Occupation and Source groups respectively). For instance, articles related to art are prominent in this group if the target is "Vincent van Gogh", whereas science-related pages are prevalent if the target is "Albert Einstein". These articles, often closely interconnected, serve as direct goals for the participants, depicted as orange dots in Fig. 1a.

• Geography Group $\mathcal{C}$: This group includes articles pertaining to the countries or cities associated with the target individual, marked as blue dots in Fig. 1a. Although these articles are not as semantically close to the target (see Fig. 6) as those in the Occupation Group, they are linked to a wider array of topics. Participants target these articles as intermediate goals, anticipating links to the target page within their content.

• Source Group $\mathcal{S}$: Articles in this cluster are closely related to the source page (see Fig. 6 for the distribution of the visited articles' proximity to the source page for each of the nine games in the Geography, Occupation and Source groups respectively), illustrated as green dots in Fig. 1a. These articles are initial points of departure, with users expecting them to lead towards the Occupation or Geography Groups.

Given the article groups, we observed that among the successful navigation paths the last Wikipedia article clicked before reaching the target typically falls into either the occupation or geography group, suggesting that the paths can be effectively classified based on the category of the last article clicked: paths ending with an article from the occupation group are termed "occupational paths", whereas those ending with an article from the geography group are termed "geographical paths". Within the two categories, paths that contain articles from both the occupation and geography groups are identified as "mixed paths". These paths demonstrate characteristics of both occupational and geographical paths, indicating a blend of strategies. Lastly, the few navigation paths that end with an article from the source group are termed "other paths". Fig. 1b illustrates these navigation strategies using color-coded lines, demonstrating that such categorization effectively captures the three distinct route types observed. For the visualization of all nine games, please refer to Fig. 4 and Fig. 5 in the Appendix.
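The classification rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used in the study: the function name `classify_path`, the group labels, and the toy group assignments in `toy_groups` are hypothetical, and a path is assumed to be a list of article titles from the source to the target inclusive.

```python
def classify_path(path, group_of):
    """Return (label, is_mixed) for a successful path [source, ..., target].

    group_of maps each article to "occupation", "geography" or "source".
    The label depends only on the last article clicked before the target.
    """
    if len(path) < 2:
        raise ValueError("a path needs at least a source and a target")
    before_target = path[:-1]  # the target itself is not counted
    last_group = group_of[path[-2]]
    if last_group == "occupation":
        label = "occupational"
    elif last_group == "geography":
        label = "geographical"
    else:
        return ("other", False)
    groups_seen = {group_of[article] for article in before_target}
    is_mixed = {"occupation", "geography"} <= groups_seen
    return (label, is_mixed)


# Hypothetical group assignments for a toy version of the Obama -> van Gogh game.
toy_groups = {
    "Barack Obama": "source",
    "Netherlands": "geography",
    "Painting": "occupation",
    "Vincent van Gogh": "occupation",
}
geo = classify_path(["Barack Obama", "Netherlands", "Vincent van Gogh"], toy_groups)
occ = classify_path(["Barack Obama", "Painting", "Vincent van Gogh"], toy_groups)
mixed = classify_path(
    ["Barack Obama", "Netherlands", "Painting", "Vincent van Gogh"], toy_groups
)
```

Returning the mixed flag alongside the label mirrors the definition in the text, where mixed paths form a subset of the occupational and geographical paths rather than a separate category.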

2.3 Hub-driven vs proximity-driven approach

The distinction between geographical and occupational navigation strategies observed in knowledge space navigation can be elucidated through an analogy with transportation networks. In road networks, where shortcuts between destinations are limited, travelers often rely on a proximity-driven strategy, targeting locations that are close to their final destination because they have to traverse adjacent locations. Conversely, in networks rich with shortcuts, such as airline networks, a hub-driven strategy becomes more viable. Hubs, despite not necessarily being close to the final destination, offer extensive connections across numerous locations. Fig. 2a-b show the distribution of the last clicked articles of the occupational and geographical paths in terms of a) their semantic proximity to the target page, measured by the closeness score $c$ (Eq. 1), and b) their hierarchical positioning within the knowledge network, measured by the hierarchical score $h$ (Eq. 2). We found that the last clicks in occupational paths are significantly closer to the target page than those in geographical paths in all the games (t-value M=35.65, SD=19.57), suggesting a proximity-driven approach. In contrast, the last clicks in geographical paths tend to rank higher in the knowledge hierarchy, pointing to a hub-driven approach (t-value M=17.54, SD=9.68). The division of the geographical and occupational strategies is further illustrated in Fig. 2c, which shows the pairwise distance among the last clicked articles ordered by their cosine distance to the target page. Note that game O0 is removed from our analysis as an outlier, since its success rate is much lower than that of the other games (> 3$\sigma$) and it contains only three geographical paths.

Fig. 3 visualises the navigation paths in terms of the hierarchical score and closeness score of the articles in the paths. As shown, the distinction between the proximity-driven approach and the hub-driven approach is also present at the level of whole paths: some paths first ascend high in the hierarchy to reach the hubs and then descend towards the target, while others maintain a low profile, aiming to minimize the distance from the target. To quantify to what extent a navigation path is proximity-driven or hub-driven, we calculated the average hierarchical score $H_{j}$ and average closeness score $C_{j}$ for each navigation path $j$ consisting of articles $A_{j}=\{a_{k}\}$, as defined below (Eqs. 3 and 4), where $N_{j}=|A_{j}|$ is the number of articles in the path, $h(a_{k})$ is the hierarchical score (Eq. 2) of article $a_{k}$, and $c(a_{k},\mathrm{target})$ is its closeness to the target page (Eq. 1). As shown in the subfigures of Fig. 3, occupational paths are generally more proximity-driven (t-value M=19.04, SD=6.33) and geographical paths more hub-driven (t-value M=9.32, SD=3.22), although some occupational paths also utilize hubs to get to the target (mixed paths).

$$H_{j}=\frac{1}{N_{j}}\sum_{a_{k}\in A_{j}}h(a_{k}) \tag{3}$$

$$C_{j}=\frac{1}{N_{j}}\sum_{a_{k}\in A_{j}}c(a_{k},\mathrm{target}) \tag{4}$$
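A minimal sketch of the two path-level averages follows. The function name `path_scores` and the toy score values are hypothetical, and the sketch averages over whatever articles are passed in; whether the source and target pages themselves belong to $A_{j}$ is not stated in the text and is left to the caller.

```python
def path_scores(articles, h, c_to_target):
    """Average hierarchical score H_j and average closeness C_j of one path.

    h and c_to_target map each article to its hierarchical score (Eq. 2)
    and to its closeness to the target page (Eq. 1), respectively.
    """
    n = len(articles)
    if n == 0:
        raise ValueError("the path must contain at least one article")
    H = sum(h[a] for a in articles) / n
    C = sum(c_to_target[a] for a in articles) / n
    return H, C


# Hypothetical scores for a three-article path.
toy_h = {"A": 1.0, "B": 3.0, "C": 2.0}
toy_c = {"A": 0.5, "B": 1.0, "C": 1.5}
H_j, C_j = path_scores(["A", "B", "C"], toy_h, toy_c)
```

A path with a high $H_{j}$ spends more of its clicks on highly connected pages, while a path with a high $C_{j}$ stays semantically near the target throughout.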

Is the hub-driven approach more effective than the proximity-driven approach? To address this question, we implemented linear regression models to predict the performance of the players, measured by the time (in seconds) saved in the Speed-race games and the steps saved in the Least-clicks games. Here, we focus solely on successful navigation paths, positing that saving time and steps reflects superior player performance. Table 1 shows that the effectiveness of navigation strategies is moderated by the game's timing conditions. Specifically, in the Least-clicks games, which lack a time restriction, both hub-driven and proximity-driven approaches can significantly improve performance. Conversely, in timed Speed-race games, while the hub-driven strategy remains beneficial, the proximity-driven strategy has the opposite effect. This difference may stem from the fact that it takes more time to identify pages closely related to the target than to jump directly to a highly connected Wikipedia page.

Table 1: The table shows the linear regression results for the fitness of hub-driven and proximity-driven navigation strategies, measured as the seconds saved in the Speed-race games or steps saved in the Least-clicks games.

| | Dependent variable: Seconds saved (Speed-race games) | Dependent variable: Steps saved (Least-clicks games) |
| --- | --- | --- |
| Steps | $-7.066^{***}$ (0.345) | |
| Seconds | | $-0.002^{***}$ (0.0001) |
| Hub-driven score | $3.248^{**}$ (1.051) | $0.405^{***}$ (0.041) |
| Proximity-driven score | $-3.259^{**}$ (1.103) | $0.200^{***}$ (0.039) |
| Source page knowledge | $3.268^{**}$ (1.116) | 0.030 (0.037) |
| Target page knowledge | 1.436 (1.032) | 0.034 (0.036) |
| Game Round | $1.027^{**}$ (0.321) | $-0.012$ (0.011) |
| Constant | $107.372^{***}$ (4.010) | $3.209^{***}$ (0.110) |
| Observations | 1,174 | 1,707 |
| R$^{2}$ | 0.367 | 0.203 |
| Adjusted R$^{2}$ | 0.359 | 0.196 |
| Residual Std. Error | 27.904 (df = 1159) | 1.182 (df = 1692) |
| F Statistic | $47.984^{***}$ (df = 14; 1159) | $30.765^{***}$ (df = 14; 1692) |

Note: $^{*}$ p $<$ 0.05; $^{**}$ p $<$ 0.01; $^{***}$ p $<$ 0.001. Standard errors are in parentheses.

2.4 Wisdom of the crowd

To understand why individuals prefer one navigation strategy over another, whether due to personal preference or the structure of the knowledge space, we began by examining the split between geographical and occupational paths. This distinction reveals whether players aimed for the target through articles nearby or opted for more distant hubs. We analyzed all incoming neighbors of the target page within the Wikipedia network, determining how many were proximate (cosine distance below 0.3) and how many were more distant (cosine distance above 0.3). We then compared this split to the player strategies, namely the frequency with which players reached the target via nearby neighbors at the last step (occupational) as opposed to distant neighbors (geographical). We discovered that the ratio of occupational paths closely matched the ratio of proximate in-neighbors of the target page (adjusted $R^{2}$ = 0.96) in six out of nine games, and the remaining three games (O0, B0, O1) were among the top four games in which the players were least familiar with the target pages (Table 2). To determine whether this effect was attributable to the Wikipedia network's structure, we generated 10 synthetic paths for each empirical navigation path, maintaining the path length (see Methods 3.4). We found that the synthetic paths are biased towards the geographical strategy (Fig. 2). The result is robust to changes in the cosine distance threshold, as shown in Fig. 8. Our findings suggest the presence of crowd wisdom: collectively, players are adept at leveraging the knowledge landscape surrounding a target page and planning their routes accordingly.
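The comparison between the in-neighbor landscape and the player strategies can be sketched as two applications of the same ratio. The function name `fraction_proximate` and all distance values are hypothetical, and treating a distance of exactly 0.3 as distant is an assumption, since the text only specifies "below" and "above" the threshold.

```python
def fraction_proximate(cosine_distances, threshold=0.3):
    """Share of articles whose cosine distance to the target is below the threshold."""
    if not cosine_distances:
        raise ValueError("at least one distance is required")
    near = sum(1 for d in cosine_distances if d < threshold)
    return near / len(cosine_distances)


# Hypothetical distances to the target page for one game.
in_neighbor_distances = [0.1, 0.2, 0.5, 0.8]  # all in-neighbors of the target
last_click_distances = [0.1, 0.15, 0.6]       # last clicks of successful paths

landscape_ratio = fraction_proximate(in_neighbor_distances)
occupational_ratio = fraction_proximate(last_click_distances)
```

Computing both ratios for every game and regressing one on the other corresponds to the comparison reported in this section; the threshold argument makes it easy to repeat the check at other cutoffs.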

To explore how individual characteristics influence participants' preferences for navigation strategies, we developed several regression models. These models use personal characteristics (see Methods 3.1) and features of the current and previous games as independent variables, with the choice of a geographical or occupational navigation path as the dependent variable. We performed logistic regressions for the first and second rounds of the experiment separately to filter out significant relationships that lack robustness. The regression results (Table 3) indicate that no individual characteristic has a consistently significant impact on the participants' selection of navigation strategies across both rounds of the experiment. When we changed the dependent variable to measure how hub-driven ($H$) or proximity-driven ($C$) a navigation path is, we again found almost no individual characteristic with a consistently significant influence. The sole exception was left-handedness: left-handed individuals showed a significant tendency towards a hub-driven approach in both rounds of the experiment (see Table 4). Interestingly, the timing condition of the game was found to have an impact: in Speed-race games, where a time constraint exists, participants tended to favor a hub-driven approach, whereas in Least-clicks games, which impose a limit on the number of steps, participants were more inclined towards a proximity-driven approach.

3 Methods

3.1 The experiment

Our longitudinal study comprises two rounds of online experiments, the first conducted in January 2020 and the second in October 2023. Participants were recruited from Prolific [30], a well-regarded crowdsourcing platform for behavioral studies [31]. The experiments were conducted on the Qualtrics [32] platform, where we embedded the Wikipedia navigation games into a Qualtrics survey using custom JavaScript, ahead of the survey questions. We utilized the 20190820 English Wiki Dump [33] for the navigation games in both experiment rounds. This Wikipedia snapshot includes 5.9 million nodes and 133.6 million edges.

In our experiment, each participant plays nine rounds of the Wikipedia navigation game and completes a survey afterward. The source and target Wikipedia articles for each game are shown in Table 2. Participants can choose between a Speed-race game and a Least-clicks game. To win, they must navigate to the target page within 150 seconds in Speed-race games or within 7 steps in Least-clicks games. During each game, the interface displays the pages visited earlier in the current game on the left margin, allowing players to backtrack to any of those pages (backclick). Following the game session, the survey session commences with a Big Five personality test [34], assessing participants' five personality traits: openness to experience, conscientiousness, extroversion, agreeableness, and neuroticism. Following this, we pose six categories of questions to gather information about participants' i) employment status, ii) educational background, iii) spatial navigation habits, and their previous experience with iv) the Wikipedia navigation game, v) the Wikipedia website, and vi) computer games. We also inquire about demographic details, including age, gender, ethnicity, political affiliation, and language skills. An attention check question is included at the survey's end, requiring participants to slide a bar to the left. A full list of the survey questions and the encoded variables is provided in the Supplementary Information.

3.2 Embedding of the Wikipedia articles

To quantify the similarity among the Wikipedia articles, we trained a 64-dimensional node embedding for each Wikipage $a_{i}$ across the English Wikipedia graph $G$ , employing the DeepWalk algorithm [26]. Graph embedding serves as a method to represent each graph node as a numerical vector $\vec{v}_{i}$ within a continuous space, positioning similar nodes in proximity to one another. This technique enables the measurement of dissimilarity between two nodes by calculating the distance between their corresponding vectors. Specifically, our graph embedding allocates a 64-dimensional numerical vector $\vec{v}_{i}$ to each Wikipedia page $a_{i}$ , which is then used to establish a semantic distance measure between pairs of Wikipages:

$$d(a_{i},a_{j})=1-\frac{\vec{v}_{i}\cdot\vec{v}_{j}}{\|\vec{v}_{i}\|\,\|\vec{v}_{j}\|},$$

where the semantic distance $d(a_{i},a_{j})$ between the Wikipedia pages $a_{i}$ and $a_{j}$ is determined by the cosine distance between their graph embeddings $\vec{v}_{i}$ and $\vec{v}_{j}$ . For subsequent calculations, unless otherwise specified, all embedding vectors are normalized to have a unit length. To assess the effectiveness of our embedding, we conducted tests using the WikipediaSimilarity 353 Test [35], an adaptation of the earlier dataset, WordSimilarity 353 Test [36], designed to evaluate semantic relatedness among words. Our graph embedding achieved a Spearman rank correlation score of 0.667 with the WikipediaSimilarity 353 test, demonstrating performance on par with the current best measures of semantic relatedness for Wikipedia pages [37].

3.3 Categorization of the visited articles

The graph embedding technique enables us to categorize visited Wikipedia articles based on their semantic distances. For this categorization, we utilized the KMeans clustering algorithm [29], dividing the visited articles into three clusters in each game. The Euclidean distance was selected as the measure between Wikipedia pages for clustering purposes, as the KMeans algorithm calculates the centroids of data points. To accommodate the varying frequencies of article visits, we assigned a weight $w_{i}=\log(n_{i})+0.1$ to each Wikipedia page $a_{i}$, where $n_{i}$ represents the total number of visits to the article, and the constant $0.1$ ensures that articles visited only once still receive a base weight. Fig. 4 illustrates the identified clusters and their representation in a two-dimensional space, where the horizontal and vertical axes correspond to the first and second principal components of the embedding vectors, respectively, reduced in dimensionality through Principal Component Analysis [28]. The three clusters were subsequently labeled as follows: the cluster containing the source page was termed the Source Group, the cluster with the target page was called the Occupation Group, and the remaining cluster was named the Geography Group. The clustering method employed resulted in an average Silhouette coefficient of 0.18 (SD = 0.020) across all nine games. When considering the distance between articles solely within the two-dimensional space spanned by the first two principal components, the average Silhouette coefficient improves to 0.45 (SD = 0.052). Note, however, that our primary objective is not to achieve optimal clustering but rather to extract insights at a more coarse-grained level.
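The visit weights and the labeling rule for the three clusters can be sketched as follows; the clustering step itself is omitted. The function names `visit_weight` and `label_clusters` are hypothetical, the natural logarithm is an assumption (the text does not state the base), and the sketch assumes that the source and target pages fall into different clusters.

```python
import math


def visit_weight(n_visits):
    """Weight w = log(n) + 0.1; the natural logarithm is assumed here."""
    return math.log(n_visits) + 0.1


def label_clusters(cluster_of, source, target):
    """Name the clusters: source cluster, target cluster, and the remaining one.

    cluster_of maps each visited article to a cluster id produced by any
    clustering routine. Assumes source and target lie in different clusters.
    """
    if cluster_of[source] == cluster_of[target]:
        raise ValueError("source and target share a cluster; labeling is ambiguous")
    names = {
        cluster_of[source]: "Source",
        cluster_of[target]: "Occupation",
    }
    for cluster_id in set(cluster_of.values()):
        names.setdefault(cluster_id, "Geography")
    return names


# Hypothetical cluster assignments for four visited articles.
toy_clusters = {"Barack Obama": 0, "Painting": 1, "Netherlands": 2, "Vincent van Gogh": 1}
group_names = label_clusters(toy_clusters, "Barack Obama", "Vincent van Gogh")
single_visit_weight = visit_weight(1)
```

An article visited once receives weight 0.1 because the logarithm of one is zero, which is the base weight mentioned above. Weights like these can typically be passed to a clustering routine that accepts per-sample weights.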

3.4 Generation of synthetic paths

To understand whether participants' preferences for the navigation strategies are due to the network structure, we generated 10 synthetic paths for each successful navigation path while preserving the original path length. A synthetic path corresponding to an empirical navigation path with $n$ steps is generated step by step as follows. Starting from the source page, we gather the out-neighbors of the source page that can reach the target within $n$ steps on the Wikipedia network and randomly select one as the next step. Moving to the chosen page, we again collect its out-neighbors and randomly choose one among those that can reach the target in $n-1$ steps. This procedure is repeated until the target page is chosen as the last step.
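The procedure can be sketched as a constrained random walk. This is an illustration under several assumptions rather than the code used in the study: the function names are hypothetical, the step budget is interpreted as the number of clicks still available after each move, the target may only be chosen on the final click so that the path length is preserved exactly, and a walk that runs into a dead end is simply retried.

```python
import random
from collections import deque


def distances_to_target(out_links, target):
    """Shortest number of clicks from every page to the target (BFS on reversed links)."""
    reverse = {}
    for page, neighbors in out_links.items():
        for neighbor in neighbors:
            reverse.setdefault(neighbor, []).append(page)
    dist = {target: 0}
    queue = deque([target])
    while queue:
        page = queue.popleft()
        for predecessor in reverse.get(page, []):
            if predecessor not in dist:
                dist[predecessor] = dist[page] + 1
                queue.append(predecessor)
    return dist


def synthetic_path(out_links, source, target, n_steps, rng, max_tries=1000):
    """Random path from source to target with exactly n_steps clicks, or None."""
    dist = distances_to_target(out_links, target)
    for _ in range(max_tries):
        path = [source]
        while len(path) - 1 < n_steps:
            remaining = n_steps - (len(path) - 1)  # clicks left, including this one
            candidates = [
                page for page in out_links.get(path[-1], [])
                # the page must still reach the target within the leftover budget
                if page in dist and dist[page] <= remaining - 1
                # assumption: the target is chosen on the final click only
                and (page == target) == (remaining == 1)
            ]
            if not candidates:
                break  # dead end: abandon this walk and retry
            path.append(rng.choice(candidates))
        if len(path) - 1 == n_steps:
            return path
    return None


# Hypothetical toy network; "S" is the source and "T" the target.
toy_links = {"S": ["A", "B", "T"], "A": ["T", "B"], "B": ["T"], "T": []}
two_step = synthetic_path(toy_links, "S", "T", 2, random.Random(0))
three_step = synthetic_path(toy_links, "S", "T", 3, random.Random(0))
```

In the toy network the only three-click route is S, A, B, T, so walks that start with B hit a dead end and are retried. Precomputing the distances once per target keeps each candidate check to a dictionary lookup, which matters on a graph with millions of nodes.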

4 Discussion

Our study investigated the navigation strategies employed by participants in navigation tasks on the Wikipedia network. By utilizing a graph embedding trained on the English Wikipedia network to assess semantic distances among articles and a local hierarchical score for article hierarchy, we found that participants generally rely on geographical and occupational information about the target person to guide their social navigation in the knowledge space, reminiscent of hub-driven and proximity-driven approaches respectively on the English Wikipedia network. The effectiveness of these approaches is influenced by the timing conditions of the navigation tasks: in Least-clicks games without time constraints, both hub-driven and proximity-driven strategies can significantly enhance performance. However, in timed Speed-race games, while the hub-driven strategy remains advantageous, the proximity-driven strategy tends to be detrimental. The division between occupational and geographical navigation strategies suggests a "wisdom of the crowd" effect, indicating that the collective strategies accurately reflect the information landscape surrounding the target, a wisdom not biased by individual traits.

In our experiment, we implemented social navigation within the information space, where participants' navigation trajectories reflect their thought processes rather than the people in their social networks. We observed that the division between occupational and geographical navigation paths discovered in previous work on social network navigation [8, 13, 9] also exists in information space navigation. Interestingly, this division mirrors the information landscape on Wikipedia surrounding notable individuals, suggesting that the representation of the geographical origin and occupation of people may be foundational to our mental or cognitive map of the social world. Indeed, prior research has indicated that our hippocampus is capable of representing abstract quantities, such as a person's affiliations and power within social encounters [3], facilitating the search for suitable assistance in finding accommodation or employment.

Previous research on wayfinding in information networks [17] has studied the interplay between the degree and the proximity of nodes within a single player’s navigation trajectory. Our findings extend this by showing that the interplay occurs not only in the navigation process of individual players but also at a macro level across different players. As discussed in Results 2.3, this trade-off is a natural outcome of the structure of the knowledge network that underlies our navigation. In a landscape where shortcuts connecting distant locations are scarce, a proximity-driven approach becomes natural. Conversely, in the presence of hubs connecting numerous locations, a hub-driven navigation approach may be more advantageous.
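The contrast between the two approaches can be sketched with a greedy walk on a toy graph. Everything in the example is hypothetical: the node names, the one-dimensional "semantic" coordinates in `COORD`, and the two scoring rules (highest degree for hub-driven, smallest coordinate gap to the target for proximity-driven) are simplifying assumptions, not the scores used in the study.

```python
# Toy undirected graph: a chain from source to target plus one well-connected hub.
EDGES = [
    ("source", "a"), ("a", "b"), ("b", "c"), ("c", "target"),
    ("source", "hub"), ("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "target"),
]
# Hypothetical 1-D semantic coordinates; the hub is semantically far from everything.
COORD = {"source": 0, "a": 1, "b": 2, "c": 3, "target": 4, "hub": 10}

def build_graph(edges):
    """Build a symmetric adjacency list from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, []).append(u)
    return graph

def navigate(graph, source, target, score, max_steps=20):
    """Greedy walk: click the target if it is linked, else the lowest-score unvisited neighbour."""
    path, visited, current = [source], {source}, source
    while current != target and len(path) <= max_steps:
        if target in graph[current]:
            nxt = target
        else:
            candidates = [n for n in graph[current] if n not in visited]
            if not candidates:
                return None  # dead end
            nxt = min(candidates, key=score)
        path.append(nxt)
        visited.add(nxt)
        current = nxt
    return path if current == target else None

graph = build_graph(EDGES)
# Hub-driven: prefer the neighbour with the most links.
hub_path = navigate(graph, "source", "target", score=lambda n: -len(graph[n]))
# Proximity-driven: prefer the neighbour semantically closest to the target.
prox_path = navigate(graph, "source", "target",
                     score=lambda n: abs(COORD[n] - COORD["target"]))

if __name__ == "__main__":
    print("hub-driven:", hub_path)
    print("proximity-driven:", prox_path)
```

In this toy landscape the hub provides a shortcut, so the hub-driven walk needs fewer clicks. Removing the hub's edges leaves only the chain, where following proximity is the natural option.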

Our study is subject to several limitations. First, restricting navigation tasks to person-to-person pairs may overlook changes in navigation patterns that arise when source and target pages are not well-known individuals but, for instance, lesser-known individuals or non-human concepts such as objects, events, or theoretical ideas. Second, we did not observe the reasonably expected impact of participants’ prior knowledge of the source and target Wikipedia pages on navigation strategies, possibly because self-evaluated survey responses do not accurately capture participants’ objective prior knowledge. In future work, incorporating more objective measures of factors such as spatial navigation skills and knowledge about the source and target Wikipedia pages will be necessary. Lastly, our analysis included only successful navigation attempts. Developing methodologies to analyze failed navigation attempts could offer deeper insights into providing personalized guidance to improve navigation in the online information space.

Our study extends prior research on individual differences in navigation strategies within the knowledge network. A logical progression would be to introduce navigation tasks where source and target pages are not limited to well-known individuals, but instead include lesser-known individuals or non-human concepts such as objects, events, or theoretical ideas. Furthermore, investigating algorithms to enhance online navigation support presents a promising research direction.

Declarations

Availability of data and materials

The datasets generated and/or analysed during the current study are not publicly available due to ethical reasons but are available from the corresponding author on reasonable request.

Competing interests

The authors declare that they have no competing interests.

Funding

This project was supported by the Humboldt Foundation within the Research Group Linkage Program. JK and MZ were partially supported through ERC grant No. 810115-DYNASET. MZ acknowledges further support from 101086712-LearnData-HORIZON-WIDERA-2022-TALENTS-01 financed by EUROPEAN RESEARCH EXECUTIVE AGENCY (REA), CORDIS. JK acknowledges further support from Horizon 2020 “INFRAIA-01-2018-2019” project “SoBigData++”, grant No. 871042.

Authors’ contributions

All authors contributed to the conception and design of the research. MZ led the experiment and collected the data. All authors analyzed the data and wrote the paper.

Acknowledgements

We are grateful to Markus Strohmaier for his valuable advice.

Ethics declarations

All subjects gave their informed consent for inclusion before they participated in the study. The protocol of the study was approved by the Ethics Committee of Central European University (reference number: 2022-2023/1/EX). All methods of the study were carried out following the principles of the Belmont Report.

Supplementary information

The supplementary materials are included.

Appendix A Figures

This section contains the supplementary figures for the study.

Appendix B Tables

This section contains the supplementary tables for the study.

Table 2: The table shows the $R_{0}$ and $R_{emp}$ defined in Results 2.4 and the participants’ average prior knowledge of the target page in each of the nine games, sorted in increasing order of prior knowledge.

Game	Source page	Target page	$R_{0}$	$R_{emp}$	Prior knowledge
B0	Donald Trump	Pyotr Ilyich Tchaikovsky	0.68	0.21	0.87
A2	Marie Curie	Chuck Berry	0.78	0.80	1.06
O1	Steve Jobs	Charlie Chaplin	0.35	0.79	1.54
O0	Alexander the Great	Tim Burton	0.51	0.83	1.58
A0	Barack Obama	Vincent van Gogh	0.48	0.51	1.76
B2	Angelina Jolie	Charles Darwin	0.16	0.27	1.97
O2	Elizabeth I of England	Albert Einstein	0.31	0.31	2.03
B1	Jeff Bezos	Kanye West	0.73	0.78	2.32
A1	Bill Gates	Eminem	0.69	0.75	2.35

Table 3: The table shows the logistic regression results for the geographical and occupational navigation strategies. Please note that only variables that are significant ($p < 0.01$) in at least one round of the experiment are shown in the table. Additionally, variables representing which game was played are omitted from the table for better readability.

	Dependent variable:
	Is Geographical Path		Is Occupational Path
	First Round	Second Round	First Round	Second Round
Play computer games frequently	0.088 (0.083)	0.387 ${}^{***}$ (0.116)	$-$ 0.088 (0.082)	$-$ 0.412 ${}^{***}$ (0.116)
Like to play computer games	0.094 (0.137)	$-$ 0.506 ${}^{**}$ (0.176)	$-$ 0.077 (0.135)	0.528 ${}^{**}$ (0.175)
Left-handed	0.659 ${}^{***}$ (0.192)	$-$ 0.237 (0.261)	$-$ 0.587 ${}^{**}$ (0.189)	0.100 (0.257)
Job is intensive	0.197 ${}^{*}$ (0.084)	0.053 (0.101)	$-$ 0.216 ${}^{**}$ (0.082)	$-$ 0.067 (0.100)
Age	$-$ 0.031 ${}^{**}$ (0.010)	$-$ 0.006 (0.010)	0.032 ${}^{**}$ (0.010)	0.008 (0.010)
Speaks a foreign language	0.258 (0.158)	0.524 ${}^{**}$ (0.193)	$-$ 0.267 (0.156)	$-$ 0.554 ${}^{**}$ (0.192)
Current game round	0.032 (0.029)	0.107 ${}^{**}$ (0.037)	$-$ 0.030 (0.029)	$-$ 0.104 ${}^{**}$ (0.036)
Constant	1.424 (1.081)	$-$ 1.944 (1.408)	$-$ 1.098 (1.066)	1.860 (1.405)
Observations	1,495	1,095	1,495	1,095
Log Likelihood	$-$ 683.128	$-$ 474.402	$-$ 700.664	$-$ 477.243
Akaike Inf. Crit.	1,458.256	1,040.804	1,493.328	1,046.486
Note:	${}^{*}$ p $<$ 0.05; ${}^{**}$ p $<$ 0.01; ${}^{***}$ p $<$ 0.001
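To help read Table 3, the sketch below converts a coefficient and its standard error into an odds ratio and a two-sided Wald p-value, and recovers the number of fitted parameters from the reported AIC and log likelihood via AIC = 2k − 2 log L. It assumes the significance stars come from a standard Wald z-test. The function names are hypothetical, and because the inputs are the rounded values printed in the table, the results are approximate.

```python
import math

def wald_summary(coef, se):
    """Odds ratio, z statistic and two-sided normal p-value for a logit coefficient."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return {"odds_ratio": math.exp(coef), "z": z, "p": p}

def n_params_from_aic(aic, log_lik):
    """Solve AIC = 2k - 2*logL for k, the number of estimated parameters."""
    return (aic + 2 * log_lik) / 2

# "Left-handed", first round, geographical path: 0.659 (0.192) in Table 3.
left_handed = wald_summary(0.659, 0.192)
# First-round geographical model: AIC 1,458.256 and log likelihood -683.128.
k_first_round = n_params_from_aic(1458.256, -683.128)

if __name__ == "__main__":
    print(f"odds ratio = {left_handed['odds_ratio']:.2f}, "
          f"z = {left_handed['z']:.2f}, p = {left_handed['p']:.4f}")
    print(f"estimated number of parameters = {k_first_round:.0f}")
```

Under these assumptions the coefficient for left-handed participants corresponds to an odds ratio of about 1.93, and the AIC implies 46 estimated parameters. That count matches 45 predictors plus a constant, the same model size implied by the degrees of freedom reported in Table 4, and it accounts for the game indicators and non-significant variables omitted from the table.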

Table 4: The table shows the linear regression results for the hub-driven and proximity-driven scores for the Speed-race games and Least-clicks games. Please note that only variables that are significant ($p < 0.01$) in at least one round of the experiment are shown in the table. Additionally, variables representing which game was played are omitted from the table for better readability.

	Dependent variable:
	H		C
	First Round	Second Round	First Round	Second Round
Extraversion	$-$ 0.003 ${}^{**}$ (0.001)	0.001 (0.001)	0.004 ${}^{**}$ (0.001)	0.001 (0.001)
Played the game before	0.019 ${}^{**}$ (0.007)	$-$ 0.006 (0.008)	$-$ 0.015 (0.009)	$-$ 0.012 (0.009)
Use Wikipedia frequently	0.022 ${}^{**}$ (0.007)	0.021 ${}^{*}$ (0.008)	$-$ 0.009 (0.008)	$-$ 0.012 (0.009)
Play computer games frequently	0.001 (0.005)	0.017 ${}^{*}$ (0.007)	$-$ 0.009 (0.006)	$-$ 0.022 ${}^{**}$ (0.008)
Left-handed	0.043 ${}^{***}$ (0.013)	$-$ 0.033 ${}^{*}$ (0.016)	$-$ 0.038 ${}^{*}$ (0.015)	0.015 (0.018)
Employed	0.002 (0.011)	0.019 (0.013)	$-$ 0.003 (0.013)	$-$ 0.040 ${}^{**}$ (0.015)
Age	$-$ 0.003 ${}^{***}$ (0.001)	$-$ 0.001 (0.001)	0.003 ${}^{***}$ (0.001)	0.001 (0.001)
Has time constraint	0.038 ${}^{***}$ (0.010)	0.024 ${}^{*}$ (0.012)	$-$ 0.061 ${}^{***}$ (0.011)	$-$ 0.078 ${}^{***}$ (0.013)
Constant	0.613 ${}^{***}$ (0.072)	0.305 ${}^{***}$ (0.087)	0.241 ${}^{**}$ (0.083)	0.471 ${}^{***}$ (0.098)
Observations	1,495	1,095	1,495	1,095
R ${}^{2}$	0.268	0.220	0.250	0.262
Adjusted R ${}^{2}$	0.245	0.186	0.227	0.230
Residual Std. Error	0.170 (df = 1449)	0.166 (df = 1049)	0.198 (df = 1449)	0.186 (df = 1049)
F Statistic	11.782 ${}^{***}$ (df = 45; 1449)	6.566 ${}^{***}$ (df = 45; 1049)	10.741 ${}^{***}$ (df = 45; 1449)	8.281 ${}^{***}$ (df = 45; 1049)
Note:	${}^{*}$ p $<$ 0.05; ${}^{**}$ p $<$ 0.01; ${}^{***}$ p $<$ 0.001
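The summary rows of Table 4 are linked by standard identities, which the following sketch uses as a consistency check. It assumes ordinary least squares with an intercept and 45 predictors, as suggested by the reported degrees of freedom; the function names are hypothetical, and small discrepancies are expected because the table reports rounded $R^{2}$ values.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def f_statistic(r2, n, p):
    """Overall F statistic with (p, n - p - 1) degrees of freedom."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))

# (label, R^2, observations) as printed in Table 4; p = 45 from "df = 45; ...".
MODELS = [
    ("H, first round", 0.268, 1495),
    ("H, second round", 0.220, 1095),
    ("C, first round", 0.250, 1495),
    ("C, second round", 0.262, 1095),
]

if __name__ == "__main__":
    for label, r2, n in MODELS:
        print(f"{label}: residual df = {n - 45 - 1}, "
              f"adj. R2 = {adjusted_r2(r2, n, 45):.3f}, "
              f"F = {f_statistic(r2, n, 45):.2f}")
```

With these inputs the residual degrees of freedom come out as 1449 and 1049, and the recomputed adjusted $R^{2}$ and F values agree with the table up to rounding.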

References

Epstein et al. [2017] Epstein, R.A., Patai, E.Z., Julian, J.B., Spiers, H.J.: The cognitive map in humans: spatial navigation and beyond. Nature Neuroscience 20(11), 1504–1513 (2017)

Stachenfeld et al. [2017] Stachenfeld, K.L., Botvinick, M.M., Gershman, S.J.: The hippocampus as a predictive map. Nature Neuroscience 20(11), 1643–1653 (2017)

Tavares et al. [2015] Tavares, R.M., Mendelsohn, A., Grossman, Y., Williams, C.H., Shapiro, M., Trope, Y., Schiller, D.: A map for social navigation in the human brain. Neuron 87(1), 231–243 (2015)

Schafer and Schiller [2018] Schafer, M., Schiller, D.: Navigating social space. Neuron 100(2), 476–489 (2018)

Pirolli and Card [1999] Pirolli, P., Card, S.: Information foraging. Psychological Review 106(4), 643 (1999)

Bush [1945] Bush, V.: As we may think. The Atlantic Monthly 176(1), 101–108 (1945)

Karinthy [1929] Karinthy, F.: Chain-links. In: Everything is Different. Athenaeum (1929). http://vadeker.net/articles/Karinthy-Chain-Links_1929.pdf

Milgram [1967] Milgram, S.: The small world problem. Psychology Today 2(1), 60–67 (1967)

Dodds et al. [2003] Dodds, P.S., Muhamad, R., Watts, D.J.: An experimental study of search in global social networks. Science 301(5634), 827–829 (2003)

Watts et al. [2002] Watts, D.J., Dodds, P.S., Newman, M.E.: Identity and search in social networks. Science 296(5571), 1302–1305 (2002)

Kleinberg [2001] Kleinberg, J.: Small-world phenomena and the dynamics of information. Advances in neural information processing systems 14 (2001)

Adamic and Adar [2005] Adamic, L., Adar, E.: How to search a social network. Social Networks 27(3), 187–203 (2005)

Killworth and Bernard [1978] Killworth, P.D., Bernard, H.R.: The reversal small-world experiment. Social Networks 1(2), 159–192 (1978)

[14] Wikipedia, the free encyclopedia. (2024). https://en.wikipedia.org/ Accessed 2024

[15] Wikispeedia (2024). https://dlab.epfl.ch/wikispeedia/play/ Accessed 2024

[16] The Wiki Game - Wikipedia Game - Explore Wikipedia! (2024). https://www.thewikigame.com/ Accessed 2024

West and Leskovec [2012] West, R., Leskovec, J.: Human wayfinding in information networks. In: Proceedings of the 21st International Conference on World Wide Web, pp. 619–628 (2012)

Singer et al. [2014] Singer, P., Helic, D., Taraghi, B., Strohmaier, M.: Detecting memory and structure in human navigation patterns using Markov chain models of varying order. PLoS ONE 9(7), e102070 (2014)

Helic et al. [2013] Helic, D., Strohmaier, M., Granitzer, M., Scherer, R.: Models of human navigation in information networks based on decentralized search. In: Proceedings of the 24th ACM Conference on Hypertext and Social Media, pp. 89–98 (2013)

Newcombe [2018] Newcombe, N.S.: Individual variation in human navigation. Current Biology 28(17), R1004–R1008 (2018)

Weisberg et al. [2014] Weisberg, S.M., Schinazi, V.R., Newcombe, N.S., Shipley, T.F., Epstein, R.A.: Variations in cognitive maps: understanding individual differences in navigation. Journal of Experimental Psychology: Learning, Memory, and Cognition 40(3), 669 (2014)

Nazareth et al. [2019] Nazareth, A., Huang, X., Voyer, D., Newcombe, N.: A meta-analysis of sex differences in human navigation skills. Psychonomic Bulletin & Review 26(5), 1503–1528 (2019)

Coutrot et al. [2022] Coutrot, A., Manley, E., Goodroe, S., Gahnstrom, C., Filomena, G., Yesiltepe, D., Dalton, R., Wiener, J., Hölscher, C., Hornberger, M., et al.: Entropy of city street networks linked to future spatial navigation ability. Nature, 1–7 (2022)

Spiers et al. [2021] Spiers, H.J., Coutrot, A., Hornberger, M.: Explaining world-wide variation in navigation ability from millions of people: Citizen science project sea hero quest. Topics in Cognitive Science (2021)

Zhu et al. [2023] Zhu, M., Yasseri, T., Kertész, J.: Individual differences in knowledge network navigation. arXiv preprint arXiv:2303.10036 (2023)

Perozzi et al. [2014] Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: Online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710 (2014)

Muchnik et al. [2007] Muchnik, L., Itzhack, R., Solomon, S., Louzoun, Y.: Self-emergence of knowledge trees: Extraction of the Wikipedia hierarchies. Physical Review E 76(1), 016106 (2007)

Jolliffe [2002] Jolliffe, I.T.: Principal component analysis for special types of data (2002)

MacQueen [1967] MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297 (1967). Oakland, CA, USA

[30] Prolific — Quickly find research participants you can trust (2024). https://www.prolific.com/ Accessed 2024

Douglas et al. [2023] Douglas, B.D., Ewell, P.J., Brauer, M.: Data quality in online human-subjects research: Comparisons between MTurk, Prolific, CloudResearch, Qualtrics, and SONA. PLoS ONE 18(3), e0279720 (2023)

[32] Qualtrics XM - Experience Management Software (2024). https://www.qualtrics.com/uk/ Accessed 2024

[33] Wikimedia Downloads (2024). https://dumps.wikimedia.org/ Accessed 2024

Goldberg [1992] Goldberg, L.R.: The development of markers for the big-five factor structure. Psychological Assessment 4(1), 26 (1992)

Witten and Milne [2008] Witten, I.H., Milne, D.N.: An effective, low-cost measure of semantic relatedness obtained from Wikipedia links (2008)

Finkelstein et al. [2001] Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., Ruppin, E.: Placing search in context: The concept revisited. In: Proceedings of the 10th International Conference on World Wide Web, pp. 406–414 (2001)

Singer et al. [2013] Singer, P., Niebler, T., Strohmaier, M., Hotho, A.: Computing semantic relatedness from human navigational paths: A case study on Wikipedia. International Journal on Semantic Web and Information Systems (IJSWIS) 9(4), 41–70 (2013)