Search | arXiv e-print repository

arXiv:2405.00335 [pdf]

Finding the white male: The prevalence and consequences of algorithmic gender and race bias in political Google searches

Authors: Tobias Rohrbach, Mykola Makhortykh, Maryna Sydorova

Abstract: Search engines like Google have become major information gatekeepers that use artificial intelligence (AI) to determine who and what voters find when searching for political information. This article proposes and tests a framework of algorithmic representation of minoritized groups in a series of four studies. First, two algorithm audits of political image searches delineate how search engines reflect and uphold structural inequalities by under- and misrepresenting women and non-white politicians. Second, two online experiments show that these biases in algorithmic representation in turn distort perceptions of the political reality and actively reinforce a white and masculinized view of politics. Together, the results have substantive implications for the scientific understanding of how AI technology amplifies biases in political perceptions and decision-making. The article contributes to ongoing public debates and cross-disciplinary research on algorithmic fairness and injustice.

Submitted 1 May, 2024; originally announced May 2024.

Comments: 30 pages, 5 figures

arXiv:2403.02931 [pdf]

Improving the quality of individual-level online information tracking: challenges of existing approaches and introduction of a new content- and long-tail sensitive academic solution

Authors: Silke Adam, Mykola Makhortykh, Michaela Maier, Viktor Aigenseer, Aleksandra Urman, Teresa Gil Lopez, Clara Christner, Ernesto de León, Roberto Ulloa

Abstract: This article evaluates the quality of data collection in individual-level desktop information tracking used in the social sciences and shows that the existing approaches face sampling issues, validity issues due to the lack of content-level data and their disregard of the variety of devices and long-tail consumption patterns as well as transparency and privacy issues. To overcome some of these problems, the article introduces a new academic tracking solution, WebTrack, an open source tracking tool maintained by a major European research institution. The design logic, the interfaces and the backend requirements for WebTrack, followed by a detailed examination of strengths and weaknesses of the tool, are discussed. Finally, using data from 1185 participants, the article empirically illustrates how an improvement in the data collection through WebTrack leads to new innovative shifts in the processing of tracking data. As WebTrack allows collecting the content people are exposed to on more than classical news platforms, we can strongly improve the detection of politics-related information consumption in tracking data with the application of automated content analysis compared to traditional approaches that rely on the list-based identification of news.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 73 pages

arXiv:2401.13832 [pdf]

Algorithmically Curated Lies: How Search Engines Handle Misinformation about US Biolabs in Ukraine

Authors: Elizaveta Kuznetsova, Mykola Makhortykh, Maryna Sydorova, Aleksandra Urman, Ilaria Vitulano, Martha Stolze

Abstract: The growing volume of online content prompts the need for adopting algorithmic systems of information curation. These systems range from web search engines to recommender systems and are integral for helping users stay informed about important societal developments. However, unlike journalistic editing, the algorithmic information curation systems (AICSs) are known to be subject to different forms of malperformance which make them vulnerable to possible manipulation. The risk of manipulation is particularly prominent in the case when AICSs have to deal with information about false claims that underpin propaganda campaigns of authoritarian regimes. Using as a case study the Russian disinformation campaign concerning the US biolabs in Ukraine, we investigate how one of the most commonly used forms of AICSs - i.e. web search engines - curate misinformation-related content. For this aim, we conduct virtual agent-based algorithm audits of Google, Bing, and Yandex search outputs in June 2022. Our findings highlight the troubling performance of search engines. Even though some search engines, like Google, were less likely to return misinformation results, across all languages and locations, the three search engines still mentioned or promoted a considerable share of false content (33% on Google; 44% on Bing, and 70% on Yandex). We also find significant disparities in misinformation exposure based on the language of search, with all search engines presenting a higher number of false stories in Russian. Location matters as well, with users from Germany being more likely to be exposed to search results promoting false information. These observations stress the possibility of AICSs being vulnerable to manipulation, in particular in the case of the unfolding propaganda campaigns, and underline the importance of monitoring performance of these systems to prevent it.

Submitted 24 January, 2024; originally announced January 2024.

Comments: 19 pages, 5 figures

arXiv:2401.13079 [pdf]

doi 10.1007/978-981-99-7184-8_4.

No AI After Auschwitz? Bridging AI and Memory Ethics in the Context of Information Retrieval of Genocide-Related Information

Authors: Mykola Makhortykh

Abstract: The growing application of artificial intelligence (AI) in the field of information retrieval (IR) affects different domains, including cultural heritage. By facilitating organisation and retrieval of large volumes of heritage-related content, AI-driven IR systems inform users about a broad range of historical phenomena, including genocides (e.g. the Holocaust). However, it is currently unclear to what degree IR systems are capable of dealing with multiple ethical challenges associated with the curation of genocide-related information. To address this question, this chapter provides an overview of ethical challenges associated with the human curation of genocide-related information using a three-part framework inspired by Belmont criteria (i.e. curation challenges associated with respect for individuals, beneficence and justice/fairness). Then, the chapter discusses to what degree the above-mentioned challenges are applicable to the ways in which AI-driven IR systems deal with genocide-related information and what can be the potential ways of bridging AI and memory ethics in this context.

Submitted 23 January, 2024; originally announced January 2024.

Comments: 17 pages

Journal ref: In Ethics in Artificial Intelligence: Bias, Fairness and Beyond (pp. 71-85) Springer (2023)

arXiv:2401.11194 [pdf, other]

Mapping the Field of Algorithm Auditing: A Systematic Literature Review Identifying Research Trends, Linguistic and Geographical Disparities

Authors: Aleksandra Urman, Mykola Makhortykh, Aniko Hannak

Abstract: The increasing reliance on complex algorithmic systems by online platforms has sparked a growing need for algorithm auditing, a research methodology evaluating these systems' functionality and societal impact. In this paper, we systematically review algorithm auditing studies and identify trends in their methodological approaches, the geographic distribution of authors, and the selection of platforms, languages, geographies, and group-based attributes in the focus of auditing research. We present evidence of a significant skew of research focus toward Western contexts, particularly the US, and a disproportionate reliance on English language data. Additionally, our analysis indicates a tendency in algorithm auditing studies to focus on a narrow set of group-based attributes, often operationalized in simplified ways, which might obscure more nuanced aspects of algorithmic bias and discrimination. By conducting this review, we aim to provide a clearer understanding of the current state of the algorithm auditing field and identify gaps that need to be addressed for a more inclusive and representative research landscape.

Submitted 20 January, 2024; originally announced January 2024.

arXiv:2312.13096 [pdf]

In Generative AI we Trust: Can Chatbots Effectively Verify Political Information?

Authors: Elizaveta Kuznetsova, Mykola Makhortykh, Victoria Vziatysheva, Martha Stolze, Ani Baghumyan, Aleksandra Urman

Abstract: This article presents a comparative analysis of the ability of two large language model (LLM)-based chatbots, ChatGPT and Bing Chat, recently rebranded to Microsoft Copilot, to detect veracity of political information. We use AI auditing methodology to investigate how chatbots evaluate true, false, and borderline statements on five topics: COVID-19, Russian aggression against Ukraine, the Holocaust, climate change, and LGBTQ+ related debates. We compare how the chatbots perform in high- and low-resource languages by using prompts in English, Russian, and Ukrainian. Furthermore, we explore the ability of chatbots to evaluate statements according to political communication concepts of disinformation, misinformation, and conspiracy theory, using definition-oriented prompts. We also systematically test how such evaluations are influenced by source bias which we model by attributing specific claims to various political and social actors. The results show high performance of ChatGPT for the baseline veracity evaluation task, with 72 percent of the cases evaluated correctly on average across languages without pre-training. Bing Chat performed worse with a 67 percent accuracy. We observe significant disparities in how chatbots evaluate prompts in high- and low-resource languages and how they adapt their evaluations to political communication concepts with ChatGPT providing more nuanced outputs than Bing Chat. Finally, we find that for some veracity detection-related tasks, the performance of chatbots varied depending on the topic of the statement or the source to which it is attributed. These findings highlight the potential of LLM-based chatbots in tackling different forms of false information in online environments, but also point to the substantial variation in terms of how such potential is realized due to specific factors, such as language of the prompt or the topic.

Submitted 20 December, 2023; originally announced December 2023.

Comments: 22 pages, 8 figures

arXiv:2311.09969 [pdf]

Examining bias perpetuation in academic search engines: an algorithm audit of Google and Semantic Scholar

Authors: Celina Kacperski, Mona Bielig, Mykola Makhortykh, Maryna Sydorova, Roberto Ulloa

Abstract: Researchers rely on academic web search engines to find scientific sources, but search engine mechanisms may selectively present content that aligns with biases embedded in the queries. This study examines whether confirmation-biased queries prompted into Google Scholar and Semantic Scholar will yield skewed results. Six queries (topics across health and technology domains such as "vaccines" or "internet use") were analyzed for disparities in search results. We confirm that biased queries (targeting "benefits" or "risks") affect search results in line with the bias, with technology-related queries displaying more significant disparities. Overall, Semantic Scholar exhibited fewer disparities than Google Scholar. Topics rated as more polarizing did not consistently show more skewed results. Academic search results that perpetuate confirmation bias have strong implications for both researchers and citizens searching for evidence. More research is needed to explore how scientific inquiry and academic search engines interact.

Submitted 21 November, 2023; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2310.03458 [pdf, other]

User Attitudes to Content Moderation in Web Search

Authors: Aleksandra Urman, Aniko Hannak, Mykola Makhortykh

Abstract: Internet users highly rely on and trust web search engines, such as Google, to find relevant information online. However, scholars have documented numerous biases and inaccuracies in search outputs. To improve the quality of search results, search engines employ various content moderation practices such as interface elements informing users about potentially dangerous websites and algorithmic mechanisms for downgrading or removing low-quality search results. While the reliance of the public on web search engines and their use of moderation practices is well-established, user attitudes towards these practices have not yet been explored in detail. To address this gap, we first conducted an overview of content moderation practices used by search engines, and then surveyed a representative sample of the US adult population (N=398) to examine the levels of support for different moderation practices applied to potentially misleading and/or potentially offensive content in web search. We also analyzed the relationship between user characteristics and their support for specific moderation practices. We find that the most supported practice is informing users about potentially misleading or offensive content, and the least supported one is the complete removal of search results. More conservative users and users with lower levels of trust in web search results are more likely to be against content moderation in web search.

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2305.14358 [pdf]

Shall androids dream of genocides? How generative AI can change the future of memorialization of mass atrocities

Authors: Mykola Makhortykh, Eve M. Zucker, David J. Simon, Daniel Bultmann, Roberto Ulloa

Abstract: The memorialization of mass atrocities such as war crimes and genocides facilitates the remembrance of past suffering, honors those who resisted the perpetrators, and helps prevent the distortion of historical facts. Digital technologies have transformed memorialization practices by enabling less top-down and more creative approaches to remember mass atrocities. At the same time, they may also facilitate the spread of denialism and distortion, attempt to justify past crimes and attack the dignity of victims. The emergence of generative forms of artificial intelligence (AI), which produce textual and visual content, has the potential to revolutionize the field of memorialization even further. AI can identify patterns in training data to create new narratives for representing and interpreting mass atrocities - and do so in a fraction of the time it takes for humans. The use of generative AI in this context raises numerous questions: For example, can the paucity of training data on mass atrocities distort how AI interprets some atrocity-related inquiries? How important is the ability to differentiate between human- and AI-made content concerning mass atrocities? Can AI-made content be used to promote false information concerning atrocities? This article addresses these and other questions by examining the opportunities and risks associated with using generative AIs for memorializing mass atrocities. It also discusses recommendations for AI's integration in memorialization practices to steer the use of these technologies toward a more ethical and sustainable direction.

Submitted 8 May, 2023; originally announced May 2023.

Comments: 22 pages

arXiv:2211.04746 [pdf]

Novelty in news search: a longitudinal study of the 2020 US elections

Authors: Roberto Ulloa, Mykola Makhortykh, Aleksandra Urman, Juhi Kulshrestha

Abstract: The 2020 US elections news coverage was extensive, with new pieces of information generated rapidly. This evolving scenario presented an opportunity to study the performance of search engines in a context in which they had to quickly process information as it was published. We analyze novelty, a measurement of new items that emerge in the top news search results, to compare the coverage and visibility of different topics. We conduct a longitudinal study of news results of five search engines collected in short-bursts (every 21 minutes) from two regions (Oregon, US and Frankfurt, Germany), starting on election day and lasting until one day after the announcement of Biden as the winner. We find more new items emerging for election related queries ("joe biden", "donald trump" and "us elections") compared to topical (e.g., "coronavirus") or stable (e.g., "holocaust") queries. We demonstrate differences across search engines and regions over time, and we highlight imbalances between candidate queries. When it comes to news search, search engines are responsible for such imbalances, either due to their algorithms or the set of news sources they rely on. We argue that such imbalances affect the visibility of political candidates in news searches during electoral periods.

Submitted 9 November, 2022; originally announced November 2022.

arXiv:2209.11120 [pdf]

This is what a pandemic looks like: Visual framing of COVID-19 on search engines

Authors: Mykola Makhortykh, Aleksandra Urman, Roberto Ulloa

Abstract: In today's high-choice media environment, search engines play an integral role in informing individuals and societies about the latest events. The importance of search algorithms is even higher at the time of crisis, when users search for information to understand the causes and the consequences of the current situation and decide on their course of action. In our paper, we conduct a comparative audit of how different search engines prioritize visual information related to COVID-19 and what consequences it has for the representation of the pandemic. Using a virtual agent-based audit approach, we examine image search results for the term "coronavirus" in English, Russian and Chinese on five major search engines: Google, Yandex, Bing, Yahoo, and DuckDuckGo. Specifically, we focus on how image search results relate to generic news frames (e.g., the attribution of responsibility, human interest, and economics) used in relation to COVID-19 and how their visual composition varies between the search engines.

Submitted 22 September, 2022; originally announced September 2022.

Comments: 18 pages, 1 figure, 3 tables

arXiv:2207.00489 [pdf]

Panning for gold: Lessons learned from the platform-agnostic automated detection of political content in textual data

Authors: Mykola Makhortykh, Ernesto de León, Aleksandra Urman, Clara Christner, Maryna Sydorova, Silke Adam, Michaela Maier, Teresa Gil-Lopez

Abstract: The growing availability of data about online information behaviour enables new possibilities for political communication research. However, the volume and variety of these data makes them difficult to analyse and prompts the need for developing automated content approaches relying on a broad range of natural language processing techniques (e.g. machine learning- or neural network-based ones). In this paper, we discuss how these techniques can be used to detect political content across different platforms. Using three validation datasets, which include a variety of political and non-political textual documents from online platforms, we systematically compare the performance of three groups of detection techniques relying on dictionaries, supervised machine learning, or neural networks. We also examine the impact of different modes of data preprocessing (e.g. stemming and stopword removal) on the low-cost implementations of these techniques using a large set (n = 66) of detection models. Our results show the limited impact of preprocessing on model performance, with the best results for less noisy data being achieved by neural network- and machine-learning-based models, in contrast to the more robust performance of dictionary-based models on noisy data.

Submitted 1 July, 2022; originally announced July 2022.

arXiv:2112.01278 [pdf]

Where the Earth is flat and 9/11 is an inside job: A comparative algorithm audit of conspiratorial information in web search results

Authors: Aleksandra Urman, Mykola Makhortykh, Roberto Ulloa, Juhi Kulshrestha

Abstract: Web search engines are important online information intermediaries that are frequently used and highly trusted by the public despite multiple evidence of their outputs being subjected to inaccuracies and biases. One form of such inaccuracy, which so far received little scholarly attention, is the presence of conspiratorial information, namely pages promoting conspiracy theories. We address this gap by conducting a comparative algorithm audit to examine the distribution of conspiratorial information in search results across five search engines: Google, Bing, DuckDuckGo, Yahoo and Yandex. Using a virtual agent-based infrastructure, we systematically collect search outputs for six conspiracy theory-related queries (flat earth, new world order, qanon, 9/11, illuminati, george soros) across three locations (two in the US and one in the UK) and two observation periods (March and May 2021). We find that all search engines except Google consistently displayed conspiracy-promoting results and returned links to conspiracy-dedicated websites in their top results, although the share of such content varied across queries. Most conspiracy-promoting results came from social media and conspiracy-dedicated websites while conspiracy-debunking information was shared by scientific websites and, to a lesser extent, legacy media. The fact that these observations are consistent across different locations and time periods highlights the possibility of some search engines systematically prioritizing conspiracy-promoting content and, thus, amplifying its distribution in the online environments.

Submitted 6 December, 2021; v1 submitted 2 December, 2021; originally announced December 2021.

arXiv:2106.14072 [pdf, ps, other]

doi 10.1007/978-3-030-78818-6_5

Detecting race and gender bias in visual representation of AI on web search engines

Authors: Mykola Makhortykh, Aleksandra Urman, Roberto Ulloa

Abstract: Web search engines influence perception of social reality by filtering and ranking information. However, their outputs are often subjected to bias that can lead to skewed representation of subjects such as professional occupations or gender. In our paper, we use a mixed-method approach to investigate presence of race and gender bias in representation of artificial intelligence (AI) in image search results coming from six different search engines. Our findings show that search engines prioritize anthropomorphic images of AI that portray it as white, whereas non-white images of AI are present only in non-Western search engines. By contrast, gender representation of AI is more diverse and less skewed towards a specific gender that can be attributed to higher awareness about gender bias in search outputs. Our observations indicate both the need and the possibility for addressing bias in representation of societally relevant subjects, such as technological innovation, and emphasize the importance of designing new approaches for detecting bias in information retrieval systems.

Submitted 26 June, 2021; originally announced June 2021.

Comments: 16 pages, 3 figures

ACM Class: H.3.3

Journal ref: In Advances in Bias and Fairness in Information Retrieval (pp. 36-50). Springer (2021)

arXiv:2106.05831 [pdf]

doi 10.1177/01655515221093029

Scaling up Search Engine Audits: Practical Insights for Algorithm Auditing

Authors: Roberto Ulloa, Mykola Makhortykh, Aleksandra Urman

Abstract: Algorithm audits have increased in recent years due to a growing need to independently assess the performance of automatically curated services that process, filter, and rank the large and dynamic amount of information available on the internet. Among several methodologies to perform such audits, virtual agents stand out because they offer the ability to perform systematic experiments, simulating human behaviour without the associated costs of recruiting participants. Motivated by the importance of research transparency and replicability of results, this paper focuses on the challenges of such an approach. It provides methodological details, recommendations, lessons learned, and limitations based on our experience of setting up experiments for eight search engines (including main, news, image and video sections) with hundreds of virtual agents placed in different regions. We demonstrate the successful performance of our research infrastructure across multiple data collections, with diverse experimental designs, and point to different changes and strategies that improve the quality of the method. We conclude that virtual agents are a promising venue for monitoring the performance of algorithms across long periods of time, and we hope that this paper can serve as a basis for further research in this area.

Submitted 25 April, 2022; v1 submitted 10 June, 2021; originally announced June 2021.

arXiv:2106.02715 [pdf, other]

doi 10.1145/3442442.3452306

Auditing Source Diversity Bias in Video Search Results Using Virtual Agents

Authors: Aleksandra Urman, Mykola Makhortykh, Roberto Ulloa

Abstract: We audit the presence of domain-level source diversity bias in video search results. Using a virtual agent-based approach, we compare outputs of four Western and one non-Western search engines for English and Russian queries. Our findings highlight that source diversity varies substantially depending on the language with English queries returning more diverse outputs. We also find disproportionately high presence of a single platform, YouTube, in top search outputs for all Western search engines except Google. At the same time, we observe that YouTube's major competitors such as Vimeo or Dailymotion do not appear in the sampled Google's video search results. This finding suggests that Google might be downgrading the results from the main competitors of Google-owned YouTube and highlights the necessity for further studies focusing on the presence of own-content bias in Google's search results.

Submitted 4 June, 2021; originally announced June 2021.

Journal ref: WWW '21: Companion Proceedings of the Web Conference 2021

arXiv:2105.04961 [pdf, other]

You Are How (and Where) You Search? Comparative Analysis of Web Search Behaviour Using Web Tracking Data

Authors: Aleksandra Urman, Mykola Makhortykh

Abstract: We conduct a comparative analysis of desktop web search behaviour of users from Germany (n=558) and Switzerland (n=563) based on a combination of web tracking and survey data. We find that web search accounts for 13% of all desktop browsing, with the share being higher in Switzerland than in Germany. We find that in over 50% of cases users clicked on the first search result, with over 97% of all clicks being made on the first page of search outputs. Most users rely on Google when conducting searches, and users' preferences for other engines are related to their demographics. We also test relationships between user demographics and daily number of searches, average share of search activities among tracked events by user as well as the tendency to click on higher- or lower-ranked results. We find differences in such relationships between the two countries that highlight the importance of comparative research in this domain. Further, we observe differences in the temporal patterns of web search use between women and men, marking the necessity of disaggregating data by gender in observational studies regarding online information behaviour.

Submitted 11 May, 2021; originally announced May 2021.

arXiv:2105.00756 [pdf]

doi 10.1177/08944393211006863

The Matter of Chance: Auditing Web Search Results Related to the 2020 U.S. Presidential Primary Elections Across Six Search Engines

Authors: Aleksandra Urman, Mykola Makhortykh, Roberto Ulloa

Abstract: We examine how six search engines filter and rank information in relation to the queries on the U.S. 2020 presidential primary elections under the default - that is nonpersonalized - conditions. For that, we utilize an algorithmic auditing methodology that uses virtual agents to conduct large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for "us elections", "donald trump", "joe biden" and "bernie sanders" queries on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex, during the 2020 primaries. Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents using the same search engine. It highlights that whether users see certain information is decided by chance due to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between the search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling considering that search results are highly trusted by the public and can shift the opinions of undecided voters as demonstrated by previous research.

Submitted 3 May, 2021; originally announced May 2021.

Journal ref: Social Science Computer Review (2021)

Showing 1–18 of 18 results for author: Makhortykh, M