Search | arXiv e-print repository

Delving into ChatGPT usage in academic writing through excess vocabulary

Authors: Dmitry Kobak, Rita González-Márquez, Emőke-Ágnes Horvát, Jan Lause

Abstract: Recent large language models (LLMs) can generate and revise text with human-level performance, and have been widely commercialized in systems like ChatGPT. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet, many scientists have been using them to assist their scholarly writing. How wide-spread is LLM usage in the academic literature currently? To answer this question, we use an unbiased, large-scale approach, free from any assumptions on academic LLM usage. We study vocabulary changes in 14 million PubMed abstracts from 2010-2024, and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. Our analysis based on excess words usage suggests that at least 10% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, and was as high as 30% for some PubMed sub-corpora. We show that the appearance of LLM-based writing assistants has had an unprecedented impact in the scientific literature, surpassing the effect of major world events such as the Covid pandemic.

Submitted 3 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: v2: Updating dataset, figures and numbers to include all PubMed abstracts until end of June 2024

arXiv:2402.17495 [pdf, ps, other]

The Unwanted Dissemination of Science: The Usage of Academic Articles as Ammunition in Contested Discursive Arenas on Twitter

Authors: Richard Zhang, Emőke-Ágnes Horvát

Abstract: Twitter is a common site of offensive language. Prior literature has shown that the emotional content of tweets can heavily impact their diffusion when discussing political topics. We extend prior work to look at offensive tweets that link to academic articles. Using a mixed methods approach, we identify three findings: firstly, offensive language is common in tweets that refer to academic articles, and varies widely by subject matter. Secondly, discourse analysis reveals that offensive tweets commonly use academic articles to promote or attack political ideologies. Lastly, we show that offensive tweets reach a smaller audience than their non-offensive counterparts. Our analysis of these offensive tweets reveals how academic articles are being shared on Twitter not for the sake of disseminating new knowledge, but rather as argumentative tools in controversial and combative discourses.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 16 pages, 8 tables, submitted to CSCW '24

arXiv:2308.00405 [pdf]

Who benefits from altmetrics? The effect of team gender composition on the link between online visibility and citation impact

Authors: Orsolya Vásárhelyi, Emőke-Ágnes Horvát

Abstract: Online science dissemination has quickly become crucial in promoting scholars' work. Recent literature has demonstrated a lack of visibility for women's research, where women's articles receive fewer academic citations than men's. The informetric and scientometric community has briefly examined gender-based inequalities in online visibility. However, the link between online sharing of scientific work and citation impact for teams with different gender compositions remains understudied. Here we explore whether online visibility is helping women overcome the gender-based citation penalty. Our analyses cover the three broad research areas of Computer Science, Engineering, and Social Sciences, which have different gender representation, adoption of online science dissemination practices, and citation culture. We create a quasi-experimental setting by applying Coarsened Exact Matching, which enables us to isolate the effects of team gender composition and online visibility on the number of citations. We find that online visibility positively affects citations across research areas, while team gender composition interacts differently with visibility in these research areas. Our results provide essential insights into gendered citation patterns and online visibility, inviting informed discussions about decreasing the citation gap.

Submitted 1 August, 2023; originally announced August 2023.

Comments: 20 pages, 2 figures, 5 tables

ACM Class: J.4

Journal ref: Proceedings 19th International Society of Scientometrics and Informetrics Conference, Bloomington, Indiana, 2023

arXiv:2306.15684 [pdf, other]

Understanding (Ir)rational Herding Online

Authors: Henry K. Dambanemuya, Johannes Wachs, Emőke-Ágnes Horvát

Abstract: Investigations of social influence in collective decision-making have become possible due to recent technologies and platforms that record interactions in far larger groups than could be studied before. Herding and its impact on decision-making are critical areas of practical interest and research study. However, despite theoretical work suggesting that it matters whether individuals choose who to imitate based on cues such as experience or whether they herd at random, there is little empirical analysis of this distinction. To demonstrate the distinction between what the literature calls "rational" and "irrational" herding, we use data on tens of thousands of loans from a well-established online peer-to-peer (p2p) lending platform. First, we employ an empirical measure of memory in complex systems to measure herding in lending. Then, we illustrate a network-based approach to visualize herding. Finally, we model the impact of herding on collective outcomes. Our study reveals that loan performance is not solely determined by whether the lenders engage in herding or not. Instead, the interplay between herding and the imitated lenders' prior success on the platform predicts loan outcomes. In short, herds led by expert lenders tend to pick loans that do not default. We discuss the implications of this under-explored aspect of herding for platform designers, borrowers, and lenders. Our study advances collective intelligence theories based on a case of high-stakes group decision-making online.

Submitted 22 June, 2023; originally announced June 2023.

ACM Class: J.4

arXiv:2306.13250 [pdf, other]

Emergent Influence Networks in Good-Faith Online Discussions

Authors: Henry K. Dambanemuya, Daniel Romero, Emőke-Ágnes Horvát

Abstract: Town hall-type debates are increasingly moving online, irrevocably transforming public discourse. Yet, we know relatively little about crucial social dynamics that determine which arguments are more likely to be successful. This study investigates the impact of one's position in the discussion network created via responses to others' arguments on one's persuasiveness in unfacilitated online debates. We propose a novel framework for measuring the impact of network position on persuasiveness, using a combination of social network analysis and machine learning. Complementing existing studies investigating the effect of linguistic aspects on persuasiveness, we show that the user's position in a discussion network influences their persuasiveness online. Moreover, the recognition of successful persuasion further increases this dominant network position. Our findings offer important insights into the complex social dynamics of online discourse and provide practical insights for organizations and individuals seeking to understand the interplay between influential positions in a discussion network and persuasive strategies in digital spaces.

Submitted 22 June, 2023; originally announced June 2023.

ACM Class: J.4

arXiv:2303.16302 [pdf]

Retracted Articles about COVID-19 Vaccines Enable Vaccine Misinformation on Twitter

Authors: Rod Abhari, Esteban Villa-Turek, Nicholas Vincent, Henry Dambanemuya, Emőke-Ágnes Horvát

Abstract: Retracted scientific articles about COVID-19 vaccines have proliferated false claims about vaccination harms and discouraged vaccine acceptance. Our study analyzed the topical content of 4,876 English-language tweets about retracted COVID-19 vaccine research and found that 27.4% of tweets contained retraction-related misinformation. Misinformed tweets either ignored the retraction, or less commonly, politicized the retraction using conspiratorial rhetoric. To address this, Twitter and other social media platforms should expand their efforts to address retraction-related misinformation.

Submitted 28 March, 2023; originally announced March 2023.

arXiv:2207.13815 [pdf, other]

Information Retention in the Multi-platform Sharing of Science

Authors: Sohyeon Hwang, Emőke-Ágnes Horvát, Daniel M. Romero

Abstract: The public interest in accurate scientific communication, underscored by recent public health crises, highlights how content often loses critical pieces of information as it spreads online. However, multi-platform analyses of this phenomenon remain limited due to challenges in data collection. Collecting mentions of research tracked by Altmetric LLC, we examine information retention in the over 4 million online posts referencing 9,765 of the most-mentioned scientific articles across blog sites, Facebook, news sites, Twitter, and Wikipedia. To do so, we present a burst-based framework for examining online discussions about science over time and across different platforms. To measure information retention we develop a keyword-based computational measure comparing an online post to the scientific article's abstract. We evaluate our measure using ground truth data labeled by within-field experts. We highlight three main findings: first, we find a strong tendency towards low levels of information retention, following a distinct trajectory of loss except when bursts of attention begin in social media. Second, platforms show significant differences in information retention. Third, sequences involving more platforms tend to be associated with higher information retention. These findings highlight a strong tendency towards information loss over time - posing a critical concern for researchers, policymakers, and citizens alike - but suggest that multi-platform discussions may improve information retention overall.

Submitted 12 March, 2023; v1 submitted 27 July, 2022; originally announced July 2022.

Comments: 12 pages, 8 figures, accepted at the International AAAI Conference on Web and Social Media (ICWSM, 2023)

arXiv:2206.07754 [pdf, other]

doi 10.1140/epjds/s13688-023-00377-7

Novelty and Cultural Evolution in Modern Popular Music

Authors: Katherine O'Toole, Emőke-Ágnes Horvát

Abstract: The ubiquity of digital music consumption has made it possible to extract information about modern music that allows us to perform large scale analysis of stylistic change over time. In order to uncover underlying patterns in cultural evolution, we examine the relationship between the established characteristics of different genres and styles, and the introduction of novel ideas that fuel this ongoing creative evolution. To understand how this dynamic plays out and shapes the cultural ecosystem, we compare musical artifacts to their contemporaries to identify novel artifacts, study the relationship between novelty and commercial success, and connect this to the changes in musical content that we can observe over time. Using Music Information Retrieval (MIR) data and lyrics from Billboard Hot 100 songs between 1974-2013, we calculate a novelty score for each song's aural attributes and lyrics. Comparing both scores to the popularity of the song following its release, we uncover key patterns in the relationship between novelty and audience reception. Additionally, we look at the link between novelty and the likelihood that a song was influential given where its MIR and lyrical features fit within the larger trends we observed.

Submitted 27 February, 2023; v1 submitted 15 June, 2022; originally announced June 2022.

Journal ref: EPJ Data Science 12 (2023) 1-25

arXiv:2206.07210 [pdf, other]

Beyond Words: An Experimental Study of Signaling in Crowdfunding

Authors: Henry K. Dambanemuya, Eunseo Choi, Darren Gergle, Emőke-Ágnes Horvát

Abstract: Increasingly, crowdfunding is transforming financing for many people worldwide. Yet we know relatively little about how, why, and when funding outcomes are impacted by signaling between funders. We conduct two studies of N=500 and N=750 participants involved in crowdfunding to investigate the effect of certain characteristics of "crowd signals" on the decision to fund. We find that, under a variety of conditions, contributions of heterogeneous amounts arriving at varying time intervals are significantly more likely to be selected than homogeneous contribution amounts and times. The impact of signaling is strongest among participants who are susceptible to social influence. The effect is remarkably general across different project types, fundraising goals, participant interest in the projects, and participants' altruistic attitudes. Critically, the role of crowd signals in decision-making is typically unrecognized by participants. Our results underscore the fundamental nature of social signaling in crowdfunding, informing strategies for platforms, funders, and project creators.

Submitted 10 January, 2024; v1 submitted 14 June, 2022; originally announced June 2022.

ACM Class: J.4

arXiv:2206.05330 [pdf, other]

The Gender Gap in Scholarly Self-Promotion on Social Media

Authors: Hao Peng, Misha Teplitskiy, Daniel M. Romero, Emőke-Ágnes Horvát

Abstract: Self-promotion in science is ubiquitous but may not be exercised equally by men and women. Research on self-promotion in other domains suggests that, due to bias in self-assessment and adverse reactions to non-gender-conforming behaviors ("pushback"), women tend to self-promote less often than men. We test whether this pattern extends to scholars by examining self-promotion over six years using 23M Tweets about 2.8M research papers by 3.5M authors. Overall, women are about 28% less likely than men to self-promote their papers even after accounting for important confounds, and this gap has grown over time. Moreover, differential adoption of Twitter does not explain the gender gap, which is large even in relatively gender-balanced broad research areas, where bias in self-assessment and pushback are expected to be smaller. Further, the gap increases with higher performance and status, being most pronounced for productive women from top-ranked institutions who publish in high-impact journals. Critically, we find differential returns with respect to gender: while self-promotion is associated with increased tweets of papers, the increase is smaller for women than for men. Our findings suggest that self-promotion varies meaningfully by gender and help explain gender differences in the visibility of scientific ideas.

Submitted 10 October, 2023; v1 submitted 10 June, 2022; originally announced June 2022.

arXiv:2203.04228 [pdf, other]

Online Engagement with Retracted Articles: Who, When, and How?

Authors: Henry K. Dambanemuya, Rod Abhari, Nicholas Vincent, Emőke-Ágnes Horvát

Abstract: Retracted research discussed on social media can spread misinformation. Yet we lack an understanding of how retracted articles are mentioned by academic and non-academic users. This is especially relevant on Twitter due to the platform's prominent role in science communication. Here, we analyze the pre- and post-retraction differences in Twitter attention and engagement metrics for over 3,800 retracted English-language articles alongside comparable non-retracted articles. We subset these findings according to five user types detected by our supervised learning classifier: members of the public, academics, bots, science practitioners, and science communicators. We find that retracted articles receive greater user attention (tweet count) and engagement (likes, retweets, and replies) than non-retracted articles, especially among members of the public and bots, with the majority of user engagement happening before retraction. Our results highlight the prominent role of non-experts in discussions of retracted research and suggest an opportunity for social media platforms to contribute towards early detection of problematic scientific research online.

Submitted 29 January, 2024; v1 submitted 8 March, 2022; originally announced March 2022.

ACM Class: K.4.0

arXiv:2110.07798 [pdf, other]

doi 10.1073/pnas.2119086119

Dynamics of Cross-Platform Attention to Retracted Papers

Authors: Hao Peng, Daniel M. Romero, Emőke-Ágnes Horvát

Abstract: Retracted papers often circulate widely on social media, digital news and other websites before their official retraction. The spread of potentially inaccurate or misleading results from retracted papers can harm the scientific community and the public. Here we quantify the amount and type of attention 3,851 retracted papers received over time in different online platforms. Comparing to a set of non-retracted control papers from the same journals, with similar publication year, number of co-authors and author impact, we show that retracted papers receive more attention after publication not only on social media, but also on heavily curated platforms, such as news outlets and knowledge repositories, amplifying the negative impact on the public. At the same time, we find that posts on Twitter tend to express more criticism about retracted than about control papers, suggesting that criticism-expressing tweets could contain factual information about problematic papers. Most importantly, around the time they are retracted, papers generate discussions that are primarily about the retraction incident rather than about research findings, showing that by this point papers have exhausted attention to their results and highlighting the limited effect of retractions. Our findings reveal the extent to which retracted papers are discussed on different online platforms and identify at scale audience criticism towards them. In this context, we show that retraction is not an effective tool to reduce online attention to problematic papers.

Submitted 15 June, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

arXiv:2101.06315 [pdf, other]

A Multi-Platform Study of Crowd Signals Associated with Successful Online Fundraising

Authors: Henry K. Dambanemuya, Emőke-Ágnes Horvát

Abstract: The growing popularity of online fundraising (aka "crowdfunding") has attracted significant research on the subject. In contrast to previous studies that attempt to predict the success of crowdfunded projects based on specific characteristics of the projects and their creators, we present a more general approach that focuses on crowd dynamics and is robust to the particularities of different crowdfunding platforms. We rely on a multi-method analysis to investigate the correlates, predictive importance, and quasi-causal effects of features that describe crowd dynamics in determining the success of crowdfunded projects. By applying a multi-method analysis to a study of fundraising in three different online markets, we uncover general crowd dynamics that ultimately decide which projects will succeed. In all analyses and across the three different platforms, we consistently find that funders' behavioural signals (1) are significantly correlated with fundraising success; (2) approximate fundraising outcomes better than the characteristics of projects and their creators such as credit grade, company valuation, and subject domain; and (3) have significant quasi-causal effects on fundraising outcomes while controlling for potentially confounding project variables. By showing that universal features deduced from crowd behaviour are predictive of fundraising success on different crowdfunding platforms, our work provides design-relevant insights about novel types of collective decision-making online. This research thus inspires potential ways to leverage cues from the crowd and catalyses research into crowd-aware system design.

Submitted 15 January, 2021; originally announced January 2021.

Comments: To appear in the Proceedings of the ACM (PACM) Human-Computer Interaction CSCW'21

ACM Class: J.4

arXiv:2101.05044 [pdf, other]

Publishing patterns reflect political polarization in news media

Authors: Nick Hagar, Johannes Wachs, Emőke-Ágnes Horvát

Abstract: Digital news outlets rely on a variety of outside contributors, from freelance journalists, to political commentators, to executives and politicians. These external dependencies create a network among news outlets, traced along the contributors they share. Using connections between outlets, we demonstrate how contributors' publishing trajectories tend to align with outlet political leanings. We also show how polarized clustering of outlets translates to differences in the topics of news covered and the style and tone of articles published. In addition, we demonstrate how contributors who cross partisan divides tend to focus on less explicitly political topics. This work addresses an important gap in the media polarization literature, by highlighting how structural factors on the production side of news media create an ecosystem shaped by political leanings, independent of the priorities of any one person or organization.

Submitted 14 January, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

arXiv:2009.07202 [pdf]

Network Structures of Collective Intelligence: The Contingent Benefits of Group Discussion

Authors: Joshua Becker, Abdullah Almaatouq, Emőke-Ágnes Horvát

Abstract: Research on belief formation has produced contradictory findings on whether and when communication between group members will improve the accuracy of numeric estimates such as economic forecasts, medical diagnoses, and job candidate assessments. While some evidence suggests that carefully mediated processes such as the "Delphi method" produce more accurate beliefs than unstructured discussion, others argue that unstructured discussion outperforms mediated processes. Still others argue that independent individuals produce the most accurate beliefs. This paper shows how network theories of belief formation can resolve these inconsistencies, even when groups lack apparent structure as in informal conversation. Emergent network structures of influence interact with the pre-discussion belief distribution to moderate the effect of communication on belief formation. As a result, communication sometimes increases and sometimes decreases the accuracy of the average belief in a group. The effects differ for mediated processes and unstructured communication, such that the relative benefit of each communication format depends on both group dynamics as well as the statistical properties of pre-interaction beliefs. These results resolve contradictions in previous research and offer practical recommendations for teams and organizations.

Submitted 8 March, 2021; v1 submitted 15 September, 2020; originally announced September 2020.

Comments: 27 pages including Appendix; preregistration at https://osf.io/9xq2j; replication data and code at https://github.com/joshua-a-becker/emergent-network-structure

Showing 1–15 of 15 results for author: Horvát, E