An open dataset of article processing charges from six large scholarly publishers (2019-2023)

Authors: Leigh-Ann Butler, Madelaine Hare, Nina Schönfelder, Eric Schares, Juan Pablo Alperin, Stefanie Haustein

Abstract: This paper introduces a dataset of article processing charges (APCs) produced from the price lists of six large scholarly publishers - Elsevier, Frontiers, PLOS, MDPI, Springer Nature and Wiley - between 2019 and 2023. APC price lists were downloaded from publisher websites each year as well as via Wayback Machine snapshots to retrieve fees per journal per year. The dataset includes journal metadata, APC collection method, and annual APC price list information in several currencies (USD, EUR, GBP, CHF, JPY, CAD) for 8,712 unique journals and 36,618 journal-year combinations. The dataset was generated to allow for more precise analysis of APCs and can support library collection development and scientometric analysis estimating APCs paid in gold and hybrid OA journals.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 13 pages, 3 figures, 4 tables

arXiv:2404.01985 [pdf]

The open access coverage of OpenAlex, Scopus and Web of Science

Authors: Marc-Andre Simard, Isabel Basson, Madelaine Hare, Vincent Lariviere, Philippe Mongeon

Abstract: Diamond open access (OA) journals offer a publishing model that is free for both authors and readers, but their lack of indexing in major bibliographic databases presents challenges in assessing the uptake of these journals. Furthermore, OA characteristics such as publication language and country of publication have often been used to support the argument that OA journals are more diverse and aim to serve a local community, but there is a current lack of empirical evidence related to the geographical and linguistic characteristics of OA journals. Using OpenAlex and the Directory of Open Access Journals as a benchmark, this paper investigates the coverage of diamond and gold through authorship and journal coverage in the Web of Science and Scopus by field, country, and language. Results show their lower coverage in WoS and Scopus, and the local scope of diamond OA. The share of English-only journals is considerably higher among gold journals. High-income countries have the highest share of authorship in every domain and type of journal, except for diamond journals in the social sciences and humanities. Understanding the current landscape of diamond OA indexing can aid the scholarly communications network with advancing policy and practices towards more inclusive OA models.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2308.04379 [pdf]

Who Re-Uses Data? A Bibliometric Analysis of Dataset Citations

Authors: Geoff Krause, Madelaine Hare, Mike Smit, Philippe Mongeon

Abstract: Open data is receiving increased attention and support in academic environments, with one justification being that shared data may be re-used in further research. But what evidence exists for such re-use, and what is the relationship between the producers of shared datasets and researchers who use them? Using a sample of data citations from OpenAlex, this study investigates the relationship between creators and citers of datasets at the individual, institutional, and national levels. We find that the vast majority of datasets have no recorded citations, and that most cited datasets only have a single citation. Rates of self-citation by individuals and institutions tend towards the low end of previous findings and vary widely across disciplines. At the country level, the United States is by far the most prominent exporter of re-used datasets, while importation is more evenly distributed. Understanding where and how the sharing of data between researchers, institutions, and countries takes place is essential to developing open research practices.

Submitted 8 August, 2023; originally announced August 2023.

Comments: 36 pages, 10 figures

arXiv:2306.16554 [pdf]

Do you cite what you tweet? Investigating the relationship between tweeting and citing research articles

Authors: Madelaine Hare, Geoff Krause, Keith MacKnight, Timothy D. Bowman, Rodrigo Costas, Philippe Mongeon

Abstract: The last decade of altmetrics research has demonstrated that altmetrics have a low to moderate correlation with citations, depending on the platform and the discipline, among other factors. Most past studies used academic works as their unit of analysis to determine whether the attention they received on Twitter was a good predictor of academic engagement. Our work revisits the relationship between tweets and citations where the tweet itself is the unit of analysis, and the question is to determine if, at the individual level, the act of tweeting an academic work can shed light on the likelihood of the act of citing that same work. We model this relationship by considering the research activity of the tweeter and its relationship to the tweeted work. Results show that tweeters are more likely to cite works affiliated with their same institution, works published in journals in which they also have published, and works in which they hold authorship. It finds that the older the academic age of a tweeter the less likely they are to cite what they tweet, though there is a positive relationship between citations and the number of works they have published and references they have accumulated over time.

Submitted 28 June, 2023; originally announced June 2023.

arXiv:2104.06913 [pdf]

doi 10.1145/3437359.3465563

Managing Cloud networking costs for data-intensive applications by provisioning dedicated network links

Authors: Igor Sfiligoi, Michael Hare, David Schultz, Frank Würthwein, Benedikt Riedel, Tom Hutton, Steve Barnet, Vladimir Brik

Abstract: Many scientific high-throughput applications can benefit from the elastic nature of Cloud resources, especially when there is a need to reduce time to completion. Cost considerations are usually a major issue in such endeavors, with networking often a major component; for data-intensive applications, egress networking costs can exceed the compute costs. Dedicated network links provide a way to lower the networking costs, but they do add complexity. In this paper we provide a description of a 100 fp32 PFLOPS Cloud burst in support of IceCube production compute, that used Internet2 Cloud Connect service to provision several logically-dedicated network links from the three major Cloud providers, namely Amazon Web Services, Microsoft Azure and Google Cloud Platform, that in aggregate enabled approximately 100 Gbps egress capability to on-prem storage. It provides technical details about the provisioning process, the benefits and limitations of such a setup and an analysis of the costs incurred.

Submitted 14 April, 2021; originally announced April 2021.

Comments: 8 pages, 7 figures, 4 tables, to be published in proceedings of PEARC21

Showing 1–5 of 5 results for author: Hare, M