-
Amplifying Academic Research through YouTube: Engagement Metrics as Predictors of Citation Impact
Authors:
Olga Zagovora,
Talisa Schwal,
Katrin Weller
Abstract:
This study explores the interplay between YouTube engagement metrics and the academic impact of cited publications within video descriptions, amid declining trust in traditional journalism and increased reliance on social media for information. By analyzing data from Altmetric.com and YouTube's API, it assesses how YouTube video features relate to citation impact. Initial results suggest that videos citing scientific publications and garnering high engagement (likes, comments, and references to other publications) may function as a filtering mechanism or even as a predictor of impactful research.
Submitted 21 May, 2024;
originally announced May 2024.
-
Between Flat-Earthers and Fitness Coaches: Who is Citing Scientific Publications in YouTube Video Descriptions?
Authors:
Olga Zagovora,
Katrin Weller
Abstract:
In this study, we undertake an extensive analysis of YouTube channels that reference research publications in their video descriptions, offering unique insight into the intersection of digital media and academia. Our investigation focuses on three principal aspects: the background of YouTube channel owners, their thematic focus, and their operational dynamics, specifically whether they work individually or in groups. Our results highlight a strong emphasis on content related to science and engineering, as well as health, particularly in channels managed by individual researchers and academic institutions. However, the popularity of these channels varies notably, with professional YouTubers and commercial media entities often outperforming them in viewer engagement metrics such as likes, comments, and views. This underscores the challenge academic channels face in attracting a wider audience. Further, we explore the role of academic actors on YouTube, scrutinizing their impact in disseminating research and the types of publications they reference. Despite a general inclination towards professional academic topics, these channels varied in their effectiveness at spotlighting highly cited research. They often referenced a wide array of publications, indicating a diverse but not necessarily impact-focused approach to content selection.
Submitted 23 April, 2024;
originally announced April 2024.
-
Modeling the Galactic Chemical Evolution of Helium
Authors:
Miqaela K. Weller,
David H. Weinberg,
James W. Johnson
Abstract:
We examine the galactic chemical evolution (GCE) of $^4$He in one-zone and multi-zone models, with particular attention to theoretical predictions and empirical constraints on IMF-averaged yields. Published models of massive star winds and core-collapse supernovae span a factor of 2--3 in the IMF-averaged $^4$He yield, $y_{\mathrm{He}}^{\mathrm{CC}}$. Published models of intermediate-mass, asymptotic giant branch (AGB) stars show better agreement on the IMF-averaged yield, $y_{\mathrm{He}}^{\mathrm{AGB}}$, and they predict that more than half of this yield comes from stars with $M = 4$--$8\,M_\odot$, making AGB $^4$He enrichment rapid compared to Fe enrichment from Type Ia supernovae. Although our GCE models include many potentially complicating effects, the short enrichment time delay and mild metallicity dependence of the predicted yields make the results quite simple: across a wide range of metallicity and age, the non-primordial $^4$He mass fraction $\Delta Y = Y - Y_{\mathrm{P}}$ is proportional to the abundance of promptly produced $\alpha$-elements, like oxygen, with $\Delta Y / Z_{\mathrm{O}} \approx (y_{\mathrm{He}}^{\mathrm{CC}} + y_{\mathrm{He}}^{\mathrm{AGB}}) / y_{\mathrm{O}}^{\mathrm{CC}}$. Reproducing solar abundances with our fiducial choice of the oxygen yield, $y_{\mathrm{O}}^{\mathrm{CC}} = 0.0071$, implies $y_{\mathrm{He}}^{\mathrm{CC}} + y_{\mathrm{He}}^{\mathrm{AGB}} \approx 0.022$, i.e., $0.022\,M_\odot$ of net $^4$He production per solar mass of star formation. Our GCE models with this yield normalization are consistent with most available observations, though the implied $y_{\mathrm{He}}^{\mathrm{CC}}$ is low compared to most of the published massive star models. More precise measurements of $\Delta Y$ in stars and gas across a wide range of metallicity and [$\alpha$/Fe] ratio could test our models more stringently, either confirming the simple picture suggested by our calculations or revealing surprises in the evolution of the second most abundant element.
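The linear helium-oxygen relation quoted in this abstract can be checked with a few lines of arithmetic. This is a minimal sketch, not the authors' GCE code: only the two yield values come from the abstract, the helper names `helium_slope` and `delta_y` are illustrative, and `z_o_example` is an arbitrary oxygen mass fraction chosen for demonstration.

```python
# Sketch of the relation quoted in the abstract:
#   Delta Y / Z_O ≈ (y_He^CC + y_He^AGB) / y_O^CC
# Only the two yield values below come from the abstract; z_o_example is arbitrary.

Y_O_CC = 0.0071      # fiducial IMF-averaged core-collapse oxygen yield
Y_HE_TOTAL = 0.022   # y_He^CC + y_He^AGB implied by matching solar abundances


def helium_slope(y_he_total: float, y_o_cc: float) -> float:
    """Return Delta Y / Z_O, assuming prompt, metallicity-independent yields."""
    return y_he_total / y_o_cc


def delta_y(z_o: float, y_he_total: float = Y_HE_TOTAL, y_o_cc: float = Y_O_CC) -> float:
    """Non-primordial helium mass fraction Delta Y for an oxygen mass fraction z_o."""
    return helium_slope(y_he_total, y_o_cc) * z_o


slope = helium_slope(Y_HE_TOTAL, Y_O_CC)
z_o_example = 0.005  # hypothetical oxygen mass fraction, for illustration only
print(f"Delta Y / Z_O ≈ {slope:.2f}")  # prints 3.10
print(f"Delta Y at Z_O = {z_o_example}: {delta_y(z_o_example):.4f}")
```

In other words, under this normalization each unit of oxygen mass fraction is accompanied by roughly three units of non-primordial helium.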
Submitted 12 April, 2024;
originally announced April 2024.
-
XTable in Action: Seamless Interoperability in Data Lakes
Authors:
Ashvin Agrawal,
Tim Brown,
Anoop Johnson,
Jesús Camacho-Rodríguez,
Kyle Weller,
Carlo Curino,
Raghu Ramakrishnan
Abstract:
Contemporary approaches to data management increasingly rely on unified analytics and AI platforms to foster collaboration, interoperability, seamless access to reliable data, and high performance. Data lakes featuring open standard table formats such as Delta Lake, Apache Hudi, and Apache Iceberg are central components of these data architectures. Choosing the right format for managing a table is crucial for achieving these objectives. The challenge lies in selecting the best format, a task that is onerous and whose outcome may be only temporary, as the ideal choice can shift over time with data growth, evolving workloads, and the competitive development of table formats and processing engines. Moreover, restricting data access to a single format can hinder data sharing, resulting in diminished business value over the long term. The ability to interoperate seamlessly between formats with negligible overhead can effectively address these challenges. Our solution in this direction is XTable, an omni-directional translator that allows data to be written in one format and read in any format, thus achieving the desired format interoperability. In this work, we demonstrate the effectiveness of XTable through application scenarios inspired by real-world use cases.
Submitted 17 January, 2024;
originally announced January 2024.
-
Total Error Sheets for Datasets (TES-D) -- A Critical Guide to Documenting Online Platform Datasets
Authors:
Leon Fröhling,
Indira Sen,
Felix Soldner,
Leonie Steinbrinker,
Maria Zens,
Katrin Weller
Abstract:
This paper proposes a template for documenting datasets that have been collected from online platforms for research purposes. The template is intended to help researchers reflect critically on data quality and to increase transparency in research fields that make use of online platform data. The paper describes our motivation, outlines the procedure for developing a specific documentation template that we refer to as TES-D (Total Error Sheets for Datasets), and provides the current version of the template, guiding questions, and a manual as supplementary material. The TES-D approach builds upon prior work in designing error frameworks for data from online platforms, namely the Total Error Framework for digital traces of human behavior on online platforms (TED-On, https://doi.org/10.1093/poq/nfab018).
Submitted 25 June, 2023;
originally announced June 2023.
-
Mapping Progenitors of Binary Black Holes and Neutron Stars with Binary Population Synthesis
Authors:
Miqaela K. Weller,
Jennifer A. Johnson
Abstract:
The first directly observed gravitational wave event, GW150914, featuring the merger of two massive black holes, highlighted the need to determine how such compact remnant binaries form. We use the binary population synthesis code COSMIC (Compact Object Synthesis and Monte Carlo Investigation Code) to predict the types of massive stars that will show significant radial velocity (RV) variations, indicative of a potential compact object (i.e., a black hole or neutron star) orbiting the star. We "observe" the binaries generated in the populations with a number of epochs and an RV accuracy similar to those planned for the Milky Way Mapper. In this analysis, we are especially interested in systems where a compact remnant orbits a massive O or B star, as these systems survived the first supernova and neutron star kick. We test the ability of the Milky Way Mapper observing strategy to distinguish among different mass loss and kick prescriptions. We find that Wolf-Rayet stars or hot subdwarfs in binaries could be detectable (i.e., luminous, with a high maximum ΔRV) and viable progenitors of such objects, while the different prescriptions primarily affect the number of sources.
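The RV signal that makes such binaries detectable can be estimated from Kepler's third law. This is a minimal sketch assuming a circular orbit; it is not COSMIC, and the masses, period, number of epochs, and RV error below are hypothetical placeholders rather than values from the paper or the actual Milky Way Mapper strategy. The helpers `rv_semi_amplitude` and `delta_rv_max` are illustrative names.

```python
import math
import random

G = 6.674e-11      # gravitational constant [m^3 kg^-1 s^-2]
M_SUN = 1.989e30   # solar mass [kg]
DAY = 86400.0      # seconds per day


def rv_semi_amplitude(m_star: float, m_comp: float, period_days: float, sin_i: float = 1.0) -> float:
    """Semi-amplitude K [km/s] of the luminous star in a circular binary.

    K = (2 pi G / P)^(1/3) * m_comp * sin(i) / (m_star + m_comp)^(2/3); masses in M_sun.
    """
    p = period_days * DAY
    m1, m2 = m_star * M_SUN, m_comp * M_SUN
    k = (2.0 * math.pi * G / p) ** (1.0 / 3.0) * m2 * sin_i / (m1 + m2) ** (2.0 / 3.0)
    return k / 1000.0


def delta_rv_max(k_kms, period_days, epochs_days, rv_err_kms, rng):
    """Max minus min of simulated RVs at the given epochs, with Gaussian RV errors."""
    phase = rng.uniform(0.0, 2.0 * math.pi)  # unknown orbital phase at t = 0
    rvs = [k_kms * math.sin(2.0 * math.pi * t / period_days + phase) + rng.gauss(0.0, rv_err_kms)
           for t in epochs_days]
    return max(rvs) - min(rvs)


rng = random.Random(42)
k = rv_semi_amplitude(20.0, 10.0, 10.0)  # hypothetical 20 M_sun O star + 10 M_sun black hole, P = 10 d
epochs = sorted(rng.uniform(0.0, 365.0) for _ in range(8))  # hypothetical: 8 epochs over one year
print(f"K = {k:.1f} km/s, delta RV max = {delta_rv_max(k, 10.0, epochs, 1.0, rng):.1f} km/s")
```

Sparse sampling can only underestimate the peak-to-peak amplitude 2K (apart from noise), which is why the number of epochs matters for detectability.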
Submitted 9 March, 2022;
originally announced March 2022.
-
'I Updated the <ref>': The Evolution of References in the English Wikipedia and the Implications for Altmetrics
Authors:
Olga Zagovora,
Roberto Ulloa,
Katrin Weller,
Fabian Flöck
Abstract:
With this work, we present a publicly available dataset of the history of all references (more than 55 million) ever used in the English Wikipedia up to June 2019. We apply a new method for identifying and monitoring references in Wikipedia, so that for each reference we can provide data about its associated actions: creation, modifications, deletions, and reinsertions. The high accuracy of this method and of the resulting dataset was confirmed via a comprehensive crowdworker labelling campaign. We use the dataset to study the temporal evolution of Wikipedia references as well as users' editing behaviour. We find evidence of a mostly productive and continuous effort to improve the quality of references: (1) there is a persistent increase in references and document identifiers (DOI, PubMed ID, PMC, ISBN, ISSN, arXiv ID), and (2) most of the reference curation work is done by registered human editors (not bots or anonymous editors). We conclude that the evolution of Wikipedia references, including the dynamics of the community processes that tend to them, should be leveraged in the design of relevance indexes for altmetrics, and that our dataset can be pivotal for such an effort.
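To make the four action types concrete, the sketch below classifies reference actions across a sequence of revisions. It is not the authors' method: it assumes, hypothetically, that every reference already carries a stable ID across revisions, which is exactly the hard part a reference-tracking method has to solve. The function `classify_actions` and the toy `history` are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical input: one dict per revision, mapping a stable reference ID to its wikitext.
Revision = Dict[str, str]


def classify_actions(revisions: List[Revision]) -> List[Tuple[int, str, str]]:
    """Return (revision_index, ref_id, action) tuples in chronological order.

    Actions: 'creation', 'modification', 'deletion', 'reinsertion'.
    """
    actions: List[Tuple[int, str, str]] = []
    seen = set()         # IDs of references that have existed in any earlier revision
    prev: Revision = {}
    for i, rev in enumerate(revisions):
        for ref_id, text in rev.items():
            if ref_id not in prev:
                # Absent in the previous revision: new, or back after a deletion.
                actions.append((i, ref_id, "reinsertion" if ref_id in seen else "creation"))
            elif prev[ref_id] != text:
                actions.append((i, ref_id, "modification"))
        for ref_id in prev:
            if ref_id not in rev:
                actions.append((i, ref_id, "deletion"))
        seen.update(rev)
        prev = rev
    return actions


history = [
    {"r1": "<ref>Smith 2010</ref>"},
    {"r1": "<ref>Smith 2010, DOI added</ref>", "r2": "<ref>Doe 2015</ref>"},
    {"r2": "<ref>Doe 2015</ref>"},
    {"r1": "<ref>Smith 2010, DOI added</ref>", "r2": "<ref>Doe 2015</ref>"},
]
actions = classify_actions(history)
for entry in actions:
    print(entry)
```

Here `r1` is created, modified, deleted, and finally reinserted, while `r2` is created once and left untouched.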
Submitted 6 October, 2020;
originally announced October 2020.
-
TED-On: A Total Error Framework for Digital Traces of Human Behavior on Online Platforms
Authors:
Indira Sen,
Fabian Floeck,
Katrin Weller,
Bernd Weiss,
Claudia Wagner
Abstract:
People's activities and opinions recorded as digital traces online, especially on social media and other web-based platforms, offer increasingly informative pictures of the public. They promise to allow inferences about populations beyond the users of the platforms on which the traces are recorded, representing real potential for the Social Sciences and a complement to survey-based research. But the use of digital traces brings its own complexities and new error sources to the research enterprise. Recently, researchers have begun to discuss the errors that can occur when digital traces are used to learn about humans and social phenomena. This article synthesizes this discussion and proposes a systematic way to categorize potential errors, inspired by the Total Survey Error (TSE) Framework developed for survey methodology. We introduce a conceptual framework to diagnose, understand, and document errors that may occur in studies based on such digital traces. While there are clear parallels to the well-known error sources in the TSE framework, the new "Total Error Framework for Digital Traces of Human Behavior on Online Platforms" (TED-On) identifies several types of error that are specific to the use of digital traces. By providing a standard vocabulary to describe these errors, the proposed framework is intended to advance communication and research concerning the use of digital traces in scientific social research.
Submitted 3 June, 2021; v1 submitted 18 July, 2019;
originally announced July 2019.
-
What increases (social) media attention: Research impact, author prominence or title attractiveness?
Authors:
Olga Zagovora,
Katrin Weller,
Milan Janosov,
Claudia Wagner,
Isabella Peters
Abstract:
Do only major scientific breakthroughs hit the news and social media, or does a 'catchy' title help to attract public attention? How strong is the connection between the importance of a scientific paper and the (social) media attention it receives? In this study, we investigate these questions by analysing the relationship between the observed attention and certain characteristics of scientific papers from two major multidisciplinary journals: Nature Communications (NC) and Proceedings of the National Academy of Sciences (PNAS). We describe papers by features based on the linguistic properties of their titles and the centrality measures of their authors in the co-authorship network. We identify linguistic features and collaboration patterns that might be indicators of future attention and that are characteristic of different journals, research disciplines, and media sources.
Submitted 17 September, 2018;
originally announced September 2018.
-
Analysing Timelines of National Histories across Wikipedia Editions: A Comparative Computational Approach
Authors:
Anna Samoilenko,
Florian Lemmerich,
Katrin Weller,
Maria Zens,
Markus Strohmaier
Abstract:
Portrayals of history are never complete, and each description inherently exhibits a specific viewpoint and emphasis. In this paper, we aim to automatically identify such differences by computing timelines and detecting temporal focal points of written history across languages on Wikipedia. In particular, we study articles related to the history of all UN member states and compare them across 30 language editions. We develop a computational approach that allows us to identify focal points quantitatively, and find that Wikipedia narratives about national histories (i) are skewed towards more recent events (recency bias) and (ii) are distributed unevenly across the continents, with a significant focus on the history of European countries (Eurocentric bias). We also establish that national historical timelines vary across language editions, although average interlingual consensus is rather high. We hope that this paper provides a starting point for a broader computational analysis of written history on Wikipedia and elsewhere.
Submitted 24 May, 2017;
originally announced May 2017.
-
Think before you collect: Setting up a data collection approach for social media studies
Authors:
Philipp Mayr,
Katrin Weller
Abstract:
This chapter discusses important challenges in designing the data collection setup for social media studies. It outlines why it is necessary to think carefully about which data to collect and use, and to recognize the effects that a specific data collection approach may have on the types of analyses that can be carried out and the results that can be expected in a study. We will highlight important questions one should ask before setting up a data collection framework and relate them to the different options for accessing social media data. The chapter will mainly be illustrated with examples from studying Twitter and Facebook. A case study of political communication around the 2013 elections in Germany serves as a practical application scenario. In this case study, we constructed several social media datasets based on different collection approaches, using data from Facebook and Twitter.
Submitted 30 October, 2017; v1 submitted 23 January, 2016;
originally announced January 2016.
-
Scaling Limits of Random Graphs from Subcritical Classes
Authors:
Konstantinos Panagiotou,
Benedikt Stufler,
Kerstin Weller
Abstract:
We study the uniform random graph $\mathsf{C}_n$ with $n$ vertices drawn from a subcritical class of connected graphs. Our main result is that the rescaled graph $\mathsf{C}_n / \sqrt{n}$ converges to the Brownian Continuum Random Tree $\mathcal{T}_{\mathsf{e}}$ multiplied by a constant scaling factor that depends on the class under consideration. In addition, we provide subgaussian tail bounds for the diameter $\text{D}(\mathsf{C}_n)$ and height $\text{H}(\mathsf{C}_n^\bullet)$ of the rooted random graph $\mathsf{C}_n^\bullet$. We give analytic expressions for the scaling factor of several classes, including, for example, the prominent class of outerplanar graphs. Our methods also enable us to study first passage percolation on $\mathsf{C}_n$, where we show convergence to $\mathcal{T}_{\mathsf{e}}$ under an appropriate rescaling.
Submitted 14 November, 2014; v1 submitted 7 November, 2014;
originally announced November 2014.
-
Social Media Monitoring of the Campaigns for the 2013 German Bundestag Elections on Facebook and Twitter
Authors:
Lars Kaczmirek,
Philipp Mayr,
Ravi Vatrapu,
Arnim Bleier,
Manuela Blumenberg,
Tobias Gummer,
Abid Hussain,
Katharina Kinder-Kurlanda,
Kaveh Manshaei,
Mark Thamm,
Katrin Weller,
Alexander Wenz,
Christof Wolf
Abstract:
As more and more people use social media to communicate their views and perceptions of elections, researchers have increasingly been collecting and analyzing data from social media platforms. Our research focuses on social media communication related to the 2013 election of the German parliament [translation: Bundestagswahl 2013]. We constructed several social media datasets using data from Facebook and Twitter. First, we identified the most relevant candidates (n=2,346) and checked whether they maintained social media accounts. The Facebook data were collected in November 2013 for the period from January 2009 to October 2013. On Facebook, we identified 1,408 Facebook walls containing approximately 469,000 posts. Twitter data were collected between June and December 2013, ending with the constitution of the government. On Twitter, we identified 1,009 candidates and 76 other agents, for example, journalists. We estimated that the number of relevant tweets exceeded eight million for the period from July 27 to September 27 alone. In this document, we summarize past research in the literature, discuss possibilities for research with our dataset, explain the data collection procedures, and provide a description of the data and a discussion of issues for archiving and disseminating social media data.
Submitted 1 April, 2014; v1 submitted 16 December, 2013;
originally announced December 2013.
-
Asymptotic properties of some minor-closed classes of graphs
Authors:
Mireille Bousquet-Mélou,
Kerstin Weller
Abstract:
Let A be a minor-closed class of labelled graphs, and let G_n be a random graph sampled uniformly from the set of n-vertex graphs of A. When n is large, what is the probability that G_n is connected? How many components does it have? How large is its biggest component? Thanks to the work of McDiarmid and his collaborators, these questions are now solved when all excluded minors are 2-connected. Using exact enumeration, we study a collection of classes A excluding non-2-connected minors, and show that their asymptotic behaviour may be rather different from the 2-connected case. This behaviour largely depends on the nature of the dominant singularity of the generating function C(z) that counts connected graphs of A. We classify our examples accordingly, thus taking a first step towards a classification of minor-closed classes of graphs. Furthermore, we investigate a parameter that has not yet received any attention in this context: the size of the root component. It follows non-Gaussian limit laws (beta and gamma) and clearly deserves a systematic investigation.
Submitted 5 February, 2014; v1 submitted 15 March, 2013;
originally announced March 2013.
-
Photon Counting EMCCDs: New Opportunities for High Time Resolution Astrophysics
Authors:
Craig Mackay,
Keith Weller,
Frank Suess
Abstract:
Electron Multiplying CCDs (EMCCDs) are used much less often than they might be because of the challenges they pose to camera designers more comfortable with the design of slow-scan detector systems. However, they offer an entirely new range of opportunities in astrophysical instrumentation. This paper will show some of the exciting new results obtained with these remarkable devices and discuss their potential in other areas of astrophysical application. We will then describe how they may be operated to give the very best performance at the lowest possible light levels. We will show that clock-induced charge may be reduced to negligible levels and that, with care, devices may be clocked at significantly higher speeds than usually achieved. As an example of the advantages offered by these detectors, we will show how a multi-detector EMCCD curvature wavefront sensor will revolutionise the sensitivity of adaptive optics instruments and be able to deliver the highest resolution images ever taken in the visible or the near infrared.
Submitted 17 July, 2012;
originally announced July 2012.
-
High-resolution imaging and spectroscopy in the visible from large ground-based telescopes with natural guide stars
Authors:
Craig Mackay,
Tim D. Staley,
David King,
Frank Suess,
Keith Weller
Abstract:
Near-diffraction-limited imaging and spectroscopy in the visible on large (8-10 meter) class telescopes has proved to be beyond the capabilities of current adaptive optics technologies, even when using laser guide stars. The need for high-resolution visible imaging in any part of the sky suggests that a rather different approach is needed. This paper describes the results of simulations, experiments, and astronomical observations showing that a combination of low-order adaptive optics correction using a 4-field curvature sensor and fast Lucky Imaging strategies with a photon-counting CCD camera system should deliver 20-25 milliarcsecond resolution in the visible with reference stars as faint as 18.5 magnitude in I band on large telescopes. Such an instrument may be used to feed an integral field spectrograph efficiently, using configurations that will also be described.
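The Lucky Imaging part of this approach is commonly described as frame selection followed by shift-and-add. The sketch below is a minimal illustration of that general technique, not the authors' pipeline: it ranks frames by peak pixel value as a crude sharpness proxy, keeps a hypothetical fraction of them, recentres each on its brightest pixel with integer shifts, and averages. Low-order adaptive optics correction and sub-pixel registration are not modelled, and `lucky_image` is an illustrative name.

```python
import numpy as np


def lucky_image(frames: np.ndarray, keep_fraction: float = 0.1) -> np.ndarray:
    """Select the sharpest frames and shift-and-add them.

    frames: array of shape (n_frames, ny, nx).
    Sharpness proxy (an assumption): the peak pixel value of each frame.
    """
    n = frames.shape[0]
    n_keep = max(1, int(round(keep_fraction * n)))
    peaks = frames.reshape(n, -1).max(axis=1)
    best = np.argsort(peaks)[::-1][:n_keep]  # indices of frames with the highest peaks
    ny, nx = frames.shape[1:]
    centre = (ny // 2, nx // 2)
    stack = np.zeros((ny, nx), dtype=float)
    for idx in best:
        frame = frames[idx]
        y, x = np.unravel_index(np.argmax(frame), frame.shape)
        # Integer-pixel circular shift that moves the brightest pixel to the array centre.
        stack += np.roll(frame, (centre[0] - y, centre[1] - x), axis=(0, 1))
    return stack / n_keep


# Toy data: ten frames, each with a single "star" of increasing brightness at a wandering position.
frames = np.zeros((10, 15, 15))
for i in range(10):
    frames[i, i, (2 * i) % 15] = i + 1
result = lucky_image(frames, keep_fraction=0.2)
print(result[7, 7])
```

Keeping only the best few percent of frames trades photon efficiency for resolution; the faint-reference-star limits quoted in the abstract depend on how much signal the selected frames retain.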
Submitted 9 July, 2010;
originally announced July 2010.
-
High-Speed, Photon Counting CCD Cameras for Astronomy
Authors:
Craig Mackay,
Tim D. Staley,
David King,
Frank Suess,
Keith Weller
Abstract:
The design of electron multiplying CCD cameras requires a very different approach from that appropriate for slow-scan CCD operation. This paper describes the main problems in using electron multiplying CCDs for high-speed, photon-counting applications in astronomy and how these may be substantially overcome. With careful design, it is possible to operate the E2V Technologies L3CCDs at rates well in excess of those claimed by the manufacturer, and with levels of clock-induced charge dramatically lower than those experienced with commercial cameras that need to operate at unity gain. Measurements of the performance of the E2V Technologies CCD201 operating at 26 MHz will be presented together with a guide to the effective reduction of clock-induced charge levels. Examples of astronomical results obtained with our cameras are presented.
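At very low light levels, EMCCD output is often thresholded into binary photon events. The sketch below is a toy simulation under stated assumptions (Poisson input, EM-register output for n input electrons drawn from a Gamma(n, gain) distribution, Gaussian read noise, no clock-induced charge or dark current); the gain, read noise, flux, and 5-sigma threshold are hypothetical choices, not values from the paper, and the function names are illustrative.

```python
import numpy as np


def simulate_emccd_frame(mean_electrons, em_gain, read_noise, shape, rng):
    """Toy EMCCD frame in output electrons.

    Assumptions: Poisson input signal; EM-register output for n >= 1 input
    electrons drawn from Gamma(n, em_gain); Gaussian read noise. Clock-induced
    charge, dark current and ADC quantisation are ignored.
    """
    n_in = rng.poisson(mean_electrons, size=shape)
    amplified = np.where(n_in > 0, rng.gamma(np.maximum(n_in, 1), em_gain), 0.0)
    return amplified + rng.normal(0.0, read_noise, size=shape)


def photon_count(frame, read_noise, k_sigma=5.0):
    """Binary photon counting: 1 where a pixel exceeds k_sigma * read noise, else 0."""
    return (frame > k_sigma * read_noise).astype(np.int64)


rng = np.random.default_rng(0)
READ_NOISE = 10.0  # hypothetical read noise [e- rms]
frame = simulate_emccd_frame(0.05, 1000.0, READ_NOISE, (512, 512), rng)
events = photon_count(frame, READ_NOISE)
print(f"fraction of pixels flagged as photon events: {events.mean():.4f}")
```

Because each pixel is reduced to 0 or 1, this scheme only works well when the mean signal per pixel per frame is well below one electron, which is one reason high frame rates matter for photon-counting operation.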
Submitted 9 July, 2010;
originally announced July 2010.