-
BIP! NDR (NoDoiRefs): A Dataset of Citations From Papers Without DOIs in Computer Science Conferences and Workshops
Authors: Paris Koloveas, Serafeim Chatzopoulos, Christos Tryfonopoulos, Thanasis Vergoulis
Abstract:
In Computer Science, conference and workshop papers are important contributions that carry more weight in research assessment processes than they do in other disciplines. However, a considerable number of these papers are not assigned a Digital Object Identifier (DOI), so their citations are not reported in widely used citation datasets like OpenCitations and Crossref, which limits citation analysis. While the Microsoft Academic Graph (MAG) previously addressed this issue by providing substantial coverage, its discontinuation has created a void in available data. BIP! NDR aims to alleviate this issue and enhance the research assessment processes within the field of Computer Science. To accomplish this, it leverages a workflow that identifies and retrieves Open Science papers lacking DOIs from the DBLP Corpus and, by performing text analysis, extracts citation information directly from their full text. The current version of the dataset contains more than 510K citations made by approximately 60K open access Computer Science conference or workshop papers that, according to DBLP, do not have a DOI.
Submitted 24 July, 2023;
originally announced July 2023.
-
Piloting topic-aware research impact assessment features in BIP! Services
Authors: Serafeim Chatzopoulos, Kleanthis Vichos, Ilias Kanellos, Thanasis Vergoulis
Abstract:
Various research activities rely on citation-based impact indicators. However, these indicators are usually computed globally, which hinders their proper interpretation in applications like research assessment and knowledge discovery. In this work, we advocate for the use of topic-aware categorical impact indicators to alleviate this problem. In addition, we extend BIP! Services to support those indicators and showcase their benefits in real-world research activities.
Submitted 10 May, 2023;
originally announced May 2023.
-
BIP! Scholar: A Service to Facilitate Fair Researcher Assessment
Authors: Thanasis Vergoulis, Serafeim Chatzopoulos, Kleanthis Vichos, Ilias Kanellos, Andrea Mannocci, Natalia Manola, Paolo Manghi
Abstract:
In recent years, assessing the performance of researchers has become a burden due to the extensive volume of the existing research output. As a result, evaluators often end up relying heavily on a selection of performance indicators like the h-index. However, over-reliance on such indicators may result in reinforcing dubious research practices, while overlooking important aspects of a researcher's career, such as their exact role in the production of particular research works or their contribution to other important types of academic or research activities (e.g., production of datasets, peer reviewing). In response, a number of initiatives that attempt to provide guidelines towards fairer research assessment frameworks have been established. In this work, we present BIP! Scholar, a Web-based service that offers researchers the opportunity to set up profiles that summarise their research careers taking into consideration well-established guidelines for fair research assessment, facilitating the work of evaluators who want to be more compliant with the respective practices.
Submitted 6 May, 2022;
originally announced May 2022.
-
Atrapos: Real-time Evaluation of Metapath Query Workloads
Authors: Serafeim Chatzopoulos, Thanasis Vergoulis, Dimitrios Skoutas, Theodore Dalamagas, Christos Tryfonopoulos, Panagiotis Karras
Abstract:
Heterogeneous information networks (HINs) represent different types of entities and relationships between them. Exploring, analysing, and extracting knowledge from such networks relies on metapath queries that identify pairs of entities connected by relationships of diverse semantics. While the real-time evaluation of metapath query workloads on large, web-scale HINs is highly demanding in computational cost, current approaches do not exploit interrelationships among the queries. In this paper, we present ATRAPOS, a new approach for the real-time evaluation of metapath query workloads that leverages a combination of efficient sparse matrix multiplication and intermediate result caching. ATRAPOS selects intermediate results to cache and reuse by detecting frequent sub-metapaths among workload queries in real time, using a tailor-made data structure, the Overlap Tree, and an associated caching policy. Our experimental study on real data shows that ATRAPOS accelerates exploratory data analysis and mining on HINs, outperforming off-the-shelf caching approaches and state-of-the-art research prototypes in all examined scenarios.
Note that this version of our work is more extended than the one presented at TheWebConf 2023 (doi: 10.1145/3543507.3583322).
Submitted 25 May, 2023; v1 submitted 11 January, 2022;
originally announced January 2022.
-
SCHeMa: Scheduling Scientific Containers on a Cluster of Heterogeneous Machines
Authors:
Thanasis Vergoulis,
Konstantinos Zagganas,
Loukas Kavouras,
Martin Reczko,
Stelios Sartzetakis,
Theodore Dalamagas
Abstract:
In the era of data-driven science, conducting computational experiments that involve analysing large datasets using heterogeneous computational clusters is part of the everyday routine for many scientists. Moreover, to ensure the credibility of their results, it is very important for these analyses to be easily reproducible by other researchers. Although various technologies that could facilitate the work of scientists in this direction have been introduced in recent years, there is still a lack of open-source platforms that combine them to this end. In this work, we describe and demonstrate SCHeMa, an open-source platform that facilitates the execution and reproducibility of computational analysis on heterogeneous clusters, leveraging containerization, experiment packaging, workflow management, and machine learning technologies.
Submitted 22 March, 2022; v1 submitted 24 March, 2021;
originally announced March 2021.
-
BIP! DB: A Dataset of Impact Measures for Scientific Publications
Authors:
Thanasis Vergoulis,
Ilias Kanellos,
Claudio Atzori,
Andrea Mannocci,
Serafeim Chatzopoulos,
Sandro La Bruzzo,
Natalia Manola,
Paolo Manghi
Abstract:
The growth rate of the number of scientific publications is constantly increasing, creating important challenges in the identification of valuable research and in scholarly data management applications in general. In this context, measures that can effectively quantify scientific impact could be invaluable. In this work, we present BIP! DB, an open dataset that contains a variety of impact measures calculated for a large collection of more than 100 million scientific publications from various disciplines.
Submitted 6 May, 2022; v1 submitted 28 January, 2021;
originally announced January 2021.
-
Simplifying Impact Prediction for Scientific Articles
Authors:
Thanasis Vergoulis,
Ilias Kanellos,
Giorgos Giannopoulos,
Theodore Dalamagas
Abstract:
Estimating the expected impact of an article is valuable for various applications (e.g., article/cooperator recommendation). Most existing approaches attempt to predict the exact number of citations each article will receive in the near future; however, this is a difficult regression analysis problem. Moreover, most approaches rely on the existence of rich metadata for each article, a requirement that cannot be adequately fulfilled for a large number of them. In this work, we take advantage of the fact that solving a simpler machine learning problem, that of classifying articles based on their expected impact, is adequate for many real-world applications, and we propose a simplified model that can be trained using minimal article metadata. Finally, we examine various configurations of this model and evaluate their effectiveness in solving the aforementioned classification problem.
Submitted 30 December, 2020;
originally announced December 2020.
-
Ranking Papers by their Short-Term Scientific Impact
Authors:
Ilias Kanellos,
Thanasis Vergoulis,
Dimitris Sacharidis,
Theodore Dalamagas,
Yannis Vassiliou
Abstract:
The constantly increasing rate at which scientific papers are published makes it difficult for researchers to identify papers that currently impact the research field of their interest. Hence, approaches to effectively identify papers of high impact have attracted great attention in the past. In this work, we present a method that seeks to rank papers based on their estimated short-term impact, as measured by the number of citations received in the near future. Similar to previous work, our method models a researcher as she explores the paper citation network. The key aspect is that we incorporate an attention-based mechanism, akin to a time-restricted version of preferential attachment, to explicitly capture a researcher's preference to read papers that received a lot of attention recently. A detailed experimental evaluation on four real citation datasets across disciplines shows that our approach is more effective than previous work in ranking papers based on their short-term impact.
Submitted 20 April, 2021; v1 submitted 1 June, 2020;
originally announced June 2020.