Search | arXiv e-print repository

arXiv:2303.01994 [pdf, other]

Discovery and Recognition of Formula Concepts using Machine Learning

Authors: Philipp Scharpf, Moritz Schubotz, Howard S. Cohl, Corinna Breitinger, Bela Gipp

Abstract: Citation-based Information Retrieval (IR) methods for scientific documents have proven effective for IR applications, such as Plagiarism Detection or Literature Recommender Systems in academic disciplines that use many references. In science, technology, engineering, and mathematics, researchers often employ mathematical concepts through formula notation to refer to prior knowledge. Our long-term goal is to generalize citation-based IR methods and apply this generalized method to both classical references and mathematical concepts. In this paper, we suggest how mathematical formulas could be cited and define a Formula Concept Retrieval task with two subtasks: Formula Concept Discovery (FCD) and Formula Concept Recognition (FCR). While FCD aims at the definition and exploration of a 'Formula Concept' that names bundled equivalent representations of a formula, FCR is designed to match a given formula to a prior assigned unique mathematical concept identifier. We present machine learning-based approaches to address the FCD and FCR tasks. We then evaluate these approaches on a standardized test collection (NTCIR arXiv dataset). Our FCD approach yields a precision of 68% for retrieving equivalent representations of frequent formulas and a recall of 72% for extracting the formula name from the surrounding text. FCD and FCR enable the citation of formulas within mathematical documents and facilitate semantic search and question answering as well as document similarity assessments for plagiarism detection or recommender systems.

Submitted 19 March, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

Comments: Accepted by Scientometrics (Springer) journal

MSC Class: 68P20 (Primary); 68T50 (Secondary) ACM Class: H.3.3; I.2.7

arXiv:2205.05414 [pdf]

doi 10.1145/3529372.3533281

Recommending Research Papers to Chemists: A Specialized Interface for Chemical Entity Exploration

Authors: Corinna Breitinger, Kay Herklotz, Tim Flegelskamp, Norman Meuschke

Abstract: Researchers and scientists increasingly rely on specialized information retrieval (IR) or recommendation systems (RS) to support them in their daily research tasks. Paper recommender systems are one such tool scientists use to stay on top of the ever-increasing number of academic publications in their field. Improving research paper recommender systems is an active research field. However, less research has focused on how the interfaces of research paper recommender systems can be tailored to suit the needs of different research domains. For example, in the field of biomedicine and chemistry, researchers are not only interested in textual relevance but may also want to discover or compare the contained chemical entity information found in a paper's full text. Existing recommender systems for academic literature do not support the discovery of this non-textual, but semantically valuable, chemical entity data. We present the first implementation of a specialized chemistry paper recommender system capable of visualizing the contained chemical structures, chemical formulae, and synonyms for chemical compounds within the document's full text. We review existing tools and related research in this field before describing the implementation of our ChemVis system. With the help of chemists, we are expanding the functionality of ChemVis, and will perform an evaluation of recommendation performance and usability in future work.

Submitted 11 May, 2022; originally announced May 2022.

Comments: Author's preprint version. Final publication to appear in Proceedings of ACM/IEEE Joint Conference on Digital Libraries (JCDL'22)

arXiv:2202.06354 [pdf]

Non-fungible Tokens: Promise or Peril?

Authors: Arsalan Parham, Corinna Breitinger

Abstract: Non-fungible tokens or NFTs are the digital assets on a blockchain. NFTs are unique and they cannot be divided like cryptocurrencies. NFTs could store digital ownership of an artwork or collections or can be fan tokens or tickets for clubs. NFTs are based on a smart contract on a blockchain network which supports them, such as Ethereum, Cardano or Polkadot. Most of the NFTs are now minted on Ethereum (ERC-20) network, but it has some main issues like high transaction fees and low speed. There are lots of domains which can be benefited from NFT technology such as art, music, gaming, sport and wildlife conservation. NFTs could be also bought or sold on lots of NFT marketplaces such as OpenSea and Chiliz. The trend is in a huge hype because the market cap and popularity of NFTs are growing significantly.

Submitted 13 February, 2022; originally announced February 2022.

arXiv:2109.07791 [pdf, other]

A Qualitative Evaluation of User Preference for Link-based vs. Text-based Recommendations of Wikipedia Articles

Authors: Malte Ostendorff, Corinna Breitinger, Bela Gipp

Abstract: Literature recommendation systems (LRS) assist readers in the discovery of relevant content from the overwhelming amount of literature available. Despite the widespread adoption of LRS, there is a lack of research on the user-perceived recommendation characteristics for fundamentally different approaches to content-based literature recommendation. To complement existing quantitative studies on literature recommendation, we present qualitative study results that report on users' perceptions for two contrasting recommendation classes: (1) link-based recommendation represented by the Co-Citation Proximity (CPA) approach, and (2) text-based recommendation represented by Lucene's MoreLikeThis (MLT) algorithm. The empirical data analyzed in our study with twenty users and a diverse set of 40 Wikipedia articles indicate a noticeable difference between text- and link-based recommendation generation approaches along several key dimensions. The text-based MLT method receives higher satisfaction ratings in terms of user-perceived similarity of recommended articles. In contrast, the CPA approach receives higher satisfaction scores in terms of diversity and serendipity of recommendations. We conclude that users of literature recommendation systems can benefit most from hybrid approaches that combine both link- and text-based approaches, where the user's information needs and preferences should control the weighting for the approaches used. The optimal weighting of multiple approaches used in a hybrid recommendation system is highly dependent on a user's shifting needs.

Submitted 16 September, 2021; originally announced September 2021.

Comments: Accepted for publication at ICADL 2021

arXiv:2005.12099 [pdf, other]

doi 10.1007/978-3-030-53518-6_15

AutoMSC: Automatic Assignment of Mathematics Subject Classification Labels

Authors: Moritz Schubotz, Philipp Scharpf, Olaf Teschke, Andreas Kuehnemund, Corinna Breitinger, Bela Gipp

Abstract: Authors of research papers in the fields of mathematics, and other math-heavy disciplines commonly employ the Mathematics Subject Classification (MSC) scheme to search for relevant literature. The MSC is a hierarchical alphanumerical classification scheme that allows librarians to specify one or multiple codes for publications. Digital Libraries in Mathematics, as well as reviewing services, such as zbMATH and Mathematical Reviews (MR) rely on these MSC labels in their workflows to organize the abstracting and reviewing process. Especially, the coarse-grained classification determines the subject editor who is responsible for the actual reviewing process. In this paper, we investigate the feasibility of automatically assigning a coarse-grained primary classification using the MSC scheme, by regarding the problem as a multi-class classification machine learning task. We find that our method achieves an $F_1$-score of over 77%, which is remarkably close to the agreement of zbMATH and MR ($F_1$-score of 81%). Moreover, we find that the method's confidence score allows for reducing the effort by 86% compared to the manual coarse-grained classification effort while maintaining a precision of 81% for automatically classified articles.

Submitted 9 November, 2020; v1 submitted 25 May, 2020; originally announced May 2020.

Journal ref: Intelligent Computer Mathematics - 13th International Conference, CICM 2020, Bertinoro, Italy, July 26-31, 2020, Proceedings

arXiv:2002.02712 [pdf, other]

doi 10.1145/3366423.3380218

Discovering Mathematical Objects of Interest -- A Study of Mathematical Notations

Authors: Andre Greiner-Petter, Moritz Schubotz, Fabian Mueller, Corinna Breitinger, Howard S. Cohl, Akiko Aizawa, Bela Gipp

Abstract: Mathematical notation, i.e., the writing system used to communicate concepts in mathematics, encodes valuable information for a variety of information search and retrieval systems. Yet, mathematical notations remain mostly unutilized by today's systems. In this paper, we present the first in-depth study on the distributions of mathematical notation in two large scientific corpora: the open access arXiv (2.5B mathematical objects) and the mathematical reviewing service for pure and applied mathematics zbMATH (61M mathematical objects). Our study lays a foundation for future research projects on mathematical information retrieval for large scientific corpora. Further, we demonstrate the relevance of our results to a variety of use-cases. For example, to assist semantic extraction systems, to improve scientific search engines, and to facilitate specialized math recommendation systems. The contributions of our presented research are as follows: (1) we present the first distributional analysis of mathematical formulae on arXiv and zbMATH; (2) we retrieve relevant mathematical objects for given textual search queries (e.g., linking $P_{n}^{(\alpha, \beta)}\!\left(x\right)$ with 'Jacobi polynomial'); (3) we extend zbMATH's search engine by providing relevant mathematical formulae; and (4) we exemplify the applicability of the results by presenting auto-completion for math inputs as the first contribution to math recommendation systems. To expedite future research projects, we have made available our source code and data.

Submitted 22 June, 2021; v1 submitted 7 February, 2020; originally announced February 2020.

Comments: Proceedings of The Web Conference 2020 (WWW'20), April 20–24, 2020, Taipei, Taiwan

arXiv:1909.02766 [pdf]

Giveme5W1H: A Universal System for Extracting Main Events from News Articles

Authors: Felix Hamborg, Corinna Breitinger, Bela Gipp

Abstract: Event extraction from news articles is a commonly required prerequisite for various tasks, such as article summarization, article clustering, and news aggregation. Due to the lack of universally applicable and publicly available methods tailored to news datasets, many researchers redundantly implement event extraction methods for their own projects. The journalistic 5W1H questions are capable of describing the main event of an article, i.e., by answering who did what, when, where, why, and how. We provide an in-depth description of an improved version of Giveme5W1H, a system that uses syntactic and domain-specific rules to automatically extract the relevant phrases from English news articles to provide answers to these 5W1H questions. Given the answers to these questions, the system determines an article's main event. In an expert evaluation with three assessors and 120 articles, we determined an overall precision of p=0.73, and p=0.82 for answering the first four W questions, which alone can sufficiently summarize the main event reported on in a news article. We recently made our system publicly available, and it remains the only universal open-source 5W1H extractor capable of being applied to a wide range of use cases in news analysis.

Submitted 6 September, 2019; originally announced September 2019.

arXiv:1904.00237 [pdf, other]

A decentralized method for making sensor measurements tamper-proof to support open science applications

Authors: Patrick Wortner, Moritz Schubotz, Corinna Breitinger, Stephan Leible, Bela Gipp

Abstract: Open science has become a synonym for modern, digital and inclusive science. Inclusion does not stop at open access. Inclusion also requires transparency through open datasets and the right and ability to take part in the knowledge creation process. This implies new challenges for digital libraries. Citizens should be able to contribute data in a curatable form to advance science. At the same time, this data should be verifiable and attributable to its owner. Our research project focusses on securing and attributing incoming data streams from sensors. Our contribution is twofold. First, we analyze the promises of open science measurement data and point out how Blockchain technology changed the circumstances for data measurement in science projects using sensors. Second, we present an open hardware project capable of securing the integrity of data directly from the source using cryptographic methods. By using inexpensive modular components and open source software, we lower the barrier for participation in open science projects. We show how time series of measurement values using sensors, e.g., temperature, current, and vibration measurements, can be verifiably and immutably stored. The approach we propose enables time series data to be stored in a tamper-proof manner and securely timestamped on a blockchain to prevent any subsequent modification.

Submitted 2 April, 2019; v1 submitted 30 March, 2019; originally announced April 2019.

arXiv:1703.09108 [pdf]

Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia

Authors: Joeran Beel, Akiko Aizawa, Corinna Breitinger, Bela Gipp

Abstract: Only few digital libraries and reference managers offer recommender systems, although such systems could assist users facing information overload. In this paper, we introduce Mr. DLib's recommendations-as-a-service, which allows third parties to easily integrate a recommender system into their products. We explain the recommender approaches implemented in Mr. DLib (content-based filtering among others), and present details on 57 million recommendations, which Mr. DLib delivered to its partner GESIS Sowiport. Finally, we outline our plans for future development, including integration into JabRef, establishing a living lab, and providing personalized recommendations.

Submitted 20 April, 2017; v1 submitted 27 March, 2017; originally announced March 2017.

Comments: Accepted for publication at the JCDL conference 2017

ACM Class: H.3.7, H.3.3

Showing 1–9 of 9 results for author: Breitinger, C