-
Using General Large Language Models to Classify Mathematical Documents
Authors:
Patrick D. F. Ion,
Stephen M. Watt
Abstract:
In this article we report on an initial exploration to assess the viability of using the recently released general large language models (LLMs) to classify mathematical documents. Automated classification would be useful both from the applied perspective of improving navigation of the literature and for the more open-ended goal of identifying relations among mathematical results. The Mathematics Subject Classification MSC 2020, from MathSciNet and zbMATH, is widely used, and there is a significant corpus of ground truth material in the open literature. We have evaluated the classification of preprint articles from arXiv.org according to MSC 2020. The experiment used only the title and abstract, not the entire paper. Since this was early in the use of chatbots and the development of their APIs, we report here on what was carried out by hand. Of course, automation of the process will have to follow if it is to be generally useful. We found that in about 60% of our sample the LLM produced a primary classification matching that already reported on arXiv. In about half of those instances, there were additional primary classifications that were not detected. In about 40% of our sample, the LLM suggested a classification different from the one provided. A detailed examination of these cases, however, showed that the LLM-suggested classifications were in most cases better than those provided.
Submitted 11 June, 2024;
originally announced June 2024.
-
10 Years Later: The Mathematics Subject Classification and Linked Open Data
Authors:
Susanne Arndt,
Patrick Ion,
Mila Runnwerth,
Moritz Schubotz,
Olaf Teschke
Abstract:
Ten years ago, the Mathematics Subject Classification MSC 2010 was released, and a corresponding machine-readable Linked Open Data collection was published using the Simple Knowledge Organization System (SKOS). Now, the new MSC 2020 is out.
This paper recaps the last ten years of working on machine-readable MSC data and presents the new machine-readable MSC 2020. We describe the processing required to convert the version of record, as agreed by the editors of zbMATH and Mathematical Reviews, into the Linked Open Data form we call MSC2020-SKOS. The new form includes explicit marking of the changes from 2010 to 2020, some translations of English code descriptions into Chinese, Italian, and Russian, and extra material relating MSC to other mathematics classification efforts. We also outline future potential uses for MSC2020-SKOS in semantic indexing and sketch its embedding in a larger vision of scientific research data.
Submitted 2 August, 2021; v1 submitted 29 July, 2021;
originally announced July 2021.
-
Reimplementing the Mathematical Subject Classification (MSC) as a Linked Open Dataset
Authors:
Christoph Lange,
Patrick Ion,
Anastasia Dimou,
Charalampos Bratsas,
Joseph Corneli,
Wolfram Sperber,
Michael Kohlhase,
Ioannis Antoniou
Abstract:
The Mathematics Subject Classification (MSC) is a widely used scheme for classifying documents in mathematics by subject. Its traditional, idiosyncratic conceptualization and representation makes the scheme hard to maintain and requires custom implementations of search, query and annotation support. This limits uptake e.g. in semantic web technologies in general and the creation and exploration of connections between mathematics and related domains (e.g. science) in particular.
This paper presents the new official implementation of the MSC2010 as a Linked Open Dataset, building on SKOS (Simple Knowledge Organization System). We provide a brief overview of the dataset's structure, its available implementations, and first applications.
Submitted 23 April, 2012;
originally announced April 2012.
-
Evolutionary Events in a Mathematical Sciences Research Collaboration Network
Authors:
Jason Cory Brunson,
Steve Fassino,
Antonio McInnes,
Monisha Narayan,
Brianna Richardson,
Christopher Franck,
Patrick Ion,
Reinhard Laubenbacher
Abstract:
This study examines long-term trends and shifting behavior in the collaboration network of mathematics literature, using a subset of data from Mathematical Reviews spanning 1985-2009. Rather than modeling the network cumulatively, this study traces the evolution of the "here and now" using fixed-duration sliding windows. The analysis uses a suite of common network diagnostics, including the distributions of degrees, distances, and clustering, to track network structure. Several random models that call these diagnostics as parameters help tease them apart as factors from the values of others. Some behaviors are consistent over the entire interval, but most diagnostics indicate that the network's structural evolution is dominated by occasional dramatic shifts in otherwise steady trends. These behaviors are not distributed evenly across the network; stark differences in evolution can be observed between two major subnetworks, loosely thought of as "pure" and "applied", which approximately partition the aggregate. The paper characterizes two major events along the mathematics network trajectory and discusses possible explanatory factors.
Submitted 4 February, 2015; v1 submitted 22 March, 2012;
originally announced March 2012.