Showing 1–20 of 20 results for author: Greiner-Petter, A

  1. arXiv:2404.00344  [pdf, other]

    cs.CL cs.AI cs.IR

    Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange

    Authors: Ankit Satpute, Noah Giessing, Andre Greiner-Petter, Moritz Schubotz, Olaf Teschke, Akiko Aizawa, Bela Gipp

    Abstract: Large Language Models (LLMs) have demonstrated exceptional capabilities in various natural language tasks, often achieving performances that surpass those of humans. Despite these advancements, the domain of mathematics presents a distinctive challenge, primarily due to its specialized structure and the precision it demands. In this study, we adopted a two-step approach for investigating the profi…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: Accepted for publication at the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), July 14–18, 2024, Washington D.C., USA

  2. arXiv:2403.07910  [pdf, other]

    cs.CY cs.CL

    MAGPIE: Multi-Task Media-Bias Analysis Generalization for Pre-Trained Identification of Expressions

    Authors: Tomáš Horych, Martin Wessel, Jan Philip Wahle, Terry Ruas, Jerome Waßmuth, André Greiner-Petter, Akiko Aizawa, Bela Gipp, Timo Spinde

    Abstract: Media bias detection poses a complex, multifaceted problem traditionally tackled using single-task models and small in-domain datasets, consequently lacking generalizability. To address this, we introduce MAGPIE, the first large-scale multi-task pre-training approach explicitly tailored for media bias detection. To enable pre-training at scale, we present Large Bias Mixture (LBM), a compilation of…

    Submitted 15 March, 2024; v1 submitted 26 February, 2024; originally announced March 2024.

  3. arXiv:2402.17311  [pdf, other]

    cs.CL

    SKT5SciSumm -- A Hybrid Generative Approach for Multi-Document Scientific Summarization

    Authors: Huy Quoc To, Hung-Nghiep Tran, André Greiner-Petter, Felix Beierle, Akiko Aizawa

    Abstract: Summarization for scientific text has shown significant benefits both for the research community and human society. Given the fact that the nature of scientific text is distinctive and the input of the multi-document summarization task is substantially long, the task requires sufficient embedding generation and text truncation without losing important information. To tackle these issues, in this p…

    Submitted 27 February, 2024; originally announced February 2024.

  4. Taxonomy of Mathematical Plagiarism

    Authors: Ankit Satpute, Andre Greiner-Petter, Noah Gießing, Isabel Beckenbach, Moritz Schubotz, Olaf Teschke, Akiko Aizawa, Bela Gipp

    Abstract: Plagiarism is a pressing concern, even more so with the availability of large language models. Existing plagiarism detection systems reliably find copied and moderately reworded text but fail for idea plagiarism, especially in mathematical science, which heavily uses formal mathematical notation. We make two contributions. First, we establish a taxonomy of mathematical content reuse by annotating…

    Submitted 31 May, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: 46th European Conference on Information Retrieval (ECIR)

  5. arXiv:2305.16433  [pdf, other]

    cs.CL cs.SC stat.AP

    Neural Machine Translation for Mathematical Formulae

    Authors: Felix Petersen, Moritz Schubotz, Andre Greiner-Petter, Bela Gipp

    Abstract: We tackle the problem of neural machine translation of mathematical formulae between ambiguous presentation languages and unambiguous content languages. Compared to neural machine translation on natural language, mathematical formulae have a much smaller vocabulary and much longer sequences of symbols, while their translation requires extreme precision to satisfy mathematical information needs. In…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: Published at ACL 2023

  6. arXiv:2305.13193  [pdf, other]

    cs.IR

    TEIMMA: The First Content Reuse Annotator for Text, Images, and Math

    Authors: Ankit Satpute, André Greiner-Petter, Moritz Schubotz, Norman Meuschke, Akiko Aizawa, Olaf Teschke, Bela Gipp

    Abstract: This demo paper presents the first tool to annotate the reuse of text, images, and mathematical formulae in a document pair -- TEIMMA. Annotating content reuse is particularly useful to develop plagiarism detection algorithms. Real-world content reuse is often obfuscated, which makes it challenging to identify such cases. TEIMMA allows entering the obfuscation type to enable novel classifications…

    Submitted 13 June, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  7. Methods and Tools to Advance the Retrieval of Mathematical Knowledge from Digital Libraries for Search-, Recommendation-, and Assistance-Systems

    Authors: Bela Gipp, André Greiner-Petter, Moritz Schubotz, Norman Meuschke

    Abstract: This project investigated new approaches and technologies to enhance the accessibility of mathematical content and its semantic information for a broad range of information retrieval applications. To achieve this goal, the project addressed three main research challenges: (1) syntactic analysis of mathematical expressions, (2) semantic enrichment of mathematical expressions, and (3) evaluation usi…

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: The final report for the DFG-Project MathIR - July 1st, 2018 - December 31st, 2022

    Report number: GI 1259-1; ACM Class: H.3.0

  8. Collaborative and AI-aided Exam Question Generation using Wikidata in Education

    Authors: Philipp Scharpf, Moritz Schubotz, Andreas Spitz, Andre Greiner-Petter, Bela Gipp

    Abstract: Since the COVID-19 outbreak, the use of digital learning or education platforms has significantly increased. Teachers now digitally distribute homework and provide exercise questions. In both cases, teachers need to continuously develop novel and individual questions. This process can be very time-consuming and should be facilitated and accelerated both through exchange with other teachers and by…

    Submitted 15 November, 2022; originally announced November 2022.

    MSC Class: 68Uxx; ACM Class: H.4

  9. Caching and Reproducibility: Making Data Science experiments faster and FAIRer

    Authors: Moritz Schubotz, Ankit Satpute, Andre Greiner-Petter, Akiko Aizawa, Bela Gipp

    Abstract: Small to medium-scale data science experiments often rely on research software developed ad-hoc by individual scientists or small teams. Often there is no time to make the research software fast, reusable, and open access. The consequence is twofold. First, subsequent researchers must spend significant work hours building upon the proposed hypotheses or experimental framework. In the worst case, o…

    Submitted 9 November, 2022; v1 submitted 8 November, 2022; originally announced November 2022.

    Comments: 8 pages, 1 table

    Journal ref: Frontiers in Research Metrics and Analytics, volume 7, 2022

  10. Comparative Verification of the Digital Library of Mathematical Functions and Computer Algebra Systems

    Authors: André Greiner-Petter, Howard S. Cohl, Abdou Youssef, Moritz Schubotz, Avi Trost, Rajen Dey, Akiko Aizawa, Bela Gipp

    Abstract: Digital mathematical libraries assemble the knowledge of years of mathematical research. Numerous disciplines (e.g., physics, engineering, pure and applied mathematics) rely heavily on compendia of gathered findings. Likewise, modern research applications rely more and more on computational solutions, which are often calculated and verified by computer algebra systems. Hence, the correctness, accurac…

    Submitted 31 March, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

    Journal ref: In: TACAS, Apr. 2022, pp. 87-105

  11. Automated Symbolic and Numerical Testing of DLMF Formulae using Computer Algebra Systems

    Authors: Howard S. Cohl, André Greiner-Petter, Moritz Schubotz

    Abstract: We have developed an automated procedure for symbolic and numerical testing of formulae extracted from the NIST Digital Library of Mathematical Functions (DLMF). For the NIST Digital Repository of Mathematical Formulae, we have developed conversion tools from semantic LaTeX to the Computer Algebra System (CAS) Maple which relies on Youssef's part-of-math tagger. We convert a test data subset of 4,…

    Submitted 18 September, 2021; originally announced September 2021.

    Comments: Appeared in the Proceedings of the 11th International Conference on Intelligent Computer Mathematics (CICM) 2018

  12. Semantic Preserving Bijective Mappings of Mathematical Formulae between Document Preparation Systems and Computer Algebra Systems

    Authors: Howard S. Cohl, Moritz Schubotz, Abdou Youssef, André Greiner-Petter, Jürgen Gerhard, Bonita V. Saunders, Marjorie A. McClain

    Abstract: Document preparation systems like LaTeX offer the ability to render mathematical expressions as one would write these on paper. Using LaTeX, LaTeXML, and tools generated for use in the National Institute of Standards and Technology (NIST) Digital Library of Mathematical Functions, semantically enhanced mathematical LaTeX markup (semantic LaTeX) is achieved by using a semantic macro set. Computer algebra systems…

    Submitted 17 September, 2021; originally announced September 2021.

    Comments: Proceedings of the 10th International Conference on Intelligent Computer Mathematics (CICM)

  13. MathTools: An Open API for Convenient MathML Handling

    Authors: André Greiner-Petter, Moritz Schubotz, Howard S. Cohl, Bela Gipp

    Abstract: Mathematical formulae carry complex and essential semantic information in a variety of formats. Accessing this information with different systems requires a standardized machine-readable format that is capable of encoding presentational and semantic information. Even though MathML is an official recommendation by W3C and an ISO standard for representing mathematical expressions, we could identify…

    Submitted 17 September, 2021; originally announced September 2021.

    Comments: Published in Proceedings of the International Conference on Intelligent Computer Mathematics (CICM) 2018

  14. arXiv:2012.02413  [pdf]

    cs.DL

    ARQMath Lab: An Incubator for Semantic Formula Search in zbMATH Open?

    Authors: Philipp Scharpf, Moritz Schubotz, Andre Greiner-Petter, Malte Ostendorff, Olaf Teschke, Bela Gipp

    Abstract: The zbMATH database contains more than 4 million bibliographic entries. We aim to provide easy access to these entries. Therefore, we maintain different index structures, including a formula index. To optimize the findability of the entries in our database, we continuously investigate new approaches to satisfy the information needs of our users. We believe that the findings from the ARQMath evalua…

    Submitted 10 December, 2020; v1 submitted 4 December, 2020; originally announced December 2020.

    Comments: In Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, September 22-25, 2020 http://ceur-ws.org/Vol-2696/paper_200.pdf

  15. arXiv:2011.14616  [pdf, other]

    cs.IR cs.MS

    Automatic Mathematical Information Retrieval to Perform Translations up to Computer Algebra Systems

    Authors: André Greiner-Petter

    Abstract: In mathematics, LaTeX is the de facto standard to prepare documents, e.g., scientific publications. While some formulae are still developed using pen and paper, more complicated mathematical expressions are used more and more often with computer algebra systems. Mathematical expressions are often manually transcribed to computer algebra systems. The goal of my doctoral thesis is to improve the efficie…

    Submitted 30 November, 2020; originally announced November 2020.

    Comments: Doctoral Consortium Paper at the Joint Conference on Digital Libraries (JCDL), Fort Worth, TX, USA, June 03-07, 2018

    Journal ref: Bulletin of IEEE Technical Committee on Digital Libraries 15.1 (Jan. 2019)

  16. Mathematical Formulae in Wikimedia Projects 2020

    Authors: Moritz Schubotz, André Greiner-Petter, Norman Meuschke, Olaf Teschke, Bela Gipp

    Abstract: This poster summarizes our contributions to Wikimedia's processing pipeline for mathematical formulae. We describe how we have supported the transition from rendering formulae as coarse-grained PNG images in 2001 to providing modern semantically enriched language-independent MathML formulae in 2020. Additionally, we describe our plans to improve the accessibility and discoverability of mathematica…

    Submitted 6 May, 2020; v1 submitted 20 March, 2020; originally announced March 2020.

    Comments: Submitted to JCDL 2020: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL '20), August 1-5, 2020, Virtual Event, China

  17. Discovering Mathematical Objects of Interest -- A Study of Mathematical Notations

    Authors: Andre Greiner-Petter, Moritz Schubotz, Fabian Mueller, Corinna Breitinger, Howard S. Cohl, Akiko Aizawa, Bela Gipp

    Abstract: Mathematical notation, i.e., the writing system used to communicate concepts in mathematics, encodes valuable information for a variety of information search and retrieval systems. Yet, mathematical notations remain mostly unutilized by today's systems. In this paper, we present the first in-depth study on the distributions of mathematical notation in two large scientific corpora: the open access…

    Submitted 22 June, 2021; v1 submitted 7 February, 2020; originally announced February 2020.

    Comments: Proceedings of The Web Conference 2020 (WWW'20), April 20–24, 2020, Taipei, Taiwan

  18. Semantic Preserving Bijective Mappings for Expressions involving Special Functions in Computer Algebra Systems and Document Preparation Systems

    Authors: Andre Greiner-Petter, Moritz Schubotz, Howard S. Cohl, Bela Gipp

    Abstract: Purpose: Modern mathematicians and scientists of math-related disciplines often use Document Preparation Systems (DPS) to write and Computer Algebra Systems (CAS) to calculate mathematical expressions. Usually, they translate the expressions manually between DPS and CAS. This process is time-consuming and error-prone. Our goal is to automate this translation. This paper uses Maple and Mathematica…

    Submitted 27 June, 2019; originally announced June 2019.

    Comments: This work was supported by the German Research Foundation (DFG, grant GI-1259-1)

  19. arXiv:1905.08359  [pdf, other]

    cs.DL cs.AI cs.IR

    Why Machines Cannot Learn Mathematics, Yet

    Authors: André Greiner-Petter, Terry Ruas, Moritz Schubotz, Akiko Aizawa, William Grosky, Bela Gipp

    Abstract: Nowadays, Machine Learning (ML) is seen as the universal solution to improve the effectiveness of information retrieval (IR) methods. However, while mathematics is a precise and accurate science, it is usually expressed by less accurate and imprecise descriptions, contributing to the relative dearth of machine learning applications for IR in this domain. Generally, mathematical documents communica…

    Submitted 20 May, 2019; originally announced May 2019.

    Comments: Submitted to 4th Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries colocated at the 42nd International ACM SIGIR Conference

    Journal ref: 2019 http://ceur-ws.org/Vol-2414/paper14.pdf

  20. Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context

    Authors: Moritz Schubotz, Andre Greiner-Petter, Philipp Scharpf, Norman Meuschke, Howard Cohl, Bela Gipp

    Abstract: Mathematical formulae represent complex semantic information in a concise form. Especially in Science, Technology, Engineering, and Mathematics, mathematical formulae are crucial to communicate information, e.g., in scientific papers, and to perform computations using computer algebra systems. Enabling computers to access the information encoded in mathematical formulae requires machine-readable f…

    Submitted 13 April, 2018; originally announced April 2018.

    Comments: 10 pages, 4 figures

    Journal ref: Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), Jun. 2018, Fort Worth, USA