Showing 1–8 of 8 results for author: Deußer, T

  1. arXiv:2310.13526  [pdf, ps, other]

    cs.CL cs.LG

    Controlled Randomness Improves the Performance of Transformer Models

    Authors: Tobias Deußer, Cong Zhao, Wolfgang Krämer, David Leonhard, Christian Bauckhage, Rafet Sifa

    Abstract: During the pre-training step of natural language models, the main objective is to learn a general representation of the pre-training dataset, usually requiring large amounts of textual data to capture the complexity and diversity of natural language. Contrasting this, in most cases, the size of the data available to solve the specific downstream task is often dwarfed by the aforementioned pre-trai…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted at ICMLA 2023, 10 pages, 2 tables

  2. arXiv:2308.07791  [pdf, other]

    cs.CL cs.AI cs.LG

    Informed Named Entity Recognition Decoding for Generative Language Models

    Authors: Tobias Deußer, Lars Hillebrand, Christian Bauckhage, Rafet Sifa

    Abstract: Ever-larger language models with ever-increasing capabilities are by now well-established text processing tools. Alas, information extraction tasks such as named entity recognition are still largely unaffected by this progress as they are primarily based on the previous generation of encoder-only transformer models. Here, we propose a simple yet effective approach, Informed Named Entity Recognitio…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: 12 pages, 2 figures, 4 tables

  3. arXiv:2308.06111  [pdf, other]

    cs.CL cs.AI

    Improving Zero-Shot Text Matching for Financial Auditing with Large Language Models

    Authors: Lars Hillebrand, Armin Berger, Tobias Deußer, Tim Dilmaghani, Mohamed Khaled, Bernd Kliem, Rüdiger Loitz, Maren Pielka, David Leonhard, Christian Bauckhage, Rafet Sifa

    Abstract: Auditing financial documents is a very tedious and time-consuming process. As of today, it can already be simplified by employing AI-based solutions to recommend relevant text passages from a report for each legal requirement of rigorous accounting standards. However, these methods need to be fine-tuned regularly, and they require abundant annotated data, which is often lacking in industrial envir…

    Submitted 14 August, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: Accepted at DocEng 2023, 4 pages, 1 figure, 2 tables

  4. arXiv:2305.08711  [pdf, other]

    cs.CL cs.AI cs.LG

    sustain.AI: a Recommender System to analyze Sustainability Reports

    Authors: Lars Hillebrand, Maren Pielka, David Leonhard, Tobias Deußer, Tim Dilmaghani, Bernd Kliem, Rüdiger Loitz, Milad Morad, Christian Temath, Thiago Bell, Robin Stenzel, Rafet Sifa

    Abstract: We present sustainAI, an intelligent, context-aware recommender system that assists auditors and financial investors as well as the general public to efficiently analyze companies' sustainability reports. The tool leverages an end-to-end trainable architecture that couples a BERT-based encoding module with a multi-label classification head to match relevant text passages from sustainability report…

    Submitted 26 May, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: Accepted at ICAIL 2023, 5 pages, 3 figures, 3 tables

    ACM Class: H.3.3

  5. arXiv:2211.06112  [pdf, other]

    cs.CL cs.AI cs.LG

    Towards automating Numerical Consistency Checks in Financial Reports

    Authors: Lars Hillebrand, Tobias Deußer, Tim Dilmaghani, Bernd Kliem, Rüdiger Loitz, Christian Bauckhage, Rafet Sifa

    Abstract: We introduce KPI-Check, a novel system that automatically identifies and cross-checks semantically equivalent key performance indicators (KPIs), e.g. "revenue" or "total costs", in real-world German financial reports. It combines a financial named entity and relation extraction module with a BERT-based filtering and text pair classification component to extract KPIs from unstructured sentences bef…

    Submitted 11 November, 2022; originally announced November 2022.

    Comments: Accepted at BigData 2022, 10 pages, 3 figures, 5 tables

  6. arXiv:2210.10434  [pdf, ps, other]

    cs.CL

    A Linguistic Investigation of Machine Learning based Contradiction Detection Models: An Empirical Analysis and Future Perspectives

    Authors: Maren Pielka, Felix Rode, Lisa Pucknat, Tobias Deußer, Rafet Sifa

    Abstract: We analyze two Natural Language Inference data sets with respect to their linguistic features. The goal is to identify those syntactic and semantic properties that are particularly hard to comprehend for a machine learning model. To this end, we also investigate the differences between a crowd-sourced, machine-translated data set (SNLI) and a collection of text pairs from internet sources. Our mai…

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: Accepted at ICMLA 2022, 5 pages, 2 tables

  7. KPI-EDGAR: A Novel Dataset and Accompanying Metric for Relation Extraction from Financial Documents

    Authors: Tobias Deußer, Syed Musharraf Ali, Lars Hillebrand, Desiana Nurchalifah, Basil Jacob, Christian Bauckhage, Rafet Sifa

    Abstract: We introduce KPI-EDGAR, a novel dataset for Joint Named Entity Recognition and Relation Extraction building on financial reports uploaded to the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system, where the main objective is to extract Key Performance Indicators (KPIs) from financial documents and link them to their numerical values and other attributes. We further provide four acco…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: Accepted at ICMLA 2022, 6 pages, 5 tables

  8. arXiv:2208.02140  [pdf, other]

    cs.CL cs.AI cs.LG

    KPI-BERT: A Joint Named Entity Recognition and Relation Extraction Model for Financial Reports

    Authors: Lars Hillebrand, Tobias Deußer, Tim Dilmaghani, Bernd Kliem, Rüdiger Loitz, Christian Bauckhage, Rafet Sifa

    Abstract: We present KPI-BERT, a system which employs novel methods of named entity recognition (NER) and relation extraction (RE) to extract and link key performance indicators (KPIs), e.g. "revenue" or "interest expenses", of companies from real-world German financial documents. Specifically, we introduce an end-to-end trainable architecture that is based on Bidirectional Encoder Representations from Tran…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: Accepted at ICPR 2022, 8 pages, 1 figure, 6 tables