
Showing 1–17 of 17 results for author: Gertz, M

Searching in archive cs.
  1. arXiv:2403.16295

    cs.CL

    LexDrafter: Terminology Drafting for Legislative Documents using Retrieval Augmented Generation

    Authors: Ashish Chouhan, Michael Gertz

    Abstract: With the increase in legislative documents at the EU, the number of new terms and their definitions is increasing as well. As per the Joint Practical Guide of the European Parliament, the Council and the Commission, terms used in legal documents shall be consistent, and identical concepts shall be expressed without departing from their meaning in ordinary, legal, or technical language. Thus, while…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Accepted at LREC-COLING 2024

  2. arXiv:2310.08140

    cs.SI

    CODY: A graph-based framework for the analysis of COnversation DYnamics in online social networks

    Authors: John Ziegler, Fabian Kneissl, Michael Gertz

    Abstract: Conversations are an integral part of online social media, and gaining insights into these conversations is of significant value for many commercial as well as academic use cases. From a computational perspective, however, analyzing conversation data is complex, and numerous aspects must be considered. Next to the structure of conversations, the discussed content - as well as their dynamics - have…

    Submitted 12 October, 2023; originally announced October 2023.

  3. arXiv:2305.13309

    cs.CL

    Evaluating Factual Consistency of Texts with Semantic Role Labeling

    Authors: Jing Fan, Dennis Aumiller, Michael Gertz

    Abstract: Automated evaluation of text generation systems has recently seen increasing attention, particularly checking whether generated text stays truthful to input sources. Existing methods frequently rely on an evaluation using task-specific language models, which in turn allows for little interpretability of generated scores. We introduce SRLScore, a reference-free evaluation metric designed with text…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted at *SEM 2023

  4. arXiv:2305.08853

    cs.CL

    CQE: A Comprehensive Quantity Extractor

    Authors: Satya Almasian, Vivian Kazakova, Philip Göldner, Michael Gertz

    Abstract: Quantities are essential in documents to describe factual information. They are ubiquitous in application domains such as finance, business, medicine, and science in general. Compared to other information extraction approaches, interestingly only a few works exist that describe methods for a proper extraction and representation of quantities in text. In this paper, we present such a comprehensive…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: 8 pages of content, 3 pages of appendix

    ACM Class: I.7; I.7.1; I.7.5

  5. arXiv:2301.07095

    cs.CL

    On the State of German (Abstractive) Text Summarization

    Authors: Dennis Aumiller, Jing Fan, Michael Gertz

    Abstract: With recent advancements in the area of Natural Language Processing, the focus is slowly shifting from a purely English-centric view towards more language-specific solutions, including German. Especially practical for businesses to analyze their growing amount of textual data are text summarization systems, which transform long input documents into compressed and more digestible summary texts. In…

    Submitted 17 January, 2023; originally announced January 2023.

    Comments: Accepted at the 20th Conference on Database Systems for Business, Technology and Web (BTW'23)

  6. arXiv:2301.01764

    cs.CL

    UniHD at TSAR-2022 Shared Task: Is Compute All We Need for Lexical Simplification?

    Authors: Dennis Aumiller, Michael Gertz

    Abstract: Previous state-of-the-art models for lexical simplification consist of complex pipelines with several components, each of which requires deep technical knowledge and fine-tuned interaction to achieve its full potential. As an alternative, we describe a frustratingly simple pipeline based on prompted GPT-3 responses, beating competing approaches by a wide margin in settings with few training instan…

    Submitted 5 January, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

    Comments: Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022) at EMNLP 2022

  7. arXiv:2210.13448

    cs.CL

    EUR-Lex-Sum: A Multi- and Cross-lingual Dataset for Long-form Summarization in the Legal Domain

    Authors: Dennis Aumiller, Ashish Chouhan, Michael Gertz

    Abstract: Existing summarization datasets come with two main drawbacks: (1) They tend to focus on overly exposed domains, such as news articles or wiki-like texts, and (2) are primarily monolingual, with few multilingual datasets. In this work, we propose a novel dataset, called EUR-Lex-Sum, based on manually curated document summaries of legal acts from the European Union law platform (EUR-Lex). Documents…

    Submitted 24 October, 2022; originally announced October 2022.

  8. arXiv:2201.07198

    cs.CL

    Klexikon: A German Dataset for Joint Summarization and Simplification

    Authors: Dennis Aumiller, Michael Gertz

    Abstract: Traditionally, Text Simplification is treated as a monolingual translation task where sentences between source texts and their simplified counterparts are aligned for training. However, especially for longer input documents, summarizing the text (or dropping less relevant content altogether) plays an important role in the simplification process, which is currently not reflected in existing dataset…

    Submitted 28 July, 2022; v1 submitted 18 January, 2022; originally announced January 2022.

    Comments: Code and data are available on Github: https://github.com/dennlinger/klexikon

  9. arXiv:2109.14927   

    cs.CL

    BERT got a Date: Introducing Transformers to Temporal Tagging

    Authors: Satya Almasian, Dennis Aumiller, Michael Gertz

    Abstract: Temporal expressions in text play a significant role in language understanding and correctly identifying them is fundamental to various retrieval and natural language processing systems. Previous works have slowly shifted from rule-based to neural architectures, capable of tagging expressions with higher accuracy. However, neural models can not yet distinguish between different expression types at…

    Submitted 24 January, 2022; v1 submitted 30 September, 2021; originally announced September 2021.

    Comments: unreliable evaluation results for Seq2seq models

  10. arXiv:2103.14956

    cs.HC

    Dark Patterns in the Interaction with Cookie Banners

    Authors: Philip Hausner, Michael Gertz

    Abstract: Dark patterns are interface designs that nudge users towards behavior that is against their best interests. Since humans are often not even aware that they are influenced by these malicious patterns, research has to identify ways to protect web users against them. One approach to this is the automatic detection of dark patterns which enables the development of tools that are able to protect users…

    Submitted 27 March, 2021; originally announced March 2021.

    Comments: 5 pages, 3 figures, Position Paper at the Workshop "What Can CHI Do About Dark Patterns?" at the CHI Conference on Human Factors in Computing Systems (CHI 2021), May 8-13, 2021, Yokohama, Japan

  11. Structural Text Segmentation of Legal Documents

    Authors: Dennis Aumiller, Satya Almasian, Sebastian Lackner, Michael Gertz

    Abstract: The growing complexity of legal cases has led to an increasing interest in legal information retrieval systems that can effectively satisfy user-specific information needs. However, such downstream systems typically require documents to be properly formatted and segmented, which is often done with relatively simple pre-processing steps, disregarding topical coherence of segments. Systems generall…

    Submitted 17 May, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

  12. arXiv:1907.07381

    cs.SI cs.LG cs.NE

    DeepNC: Deep Generative Network Completion

    Authors: Cong Tran, Won-Yong Shin, Andreas Spitz, Michael Gertz

    Abstract: Most network data are collected from partially observable networks with both missing nodes and missing edges, for example, due to limited resources and privacy settings specified by users on social media. Thus, it stands to reason that inferring the missing parts of the networks by performing network completion should precede downstream applications. However, despite this need, the recovery of mis…

    Submitted 20 October, 2020; v1 submitted 17 July, 2019; originally announced July 2019.

    Comments: 16 pages, 10 figures, 5 tables; to appear in the IEEE Transactions on Pattern Analysis and Machine Intelligence (Please cite our journal version that will appear in an upcoming issue.)

  13. TopExNet: Entity-Centric Network Topic Exploration in News Streams

    Authors: Andreas Spitz, Satya Almasian, Michael Gertz

    Abstract: The recent introduction of entity-centric implicit network representations of unstructured text offers novel ways for exploring entity relations in document collections and streams efficiently and interactively. Here, we present TopExNet as a tool for exploring entity-centric network topics in streams of news articles. The application is available as a web service at https://topexnet.ifi.uni-heide…

    Submitted 31 May, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: Published in Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, WSDM 2019, Melbourne, VIC, Australia, February 11-15, 2019

  14. Retrieving Multi-Entity Associations: An Evaluation of Combination Modes for Word Embeddings

    Authors: Gloria Feher, Andreas Spitz, Michael Gertz

    Abstract: Word embeddings have gained significant attention as learnable representations of semantic relations between words, and have been shown to improve upon the results of traditional word representations. However, little effort has been devoted to using embeddings for the retrieval of entity associations beyond pairwise relations. In this paper, we use popular embedding methods to train vector represe…

    Submitted 22 May, 2019; originally announced May 2019.

    Comments: 4 pages; Accepted at SIGIR'19

    ACM Class: H.3.3

  15. Word Embeddings for Entity-annotated Texts

    Authors: Satya Almasian, Andreas Spitz, Michael Gertz

    Abstract: Learned vector representations of words are useful tools for many information retrieval and natural language processing tasks due to their ability to capture lexical semantics. However, while many such tasks involve or even rely on named entities as central components, popular word embedding models have so far failed to include entities as first-class citizens. While it seems intuitive that annota…

    Submitted 12 February, 2020; v1 submitted 6 February, 2019; originally announced February 2019.

    Comments: This paper is accepted at the 41st European Conference on Information Retrieval

  16. arXiv:1708.03569

    cs.IR cs.CL

    Semantic Word Clouds with Background Corpus Normalization and t-distributed Stochastic Neighbor Embedding

    Authors: Erich Schubert, Andreas Spitz, Michael Weiler, Johanna Geiß, Michael Gertz

    Abstract: Many word clouds provide no semantics to the word placement, but use a random layout optimized solely for aesthetic purposes. We propose a novel approach to model word significance and word affinity within a document, and in comparison to a large background corpus. We demonstrate its usefulness for generating more meaningful word clouds as a visual summary of a given document. We then select keywo…

    Submitted 11 August, 2017; originally announced August 2017.

  17. arXiv:cs/0106051

    cs.MS

    Users Guide for SnadiOpt: A Package Adding Automatic Differentiation to Snopt

    Authors: E. Michael Gertz, Philip E. Gill, Julia Muetherig

    Abstract: SnadiOpt is a package that supports the use of the automatic differentiation package ADIFOR with the optimization package Snopt. Snopt is a general-purpose system for solving optimization problems with many variables and constraints. It minimizes a linear or nonlinear function subject to bounds on the variables and sparse linear or nonlinear constraints. It is suitable for large-scale linear and…

    Submitted 25 June, 2001; originally announced June 2001.

    Comments: pages i-iv, 1-23

    Report number: ANL/MCS-TM-245

    ACM Class: G.1.6; G.1.4