
Showing 1–50 of 58 results for author: Matthes, F

  1. arXiv:2407.05925  [pdf, other]

    cs.CL cs.AI

    Towards Optimizing and Evaluating a Retrieval Augmented QA Chatbot using LLMs with Human in the Loop

    Authors: Anum Afzal, Alexander Kowsik, Rajna Fani, Florian Matthes

    Abstract: Large Language Models have found application in various mundane and repetitive tasks including Human Resource (HR) support. We worked with the domain experts of SAP SE to develop an HR support chatbot as an efficient and effective tool for addressing employee inquiries. We inserted a human-in-the-loop in various parts of the development cycles such as dataset collection, prompt optimization, and e…

    Submitted 8 July, 2024; originally announced July 2024.

  2. arXiv:2407.02027  [pdf, other]

    cs.CY

    Privacy Risks of General-Purpose AI Systems: A Foundation for Investigating Practitioner Perspectives

    Authors: Stephen Meisenbacher, Alexandra Klymenko, Patrick Gage Kelley, Sai Teja Peddinti, Kurt Thomas, Florian Matthes

    Abstract: The rise of powerful AI models, more formally $\textit{General-Purpose AI Systems}$ (GPAIS), has led to impressive leaps in performance across a wide range of tasks. At the same time, researchers and practitioners alike have raised a number of privacy concerns, resulting in a wealth of literature covering various privacy risks and vulnerabilities of AI models. Works surveying such risks provide di…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 5 pages. Accepted to SUPA@SOUPS'24

  3. arXiv:2407.00997  [pdf, other]

    cs.CL

    Engineering Conversational Search Systems: A Review of Applications, Architectures, and Functional Components

    Authors: Phillip Schneider, Wessel Poelman, Michael Rovatsos, Florian Matthes

    Abstract: Conversational search systems enable information retrieval via natural language interactions, with the goal of maximizing users' information gain over multiple dialogue turns. The increasing prevalence of conversational interfaces adopting this search paradigm challenges traditional information retrieval approaches, stressing the importance of better understanding the engineering process of develo…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted to ACL 2024 NLP4ConvAI Workshop

  4. arXiv:2407.00638  [pdf, other]

    cs.CL

    A Collocation-based Method for Addressing Challenges in Word-level Metric Differential Privacy

    Authors: Stephen Meisenbacher, Maulik Chevli, Florian Matthes

    Abstract: Applications of Differential Privacy (DP) in NLP must distinguish between the syntactic level on which a proposed mechanism operates, often taking the form of $\textit{word-level}$ or $\textit{document-level}$ privatization. Recently, several word-level $\textit{Metric}$ Differential Privacy approaches have been proposed, which rely on this generalized DP notion for operating in word embedding spa…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 13 pages, 2 figures, 9 tables. Accepted to PrivateNLP 2024

  5. arXiv:2407.00637  [pdf, other]

    cs.CL

    DP-MLM: Differentially Private Text Rewriting Using Masked Language Models

    Authors: Stephen Meisenbacher, Maulik Chevli, Juraj Vladika, Florian Matthes

    Abstract: The task of text privatization using Differential Privacy has recently taken the form of $\textit{text rewriting}$, in which an input text is obfuscated via the use of generative (large) language models. While these methods have shown promising results in the ability to preserve privacy, these methods rely on autoregressive models which lack a mechanism to contextualize the private rewriting proce…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 15 pages, 2 figures, 8 tables. Accepted to ACL 2024 (Findings)

  6. arXiv:2406.15294  [pdf, other]

    cs.CL

    NLP-KG: A System for Exploratory Search of Scientific Literature in Natural Language Processing

    Authors: Tim Schopf, Florian Matthes

    Abstract: Scientific literature searches are often exploratory, whereby users are not yet familiar with a particular field or concept but are interested in learning more about it. However, existing systems for scientific literature search are typically tailored to keyword-based lookup searches, limiting the possibilities for exploration. We propose NLP-KG, a feature-rich system designed to support the explo…

    Submitted 4 July, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 System Demonstrations

  7. arXiv:2406.06809  [pdf, other]

    cs.CL

    AGB-DE: A Corpus for the Automated Legal Assessment of Clauses in German Consumer Contracts

    Authors: Daniel Braun, Florian Matthes

    Abstract: Legal tasks and datasets are often used as benchmarks for the capabilities of language models. However, openly available annotated datasets are rare. In this paper, we introduce AGB-DE, a corpus of 3,764 clauses from German consumer contracts that have been annotated and legally assessed by legal experts. Together with the data, we present a first baseline for the task of detecting potentially voi…

    Submitted 10 June, 2024; originally announced June 2024.

  8. arXiv:2406.05845  [pdf, other]

    cs.CL

    MedREQAL: Examining Medical Knowledge Recall of Large Language Models via Question Answering

    Authors: Juraj Vladika, Phillip Schneider, Florian Matthes

    Abstract: In recent years, Large Language Models (LLMs) have demonstrated an impressive ability to encode knowledge during pre-training on large text corpora. They can leverage this knowledge for downstream tasks like question answering (QA), even in complex areas involving health topics. Considering their high potential for facilitating clinical work in the future, understanding the quality of encoded medi…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 (Findings)

  9. arXiv:2405.19831  [pdf, other]

    cs.CL

    Just Rewrite It Again: A Post-Processing Method for Enhanced Semantic Similarity and Privacy Preservation of Differentially Private Rewritten Text

    Authors: Stephen Meisenbacher, Florian Matthes

    Abstract: The study of Differential Privacy (DP) in Natural Language Processing often views the task of text privatization as a $\textit{rewriting}$ task, in which sensitive input texts are rewritten to hide explicit or implicit private information. In order to evaluate the privacy-preserving capabilities of a DP text rewriting mechanism, $\textit{empirical privacy}$ tests are frequently employed. In these…

    Submitted 31 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: 10 pages, 2 figures, 2 tables. Accepted to ARES 2024 (IWAPS)

  10. arXiv:2405.01678  [pdf, other]

    cs.CL

    1-Diffractor: Efficient and Utility-Preserving Text Obfuscation Leveraging Word-Level Metric Differential Privacy

    Authors: Stephen Meisenbacher, Maulik Chevli, Florian Matthes

    Abstract: The study of privacy-preserving Natural Language Processing (NLP) has gained rising attention in recent years. One promising avenue studies the integration of Differential Privacy in NLP, which has brought about innovative methods in a variety of application settings. Of particular note are $\textit{word-level Metric Local Differential Privacy (MLDP)}$ mechanisms, which work to obfuscate potential…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 12 pages, 7 figures, 7 tables, 10th ACM International Workshop on Security and Privacy Analytics (IWSPA 2024)

  11. arXiv:2404.18759  [pdf]

    cs.CL cs.CY

    Towards A Structured Overview of Use Cases for Natural Language Processing in the Legal Domain: A German Perspective

    Authors: Juraj Vladika, Stephen Meisenbacher, Martina Preis, Alexandra Klymenko, Florian Matthes

    Abstract: In recent years, the field of Legal Tech has risen in prevalence, as the Natural Language Processing (NLP) and legal disciplines have combined forces to digitalize legal processes. Amidst the steady flow of research solutions stemming from the NLP domain, the study of use cases has fallen behind, leading to a number of innovative technical methods without a place in practice. In this work, we aim…

    Submitted 2 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: 10 pages, 6 tables, 30th Americas Conference on Information Systems (AMCIS 2024)

  12. arXiv:2404.08359  [pdf, other]

    cs.CL cs.AI cs.IR

    Improving Health Question Answering with Reliable and Time-Aware Evidence Retrieval

    Authors: Juraj Vladika, Florian Matthes

    Abstract: In today's digital world, seeking answers to health questions on the Internet is a common practice. However, existing question answering (QA) systems often rely on using pre-selected and annotated evidence documents, thus making them inadequate for addressing novel questions. Our study focuses on the open-domain QA setting, where the key challenge is to first uncover relevant evidence in large kno…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL 2024 (Findings)

  13. arXiv:2404.03324  [pdf, other]

    cs.CL

    A Comparative Analysis of Word-Level Metric Differential Privacy: Benchmarking The Privacy-Utility Trade-off

    Authors: Stephen Meisenbacher, Nihildev Nandakumar, Alexandra Klymenko, Florian Matthes

    Abstract: The application of Differential Privacy to Natural Language Processing techniques has emerged in relevance in recent years, with an increasing number of studies published in established NLP outlets. In particular, the adaptation of Differential Privacy for use in NLP tasks has first focused on the $\textit{word-level}$, where calibrated noise is added to word embedding vectors to achieve "noisy" r…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted to LREC-COLING 2024

  14. arXiv:2404.01443  [pdf]

    cs.CL

    Enterprise Use Cases Combining Knowledge Graphs and Natural Language Processing

    Authors: Phillip Schneider, Tim Schopf, Juraj Vladika, Florian Matthes

    Abstract: Knowledge management is a critical challenge for enterprises in today's digital world, as the volume and complexity of data being generated and collected continue to grow incessantly. Knowledge graphs (KG) emerged as a promising solution to this problem by providing a flexible, scalable, and semantically rich way to organize and make sense of data. This paper builds upon a recent survey of the res…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 16 pages

  15. arXiv:2402.02844  [pdf, other]

    cs.CL cs.AI cs.IR

    Comparing Knowledge Sources for Open-Domain Scientific Claim Verification

    Authors: Juraj Vladika, Florian Matthes

    Abstract: The increasing rate at which scientific knowledge is discovered and health claims shared online has highlighted the importance of developing efficient fact-checking systems for scientific claims. The usual setting for this task in the literature assumes that the documents containing the evidence for claims are already provided and annotated or contained in a limited corpus. This renders the system…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted to EACL 2024

  16. arXiv:2402.01495  [pdf, other]

    cs.CL

    A Comparative Analysis of Conversational Large Language Models in Knowledge-Based Text Generation

    Authors: Phillip Schneider, Manuel Klettner, Elena Simperl, Florian Matthes

    Abstract: Generating natural language text from graph-structured data is essential for conversational information seeking. Semantic triples derived from knowledge graphs can serve as a valuable source for grounding responses from conversational agents by providing a factual basis for the information they communicate. This is especially relevant in the context of large language models, which offer great pote…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted to EACL 2024

  17. arXiv:2401.09488  [pdf, other]

    cs.CR cs.SE

    A Universal System for OpenID Connect Sign-ins with Verifiable Credentials and Cross-Device Flow

    Authors: Felix Hoops, Florian Matthes

    Abstract: Self-Sovereign Identity (SSI), as a new and promising identity management paradigm, needs mechanisms that can ease a gradual transition of existing services and developers towards it. Systems that bridge the gap between SSI and established identity and access management have been proposed but still lack adoption. We argue that they are all some combination of too complex, locked into specific ecos…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Submitted to IEEE ICBC 24 for review

  18. arXiv:2401.07992  [pdf, other]

    cs.CR

    Playing the MEV Game on a First-Come-First-Served Blockchain

    Authors: Burak Öz, Jonas Gebele, Parshant Singh, Filip Rezabek, Florian Matthes

    Abstract: Maximal Extractable Value (MEV) searching has gained prominence on the Ethereum blockchain since the surge in Decentralized Finance activities. In Ethereum, MEV extraction primarily hinges on fee payments to block proposers. However, in First-Come-First-Served (FCFS) blockchain networks, the focus shifts to latency optimizations, akin to High-Frequency Trading in Traditional Finance. This paper il…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: 13 pages, 5 figures

  19. arXiv:2401.01711  [pdf, ps, other]

    cs.CL cs.IR

    Evaluating Large Language Models in Semantic Parsing for Conversational Question Answering over Knowledge Graphs

    Authors: Phillip Schneider, Manuel Klettner, Kristiina Jokinen, Elena Simperl, Florian Matthes

    Abstract: Conversational question answering systems often rely on semantic parsing to enable interactive information retrieval, which involves the generation of structured database queries from a natural language input. For information-seeking conversations about facts stored within a knowledge graph, dialogue utterances are transformed into graph queries in a process that is called knowledge-based conversa…

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: Accepted to ICAART 2024

  20. arXiv:2312.13881  [pdf, other]

    cs.CL

    Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs

    Authors: Juraj Vladika, Alexander Fichtl, Florian Matthes

    Abstract: Recent advances in natural language processing (NLP) owe their success to pre-training language models on large amounts of unstructured data. Still, there is an increasing effort to combine the unstructured nature of LMs with structured knowledge and reasoning. Particularly in the rapidly evolving field of biomedical NLP, knowledge-enhanced language models (KELMs) have emerged as promising tools t…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted as Full Paper to ICAART 2024

  21. A Knowledge Graph Approach for Exploratory Search in Research Institutions

    Authors: Tim Schopf, Nektrios Machner, Florian Matthes

    Abstract: Over the past decades, research institutions have grown increasingly and consequently also their research output. This poses a significant challenge for researchers seeking to understand the research landscape of an institution. The process of exploring the research landscape of institutions has a vague information need, no precise goal, and is open-ended. Current applications are not designed to…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted to 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KMIS

  22. A Taxonomy of Decentralized Identifier Methods for Practitioners

    Authors: Felix Hoops, Alexander Mühle, Florian Matthes, Christoph Meinel

    Abstract: A core part of the new identity management paradigm of Self-Sovereign Identity (SSI) is the W3C Decentralized Identifiers (DIDs) standard. The diversity of interoperable implementations encouraged by the paradigm is key for a less centralized future, and it is made possible by the concept of DIDs. However, this leads to a kind of dilemma of choices, where practitioners are faced with the difficult…

    Submitted 18 October, 2023; originally announced November 2023.

    Journal ref: 2023 IEEE International Conference on Decentralized Applications and Infrastructures (DAPPS), Athens, Greece, 2023, pp. 57-65

  23. arXiv:2310.05150  [pdf, other]

    cs.CL cs.IR

    From Data to Dialogue: Leveraging the Structure of Knowledge Graphs for Conversational Exploratory Search

    Authors: Phillip Schneider, Nils Rehtanz, Kristiina Jokinen, Florian Matthes

    Abstract: Exploratory search is an open-ended information retrieval process that aims at discovering knowledge about a topic or domain rather than searching for a specific answer or piece of information. Conversational interfaces are particularly suitable for supporting exploratory search, allowing users to refine queries and examine search results through interactive dialogues. In addition to conversationa…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted to PACLIC 2023

  24. arXiv:2309.08503  [pdf, other]

    cs.CL cs.AI

    HealthFC: Verifying Health Claims with Evidence-Based Medical Fact-Checking

    Authors: Juraj Vladika, Phillip Schneider, Florian Matthes

    Abstract: In the digital age, seeking health advice on the Internet has become a common practice. At the same time, determining the trustworthiness of online medical content is increasingly challenging. Fact-checking has emerged as an approach to assess the veracity of factual claims using evidence from credible knowledge sources. To help advance automated Natural Language Processing (NLP) solutions for thi…

    Submitted 25 March, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: Accepted to LREC-COLING 2024

  25. arXiv:2309.07588  [pdf, other]

    cond-mat.mes-hall

    Spin-Selective Electron Transport Through Single Chiral Molecules

    Authors: Mohammad Reza Safari, Frank Matthes, Claus M. Schneider, Karl-Heinz Ernst, Daniel E. Bürgler

    Abstract: The interplay between chirality and magnetism has been a source of fascination among scientists for over a century. In recent years, chirality-induced spin selectivity (CISS) has attracted renewed interest. It has been observed that electron transport through layers of homochiral molecules leads to a significant spin polarization of several tens of percent. Despite the abundant experimental eviden…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: 15 pages, 4 figures, plus Supporting Information

    Journal ref: Small, 2308233 (2023)

  26. arXiv:2308.06513  [pdf, other]

    cs.CR

    A Study of MEV Extraction Techniques on a First-Come-First-Served Blockchain

    Authors: Burak Öz, Filip Rezabek, Jonas Gebele, Felix Hoops, Florian Matthes

    Abstract: Maximal Extractable Value (MEV) has become a significant incentive on blockchain networks, referring to the value captured through the manipulation of transaction execution order and strategic issuance of profit-generation transactions. We argue that transaction ordering techniques used for MEV extraction in blockchains where fees can influence the execution order do not directly apply to blockcha…

    Submitted 15 January, 2024; v1 submitted 12 August, 2023; originally announced August 2023.

    Comments: 15 pages, 4 figures

  27. arXiv:2308.02454  [pdf, other]

    cs.LG cs.AI cs.CR cs.DC

    SoK: Assessing the State of Applied Federated Machine Learning

    Authors: Tobias Müller, Maximilian Stäbler, Hugo Gascón, Frank Köster, Florian Matthes

    Abstract: Machine Learning (ML) has shown significant potential in various applications; however, its adoption in privacy-critical domains has been limited due to concerns about data privacy. A promising solution to this issue is Federated Machine Learning (FedML), a model-to-data approach that prioritizes data privacy. By enabling ML algorithms to be applied directly to distributed data sources without sha…

    Submitted 3 August, 2023; originally announced August 2023.

    Comments: 9 pages, 6 figures, 3 tables

  28. Exploring the Landscape of Natural Language Processing Research

    Authors: Tim Schopf, Karim Arabi, Florian Matthes

    Abstract: As an efficient approach to understand, generate, and process natural language texts, research in natural language processing (NLP) has exhibited a rapid spread and wide adoption in recent years. Given the increasing research work in this area, several NLP-related approaches have been surveyed in the research community. However, a comprehensive study that categorizes established topics, identifies…

    Submitted 24 September, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: Extended version of the paper accepted to the 14th International Conference on Recent Advances in Natural Language Processing (RANLP 2023)

    ACM Class: I.2.7

  29. AspectCSE: Sentence Embeddings for Aspect-based Semantic Textual Similarity Using Contrastive Learning and Structured Knowledge

    Authors: Tim Schopf, Emanuel Gerber, Malte Ostendorff, Florian Matthes

    Abstract: Generic sentence embeddings provide a coarse-grained approximation of semantic textual similarity but ignore specific aspects that make texts similar. Conversely, aspect-based sentence embeddings provide similarities between texts based on certain predefined aspects. Thus, similarity predictions of texts are more targeted to specific requirements and more easily explainable. In this paper, we pres…

    Submitted 24 September, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

    Comments: Accepted to the 14th International Conference on Recent Advances in Natural Language Processing (RANLP 2023)

    ACM Class: I.2.7

  30. arXiv:2307.05814  [pdf, other]

    cs.CR cs.GT

    Time Moves Faster When There is Nothing You Anticipate: The Role of Time in MEV Rewards

    Authors: Burak Öz, Benjamin Kraner, Nicolò Vallarano, Bingle Stegmann Kruger, Florian Matthes, Claudio Juan Tessone

    Abstract: This study explores the intricacies of waiting games, a novel dynamic that emerged with Ethereum's transition to a Proof-of-Stake (PoS)-based block proposer selection protocol. Within this PoS framework, validators acquire a distinct monopoly position during their assigned slots, given that block proposal rights are set deterministically, contrasting with Proof-of-Work (PoW) protocols. Consequentl…

    Submitted 11 July, 2023; originally announced July 2023.

    Comments: 23 pages, 13 figures, 3 appendices

  31. Efficient Domain Adaptation of Sentence Embeddings Using Adapters

    Authors: Tim Schopf, Dennis N. Schneider, Florian Matthes

    Abstract: Sentence embeddings enable us to capture the semantic similarity of short texts. Most sentence embedding models are trained for general semantic textual similarity tasks. Therefore, to use sentence embeddings in a particular domain, the model must be adapted to it in order to achieve good results. Usually, this is done by fine-tuning the entire sentence embedding model for the domain of interest.…

    Submitted 24 September, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: Accepted to the 14th International Conference on Recent Advances in Natural Language Processing (RANLP 2023)

    ACM Class: I.2.7

  32. Challenges in Domain-Specific Abstractive Summarization and How to Overcome them

    Authors: Anum Afzal, Juraj Vladika, Daniel Braun, Florian Matthes

    Abstract: Large Language Models work quite well with general-purpose data and many tasks in Natural Language Processing. However, they show several limitations when used for a task such as domain-specific abstractive text summarization. This paper identifies three of those limitations as research problems in the context of abstractive text summarization: 1) Quadratic complexity of transformer-based models w…

    Submitted 3 July, 2023; originally announced July 2023.

  33. arXiv:2306.15497  [pdf]

    cs.CR cs.CY

    Identifying Practical Challenges in the Implementation of Technical Measures for Data Privacy Compliance

    Authors: Oleksandra Klymenko, Stephen Meisenbacher, Florian Matthes

    Abstract: Modern privacy regulations provide a strict mandate for data processing entities to implement appropriate technical measures to demonstrate compliance. In practice, determining what measures are indeed "appropriate" is not trivial, particularly in light of vague guidelines provided by privacy regulations. To exacerbate the issue, challenges arise not only in the implementation of the technical mea…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: 10 pages, 2 tables, 29th Americas Conference on Information Systems (AMCIS 2023)

  34. arXiv:2305.16859  [pdf, other]

    cs.CL

    Scientific Fact-Checking: A Survey of Resources and Approaches

    Authors: Juraj Vladika, Florian Matthes

    Abstract: The task of fact-checking deals with assessing the veracity of factual claims based on credible evidence and background knowledge. In particular, scientific fact-checking is the variation of the task concerned with verifying claims rooted in scientific knowledge. This task has received significant attention due to the growing importance of scientific and health discussions on online platforms. Aut…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: 9 pages, ACL 2023 (Findings)

  35. arXiv:2304.13688  [pdf]

    cs.AI

    Unlocking the Potential of Collaborative AI -- On the Socio-technical Challenges of Federated Machine Learning

    Authors: Tobias Müller, Milena Zahn, Florian Matthes

    Abstract: The disruptive potential of AI systems roots in the emergence of big data. Yet, a significant portion is scattered and locked in data silos, leaving its potential untapped. Federated Machine Learning is a novel AI paradigm enabling the creation of AI models from decentralized, potentially siloed data. Hence, Federated Machine Learning could technically open data silos and therefore unlock economic…

    Submitted 28 April, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

    Comments: Accepted for Publication at the 31st European Conference on Information Systems (ECIS 2023)

  36. arXiv:2304.13180  [pdf, other]

    cs.CL

    Sebis at SemEval-2023 Task 7: A Joint System for Natural Language Inference and Evidence Retrieval from Clinical Trial Reports

    Authors: Juraj Vladika, Florian Matthes

    Abstract: With the increasing number of clinical trial reports generated every day, it is becoming hard to keep up with novel discoveries that inform evidence-based healthcare recommendations. To help automate this process and assist medical experts, NLP solutions are being developed. This motivated the SemEval-2023 Task 7, where the goal was to develop an NLP system for two tasks: evidence retrieval and na…

    Submitted 2 May, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: 6 pages, SemEval 2023

  37. arXiv:2303.14286  [pdf, other]

    cs.CL

    Voice-Based Conversational Agents and Knowledge Graphs for Improving News Search in Assisted Living

    Authors: Phillip Schneider, Nils Rehtanz, Kristiina Jokinen, Florian Matthes

    Abstract: As the healthcare sector is facing major challenges, such as aging populations, staff shortages, and common chronic diseases, delivering high-quality care to individuals has become very difficult. Conversational agents have shown to be a promising technology to alleviate some of these issues. In the form of digital health assistants, they have the potential to improve the everyday life of the elde…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: 10 pages, submitted to PETRA 2023

  38. arXiv:2301.04098  [pdf, other]

    cs.CL

    Investigating Conversational Search Behavior For Domain Exploration

    Authors: Phillip Schneider, Anum Afzal, Juraj Vladika, Daniel Braun, Florian Matthes

    Abstract: Conversational search has evolved as a new information retrieval paradigm, marking a shift from traditional search systems towards interactive dialogues with intelligent search agents. This change especially affects exploratory information-seeking contexts, where conversational search systems can guide the discovery of unfamiliar domains. In these scenarios, users find it often difficult to expres…

    Submitted 27 February, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

    Comments: Accepted to ECIR 2023

  39. Evaluating Unsupervised Text Classification: Zero-shot and Similarity-based Approaches

    Authors: Tim Schopf, Daniel Braun, Florian Matthes

    Abstract: Text classification of unseen classes is a challenging Natural Language Processing task and is mainly attempted using two different types of approaches. Similarity-based approaches attempt to classify instances based on similarities between text document representations and class description representations. Zero-shot text classification approaches aim to generalize knowledge gained from a trainin…

    Submitted 31 January, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: Accepted to 6th International Conference on Natural Language Processing and Information Retrieval (NLPIR '22)

    ACM Class: I.2.7

  40. arXiv:2211.12976  [pdf, other]

    cond-mat.mtrl-sci

    Enantioselective adsorption on magnetic surfaces

    Authors: Mohammad Reza Safari, Frank Matthes, Vasile Caciuc, Nicolae Atodiresei, Claus M. Schneider, Karl-Heinz Ernst, Daniel E. Bürgler

    Abstract: From the beginning of molecular theory, the interplay of chirality and magnetism has intrigued scientists. There is still the question if enantiospecific adsorption of chiral molecules occurs on magnetic surfaces. Enantiomer discrimination was conjectured to arise from chirality-induced spin separation within the molecules and exchange interaction with the substrate's magnetization. Here we show t…

    Submitted 25 October, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

    Comments: 19 pages, 4 figures, plus Supporting Information

    Journal ref: Adv. Mater. 2308666 (2024)

  41. arXiv:2211.11057  [pdf, other]

    cs.CL cs.SE

    Semantic Similarity-Based Clustering of Findings From Security Testing Tools

    Authors: Phillip Schneider, Markus Voggenreiter, Abdullah Gulraiz, Florian Matthes

    Abstract: Over the last years, software development in domains with high security demands transitioned from traditional methodologies to uniting modern approaches from software development and operations (DevOps). Key principles of DevOps gained more importance and are now applied to security aspects of software development, resulting in the automation of security-enhancing activities. In particular, it is…

    Submitted 20 November, 2022; originally announced November 2022.

    Comments: Accepted to ICNLSP 2022

  42. arXiv:2211.03898  [pdf, other]

    cs.CR

    Lessons Learned: Surveying the Practicality of Differential Privacy in the Industry

    Authors: Gonzalo Munilla Garrido, Xiaoyuan Liu, Florian Matthes, Dawn Song

    Abstract: Since its introduction in 2006, differential privacy has emerged as a predominant statistical tool for quantifying data privacy in academic works. Yet despite the plethora of research and open-source utilities that have accompanied its rise, with limited exceptions, differential privacy has failed to achieve widespread adoption in the enterprise domain. Our study aims to shed light on the fundamen…

    Submitted 7 November, 2022; originally announced November 2022.

  43. Lbl2Vec: An Embedding-Based Approach for Unsupervised Document Retrieval on Predefined Topics

    Authors: Tim Schopf, Daniel Braun, Florian Matthes

    Abstract: In this paper, we consider the task of retrieving documents with predefined topics from an unlabeled document dataset using an unsupervised approach. The proposed unsupervised approach requires only a small number of keywords describing the respective topics and no labeled document. Existing approaches either heavily relied on a large amount of additionally encoded world knowledge or on term-docum…

    Submitted 12 October, 2022; originally announced October 2022.

    Journal ref: In Proceedings of the 17th International Conference on Web Information Systems and Technologies - WEBIST, ISBN 978-989-758-536-4; ISSN 2184-3252, pages 124-132 (2021)

  44. PatternRank: Leveraging Pretrained Language Models and Part of Speech for Unsupervised Keyphrase Extraction

    Authors: Tim Schopf, Simon Klimek, Florian Matthes

    Abstract: Keyphrase extraction is the process of automatically selecting a small set of most relevant phrases from a given text. Supervised keyphrase extraction approaches need large amounts of labeled training data and perform poorly outside the domain of the training data. In this paper, we present PatternRank, which leverages pretrained language models and part-of-speech for unsupervised keyphrase extrac…

    Submitted 12 October, 2022; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted to 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR

    Journal ref: In Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR 2022, ISBN 978-989-758-614-9; ISSN 2184-3228, pages 243-248. DOI: 10.5220/0011546600003335

  45. arXiv:2210.00105  [pdf, other]

    cs.CL cs.AI

    A Decade of Knowledge Graphs in Natural Language Processing: A Survey

    Authors: Phillip Schneider, Tim Schopf, Juraj Vladika, Mikhail Galkin, Elena Simperl, Florian Matthes

    Abstract: In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing a rapid spread and wide adoption within recent years. Given the increasing am…

    Submitted 30 September, 2022; originally announced October 2022.

    Comments: Accepted to AACL-IJCNLP 2022

  46. Exploring privacy-enhancing technologies in the automotive value chain

    Authors: Gonzalo Munilla Garrido, Kaja Schmidt, Christopher Harth-Kitzerow, Johannes Klepsch, Andre Luckow, Florian Matthes

    Abstract: Privacy-enhancing technologies (PETs) are becoming increasingly crucial for addressing customer needs, security, privacy (e.g., enhancing anonymity and confidentiality), and regulatory requirements. However, applying PETs in organizations requires a precise understanding of use cases, technologies, and limitations. This paper investigates several industrial use cases, their characteristics, and th…

    Submitted 12 September, 2022; originally announced September 2022.

    Journal ref: 2021 IEEE International Conference on Big Data (Big Data)

  47. Understanding the Implementation of Technical Measures in the Process of Data Privacy Compliance: A Qualitative Study

    Authors: Oleksandra Klymenko, Oleksandr Kosenkov, Stephen Meisenbacher, Parisa Elahidoost, Daniel Mendez, Florian Matthes

    Abstract: Modern privacy regulations, such as the General Data Protection Regulation (GDPR), address privacy in software systems in a technologically agnostic way by mentioning general "technical measures" for data privacy compliance rather than dictating how these should be implemented. An understanding of the concept of technical measures and how exactly these can be handled in practice, however, is not t…

    Submitted 18 August, 2022; originally announced August 2022.

    Comments: The 16th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)

  48. Differential Privacy in Natural Language Processing: The Story So Far

    Authors: Oleksandra Klymenko, Stephen Meisenbacher, Florian Matthes

    Abstract: As the tide of Big Data continues to influence the landscape of Natural Language Processing (NLP), the utilization of modern NLP methods has grounded itself in this data, in order to tackle a variety of text-based tasks. These methods without a doubt can include private or otherwise personally identifiable information. As such, the question of privacy in NLP has gained fervor in recent years, coin…

    Submitted 17 August, 2022; originally announced August 2022.

  49. arXiv:2201.03913

    cs.CR

    Exponential Randomized Response: Boosting Utility in Differentially Private Selection

    Authors: Gonzalo Munilla Garrido, Florian Matthes

    Abstract: A differentially private selection algorithm outputs from a finite set the item that approximately maximizes a data-dependent quality function. The most widely adopted mechanisms tackling this task are the pioneering exponential mechanism and permute-and-flip, which can offer utility improvements of up to a factor of two over the exponential mechanism. This work introduces a new differentially pri…

    Submitted 3 August, 2022; v1 submitted 11 January, 2022; originally announced January 2022.

    Comments: This algorithm only works under an assumption that is not realistic for the wider application of differential privacy

  50. arXiv:2109.10789  [pdf, other]

    cs.CR

    Do I Get the Privacy I Need? Benchmarking Utility in Differential Privacy Libraries

    Authors: Gonzalo Munilla Garrido, Joseph Near, Aitsam Muhammad, Warren He, Roman Matzutt, Florian Matthes

    Abstract: An increasing number of open-source libraries promise to bring differential privacy to practice, even for non-experts. This paper studies five libraries that offer differentially private analytics: Google DP, SmartNoise, diffprivlib, diffpriv, and Chorus. We compare these libraries qualitatively (capabilities, features, and maturity) and quantitatively (utility and scalability) across four analyti…

    Submitted 22 September, 2021; originally announced September 2021.

    Comments: 13 pages, 12 figures, 15 tables, and 1 algorithm