The Constitutions of Web3

Authors: Joshua Z. Tan, Max Langenkamp, Anna Weichselbraun, Ann Brody, Lucia Korpas

Abstract: The governance of online communities has been a critical issue since the first USENET groups, and a number of serious constitutions -- declarations of goals, values, and rights -- have emerged since the mid-1990s. More recently, decentralized autonomous organizations (DAOs) have begun to publish their own constitutions, manifestos, and other governance documents. There are two unique aspects to these documents: they (1) often govern significantly more resources than previously-observed online communities, and (2) are used in conjunction with smart contracts that can secure certain community rights and processes through code. In this article, we analyze 25 DAO constitutions, observe a number of common patterns, and provide a template and a set of recommendations to support the crafting and dissemination of future DAO constitutions. We conclude with a report on how our template and recommendations were then used within the actual constitutional drafting process of a major blockchain.

Submitted 29 February, 2024; originally announced March 2024.

arXiv:2305.06140 [pdf, other]

doi 10.5220/0011749600003467

CrudeBERT: Applying Economic Theory towards fine-tuning Transformer-based Sentiment Analysis Models to the Crude Oil Market

Authors: Himmet Kaplan, Ralf-Peter Mundani, Heiko Rölke, Albert Weichselbraun

Abstract: Predicting market movements based on the sentiment of news media has a long tradition in data analysis. With advances in natural language processing, transformer architectures have emerged that enable contextually aware sentiment classification. Nevertheless, current methods built for the general financial market such as FinBERT cannot distinguish asset-specific value-driving factors. This paper addresses this shortcoming by presenting a method that identifies and classifies events that impact supply and demand in the crude oil markets within a large corpus of relevant news headlines. We then introduce CrudeBERT, a new sentiment analysis model that draws upon these events to contextualize and fine-tune FinBERT, thereby yielding improved sentiment classifications for headlines related to the crude oil futures market. An extensive evaluation demonstrates that CrudeBERT outperforms proprietary and open-source solutions in the domain of crude oil.

Submitted 10 May, 2023; originally announced May 2023.

ACM Class: H.3; H.4; I.2.7

Journal ref: Proceedings of the 25th International Conference on Enterprise Information Systems (ICEIS 2023), pages 324-334

arXiv:2207.04862 [pdf, other]

doi 10.1007/978-3-031-08473-7_25

Slot Filling for Extracting Reskilling and Upskilling Options from the Web

Authors: Albert Weichselbraun, Roger Waldvogel, Andreas Fraefel, Alexander van Schie, Philipp Kuntschik

Abstract: Disturbances in the job market such as advances in science and technology, crisis and increased competition have triggered a surge in reskilling and upskilling programs. Information on suitable continuing education options is distributed across many sites, rendering the search, comparison and selection of useful programs a cumbersome task. This paper, therefore, introduces a knowledge extraction system that integrates reskilling and upskilling options into a single knowledge graph. The system collects educational programs from 488 different providers and uses context extraction for identifying and contextualizing relevant content. Afterwards, entity recognition and entity linking methods draw upon a domain ontology to locate relevant entities such as skills, occupations and topics. Finally, slot filling integrates entities based on their context into the corresponding slots of the continuous education knowledge graph. We also introduce a German gold standard that comprises 169 documents and over 3800 annotations for benchmarking the necessary content extraction, entity linking, entity recognition and slot filling tasks, and provide an overview of the system's performance.

Submitted 11 July, 2022; originally announced July 2022.

Comments: Natural Language Processing and Information Systems (NLDB 2022). This preprint has not undergone any post-submission improvements or corrections. The Version of Record of this contribution is published in "27th International Conference on Applications of Natural Language to Information Systems (NLDB 2022), Valencia, Spain, June 15-17, 2022, Proceedings", and is available online at https://doi.org/10.1007/978-3-031-08473-7_25

ACM Class: H.3; H.4; I.2.7

arXiv:2108.01454 [pdf, other]

doi 10.21105/joss.03557

Inscriptis -- A Python-based HTML to text conversion library optimized for knowledge extraction from the Web

Authors: Albert Weichselbraun

Abstract: Inscriptis provides a library, command line client and Web service for converting HTML to plain text. Its development has been triggered by the need to obtain accurate text representations for knowledge extraction tasks that preserve the spatial alignment of text without drawing upon heavyweight, browser-based solutions such as Selenium. In contrast to related software packages, Inscriptis (i) provides a layout-aware conversion of HTML that more closely resembles the rendering obtained from standard Web browsers; and (ii) supports annotation rules, i.e., user-provided mappings that allow for annotating the extracted text based on structural and semantic information encoded in HTML tags and attributes. These unique features ensure that downstream knowledge extraction components can operate on accurate text representations, and may even use information on the semantics and structure of the original HTML document.

Submitted 22 October, 2021; v1 submitted 12 July, 2021; originally announced August 2021.

Comments: Preprint of the published version, which includes all improvements made during the review process

ACM Class: H.3.1; H.4.3

Journal ref: Journal of Open Source Software (2021), 6(66), 3557

arXiv:2102.02240 [pdf, ps, other]

doi 10.1109/WIIAT50758.2020.00065

Harvest -- An Open Source Toolkit for Extracting Posts and Post Metadata from Web Forums

Authors: Albert Weichselbraun, Adrian M. P. Brasoveanu, Roger Waldvogel, Fabian Odoni

Abstract: Automatic extraction of forum posts and metadata is a crucial but challenging task since forums do not expose their content in a standardized structure. Content extraction methods, therefore, often need customizations such as adaptations to page templates and improvements of their extraction code before they can be deployed to new forums. Most of the current solutions are also built for the more general case of content extraction from web pages and lack key features important for understanding forum content such as the identification of author metadata and information on the thread structure. This paper, therefore, presents a method that determines the XPath of forum posts, eliminating incorrect mergers and splits of the extracted posts that were common in systems from the previous generation. Based on the individual posts further metadata such as authors, forum URL and structure are extracted. We also introduce Harvest, a new open source toolkit that implements the presented methods and create a gold standard extracted from 52 different Web forums for evaluating our approach. A comprehensive evaluation reveals that Harvest clearly outperforms competing systems.

Submitted 3 February, 2021; originally announced February 2021.

Comments: IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT 2020), Accepted 27 October 2020

ACM Class: H.3.1; H.3.3; I.2.7

arXiv:2102.00827 [pdf]

doi 10.1007/s12559-021-09839-4

Automatic Expansion of Domain-Specific Affective Models for Web Intelligence Applications

Authors: Albert Weichselbraun, Jakob Steixner, Adrian M. P. Braşoveanu, Arno Scharl, Max Göbel, Lyndon J. B. Nixon

Abstract: Sentic computing relies on well-defined affective models of different complexity - polarity to distinguish positive and negative sentiment, for example, or more nuanced models to capture expressions of human emotions. When used to measure communication success, even the most granular affective model combined with sophisticated machine learning approaches may not fully capture an organisation's strategic positioning goals. Such goals often deviate from the assumptions of standardised affective models. While certain emotions such as Joy and Trust typically represent desirable brand associations, specific communication goals formulated by marketing professionals often go beyond such standard dimensions. For instance, the brand manager of a television show may consider fear or sadness to be desired emotions for its audience. This article introduces expansion techniques for affective models, combining common and commonsense knowledge available in knowledge graphs with language models and affective reasoning, improving coverage and consistency as well as supporting domain-specific interpretations of emotions. An extensive evaluation compares the performance of different expansion techniques: (i) a quantitative evaluation based on the revisited Hourglass of Emotions model to assess performance on complex models that cover multiple affective categories, using manually compiled gold standard data, and (ii) a qualitative evaluation of a domain-specific affective model for television programme brands.
The results of these evaluations demonstrate that the introduced techniques support a variety of embeddings and pre-trained models. The paper concludes with a discussion on applying this approach to other scenarios where affective model resources are scarce.

Submitted 1 February, 2021; originally announced February 2021.

Comments: see also https://link.springer.com/article/10.1007/s12559-021-09839-4

ACM Class: H.3.1; H.4.2; I.2.7

Journal ref: Cognitive Computation, (2021), 1-18

arXiv:2010.09249 [pdf]

doi 10.1515/iwp-2020-2119

Improving Company Valuations with Automated Knowledge Discovery, Extraction and Fusion

Authors: Albert Weichselbraun, Philipp Kuntschik, Sandro Hörler

Abstract: Performing company valuations within the domain of biotechnology, pharmacy and medical technology is a challenging task, especially when considering the unique set of risks biotech start-ups face when entering new markets. Companies specialized in global valuation services, therefore, combine valuation models and past experience with heterogeneous metrics and indicators that provide insights into a company's performance. This paper illustrates how automated knowledge discovery, extraction and data fusion can be used to (i) obtain additional indicators that provide insights into the success of a company's product development efforts, and (ii) support labor-intensive data curation processes. We apply deep web knowledge acquisition methods to identify and harvest data on clinical trials that is hidden behind proprietary search interfaces and integrate the extracted data into the industry partner's company valuation ontology. In addition, focused Web crawls and shallow semantic parsing yield information on the company's key personnel and respective contact data, notifying domain experts of relevant changes that get then incorporated into the industry partner's company data.

Submitted 19 October, 2020; originally announced October 2020.

Comments: English translation of the article: "Optimierung von Unternehmensbewertungen durch automatisierte Wissensidentifikation, -extraktion und -integration". Information - Wissenschaft und Praxis 71 (5-6):1-5, https://doi.org/10.1515/iwp-2020-2119

ACM Class: H.3.1; H.3.3; J.4

Showing 1–7 of 7 results for author: Weichselbraun, A