-
The Constitutions of Web3
Authors:
Joshua Z. Tan,
Max Langenkamp,
Anna Weichselbraun,
Ann Brody,
Lucia Korpas
Abstract:
The governance of online communities has been a critical issue since the first USENET groups, and a number of serious constitutions -- declarations of goals, values, and rights -- have emerged since the mid-1990s. More recently, decentralized autonomous organizations (DAOs) have begun to publish their own constitutions, manifestos, and other governance documents. There are two unique aspects to th…
▽ More
The governance of online communities has been a critical issue since the first USENET groups, and a number of serious constitutions -- declarations of goals, values, and rights -- have emerged since the mid-1990s. More recently, decentralized autonomous organizations (DAOs) have begun to publish their own constitutions, manifestos, and other governance documents. There are two unique aspects to these documents: they (1) often govern significantly more resources than previously-observed online communities, and (2) are used in conjunction with smart contracts that can secure certain community rights and processes through code. In this article, we analyze 25 DAO constitutions, observe a number of common patterns, and provide a template and a set of recommendations to support the crafting and dissemination of future DAO constitutions. We conclude with a report on how our template and recommendations were then used within the actual constitutional drafting process of a major blockchain.
△ Less
Submitted 29 February, 2024;
originally announced March 2024.
-
CrudeBERT: Applying Economic Theory towards fine-tuning Transformer-based Sentiment Analysis Models to the Crude Oil Market
Authors:
Himmet Kaplan,
Ralf-Peter Mundani,
Heiko Rölke,
Albert Weichselbraun
Abstract:
Predicting market movements based on the sentiment of news media has a long tradition in data analysis. With advances in natural language processing, transformer architectures have emerged that enable contextually aware sentiment classification. Nevertheless, current methods built for the general financial market such as FinBERT cannot distinguish asset-specific value-driving factors. This paper a…
▽ More
Predicting market movements based on the sentiment of news media has a long tradition in data analysis. With advances in natural language processing, transformer architectures have emerged that enable contextually aware sentiment classification. Nevertheless, current methods built for the general financial market such as FinBERT cannot distinguish asset-specific value-driving factors. This paper addresses this shortcoming by presenting a method that identifies and classifies events that impact supply and demand in the crude oil markets within a large corpus of relevant news headlines. We then introduce CrudeBERT, a new sentiment analysis model that draws upon these events to contextualize and fine-tune FinBERT, thereby yielding improved sentiment classifications for headlines related to the crude oil futures market. An extensive evaluation demonstrates that CrudeBERT outperforms proprietary and open-source solutions in the domain of crude oil.
△ Less
Submitted 10 May, 2023;
originally announced May 2023.
-
Slot Filling for Extracting Reskilling and Upskilling Options from the Web
Authors:
Albert Weichselbraun,
Roger Waldvogel,
Andreas Fraefel,
Alexander van Schie,
Philipp Kuntschik
Abstract:
Disturbances in the job market such as advances in science and technology, crisis and increased competition have triggered a surge in reskilling and upskilling programs. Information on suitable continuing education options is distributed across many sites, rendering the search, comparison and selection of useful programs a cumbersome task. This paper, therefore, introduces a knowledge extraction s…
▽ More
Disturbances in the job market such as advances in science and technology, crisis and increased competition have triggered a surge in reskilling and upskilling programs. Information on suitable continuing education options is distributed across many sites, rendering the search, comparison and selection of useful programs a cumbersome task. This paper, therefore, introduces a knowledge extraction system that integrates reskilling and upskilling options into a single knowledge graph. The system collects educational programs from 488 different providers and uses context extraction for identifying and contextualizing relevant content. Afterwards, entity recognition and entity linking methods draw upon a domain ontology to locate relevant entities such as skills, occupations and topics. Finally, slot filling integrates entities based on their context into the corresponding slots of the continuous education knowledge graph. We also introduce a German gold standard that comprises 169 documents and over 3800 annotations for benchmarking the necessary content extraction, entity linking, entity recognition and slot filling tasks, and provide an overview of the system's performance.
△ Less
Submitted 11 July, 2022;
originally announced July 2022.
-
Inscriptis -- A Python-based HTML to text conversion library optimized for knowledge extraction from the Web
Authors:
Albert Weichselbraun
Abstract:
Inscriptis provides a library, command line client and Web service for converting HTML to plain text. Its development has been triggered by the need to obtain accurate text representations for knowledge extraction tasks that preserve the spatial alignment of text without drawing upon heavyweight, browser-based solutions such as Selenium. In contrast to related software packages, Inscriptis (i) pro…
▽ More
Inscriptis provides a library, command line client and Web service for converting HTML to plain text. Its development has been triggered by the need to obtain accurate text representations for knowledge extraction tasks that preserve the spatial alignment of text without drawing upon heavyweight, browser-based solutions such as Selenium. In contrast to related software packages, Inscriptis (i) provides a layout-aware conversion of HTML that more closely resembles the rendering obtained from standard Web browsers; and (ii) supports annotation rules, i.e., user-provided map**s that allow for annotating the extracted text based on structural and semantic information encoded in HTML tags and attributes. These unique features ensure that downstream knowledge extraction components can operate on accurate text representations, and may even use information on the semantics and structure of the original HTML document.
△ Less
Submitted 22 October, 2021; v1 submitted 12 July, 2021;
originally announced August 2021.
-
Harvest -- An Open Source Toolkit for Extracting Posts and Post Metadata from Web Forums
Authors:
Albert Weichselbraun,
Adrian M. P. Brasoveanu,
Roger Waldvogel,
Fabian Odoni
Abstract:
Automatic extraction of forum posts and metadata is a crucial but challenging task since forums do not expose their content in a standardized structure. Content extraction methods, therefore, often need customizations such as adaptations to page templates and improvements of their extraction code before they can be deployed to new forums. Most of the current solutions are also built for the more g…
▽ More
Automatic extraction of forum posts and metadata is a crucial but challenging task since forums do not expose their content in a standardized structure. Content extraction methods, therefore, often need customizations such as adaptations to page templates and improvements of their extraction code before they can be deployed to new forums. Most of the current solutions are also built for the more general case of content extraction from web pages and lack key features important for understanding forum content such as the identification of author metadata and information on the thread structure.
This paper, therefore, presents a method that determines the XPath of forum posts, eliminating incorrect mergers and splits of the extracted posts that were common in systems from the previous generation. Based on the individual posts further metadata such as authors, forum URL and structure are extracted. We also introduce Harvest, a new open source toolkit that implements the presented methods and create a gold standard extracted from 52 different Web forums for evaluating our approach. A comprehensive evaluation reveals that Harvest clearly outperforms competing systems.
△ Less
Submitted 3 February, 2021;
originally announced February 2021.
-
Automatic Expansion of Domain-Specific Affective Models for Web Intelligence Applications
Authors:
Albert Weichselbraun,
Jakob Steixner,
Adrian M. P. Braşoveanu,
Arno Scharl,
Max Göbel,
Lyndon J. B. Nixon
Abstract:
Sentic computing relies on well-defined affective models of different complexity - polarity to distinguish positive and negative sentiment, for example, or more nuanced models to capture expressions of human emotions. When used to measure communication success, even the most granular affective model combined with sophisticated machine learning approaches may not fully capture an organisation's str…
▽ More
Sentic computing relies on well-defined affective models of different complexity - polarity to distinguish positive and negative sentiment, for example, or more nuanced models to capture expressions of human emotions. When used to measure communication success, even the most granular affective model combined with sophisticated machine learning approaches may not fully capture an organisation's strategic positioning goals. Such goals often deviate from the assumptions of standardised affective models. While certain emotions such as Joy and Trust typically represent desirable brand associations, specific communication goals formulated by marketing professionals often go beyond such standard dimensions. For instance, the brand manager of a television show may consider fear or sadness to be desired emotions for its audience. This article introduces expansion techniques for affective models, combining common and commonsense knowledge available in knowledge graphs with language models and affective reasoning, improving coverage and consistency as well as supporting domain-specific interpretations of emotions. An extensive evaluation compares the performance of different expansion techniques: (i) a quantitative evaluation based on the revisited Hourglass of Emotions model to assess performance on complex models that cover multiple affective categories, using manually compiled gold standard data, and (ii) a qualitative evaluation of a domain-specific affective model for television programme brands. The results of these evaluations demonstrate that the introduced techniques support a variety of embeddings and pre-trained models. The paper concludes with a discussion on applying this approach to other scenarios where affective model resources are scarce.
△ Less
Submitted 1 February, 2021;
originally announced February 2021.
-
Improving Company Valuations with Automated Knowledge Discovery, Extraction and Fusion
Authors:
Albert Weichselbraun,
Philipp Kuntschik,
Sandro Hörler
Abstract:
Performing company valuations within the domain of biotechnology, pharmacy and medical technology is a challenging task, especially when considering the unique set of risks biotech start-ups face when entering new markets. Companies specialized in global valuation services, therefore, combine valuation models and past experience with heterogeneous metrics and indicators that provide insights into…
▽ More
Performing company valuations within the domain of biotechnology, pharmacy and medical technology is a challenging task, especially when considering the unique set of risks biotech start-ups face when entering new markets. Companies specialized in global valuation services, therefore, combine valuation models and past experience with heterogeneous metrics and indicators that provide insights into a company's performance. This paper illustrates how automated knowledge discovery, extraction and data fusion can be used to (i) obtain additional indicators that provide insights into the success of a company's product development efforts, and (ii) support labor-intensive data curation processes. We apply deep web knowledge acquisition methods to identify and harvest data on clinical trials that is hidden behind proprietary search interfaces and integrate the extracted data into the industry partner's company valuation ontology. In addition, focused Web crawls and shallow semantic parsing yield information on the company's key personnel and respective contact data, notifying domain experts of relevant changes that get then incorporated into the industry partner's company data.
△ Less
Submitted 19 October, 2020;
originally announced October 2020.