-
Enabling the Digital Democratic Revival: A Research Program for Digital Democracy
Authors:
Davide Grossi,
Ulrike Hahn,
Michael Mäs,
Andreas Nitsche,
Jan Behrens,
Niclas Boehmer,
Markus Brill,
Ulle Endriss,
Umberto Grandi,
Adrian Haret,
Jobst Heitzig,
Nicolien Janssens,
Catholijn M. Jonker,
Marijn A. Keijzer,
Axel Kistner,
Martin Lackner,
Alexandra Lieben,
Anna Mikhaylovskaya,
Pradeep K. Murukannaiah,
Carlo Proietti,
Manon Revel,
Élise Rouméas,
Ehud Shapiro,
Gogulapati Sreedurga,
Björn Swierczek,
et al. (4 additional authors not shown)
Abstract:
This white paper outlines a long-term scientific vision for the development of digital-democracy technology. We contend that if digital democracy is to meet the ambition of enabling a participatory renewal in our societies, then a comprehensive multi-methods research effort is required that could, over the years, support its development in a democratically principled, empirically and computationally informed way. The paper is co-authored by an international and interdisciplinary team of researchers and arose from the Lorentz Center Workshop on "Algorithmic Technology for Democracy" (Leiden, October 2022).
Submitted 30 January, 2024;
originally announced January 2024.
-
A Bayesian Agent-Based Framework for Argument Exchange Across Networks
Authors:
Leon Assaad,
Rafael Fuchs,
Ammar Jalalimanesh,
Kirsty Phillips,
Leon Schoeppl,
Ulrike Hahn
Abstract:
In this paper, we introduce a new framework for modelling the exchange of multiple arguments across agents in a social network. To date, most modelling work concerned with opinion dynamics, testimony, or communication across social networks has involved only the simulated exchange of a single opinion or single claim. By contrast, real-world debate involves the provision of numerous individual arguments relevant to such an opinion. This may include arguments both for and against, and arguments varying in strength. This prompts the need for appropriate aggregation rules for combining diverse evidence as well as rules for communication. Here, we draw on the Bayesian framework to create an agent-based modelling environment that allows the study of belief dynamics across complex domains characterised by Bayesian Networks. Initial case studies illustrate the scope of the framework.
Submitted 14 November, 2023;
originally announced November 2023.
-
Emotion Embeddings — Learning Stable and Homogeneous Abstractions from Heterogeneous Affective Datasets
Authors:
Sven Buechel,
Udo Hahn
Abstract:
Human emotion is expressed in many communication modalities and media formats, and so its computational study is equally diversified into natural language processing, audio signal analysis, computer vision, etc. Similarly, the large variety of representation formats used in previous research to describe emotions (polarity scales, basic emotion categories, dimensional approaches, appraisal theory, etc.) has led to an ever-proliferating diversity of datasets, predictive models, and software tools for emotion analysis. Because of these two distinct types of heterogeneity, at the expressional and representational level, there is a dire need to unify previous work on increasingly diverging data and label types. This article presents such a unifying computational model. We propose a training procedure that learns a shared latent representation for emotions, so-called emotion embeddings, independent of different natural languages, communication modalities, media or representation label formats, and even disparate model architectures. Experiments on a wide range of heterogeneous affective datasets indicate that this approach yields the desired interoperability for the sake of reusability, interpretability and flexibility, without penalizing prediction quality. Code and data are archived under https://doi.org/10.5281/zenodo.7405327.
Submitted 15 August, 2023;
originally announced August 2023.
-
Managing Expert Disagreement for the Policy Process and Beyond
Authors:
Ulrike Hahn,
Jens Koed Madsen,
Chris Reed
Abstract:
In this paper, we outline a new proposal for communicating scientific debate to policymakers and other stakeholders in circumstances where there is substantial disagreement within the scientific literature. In those circumstances, it seems important both to provide policymakers with a useful, balanced summary that is representative of opinion in the field at large, and to transparently communicate the actual evidence base. To this end, we propose the compilation of argument maps through a collective intelligence process; these maps are then given to a wide sample of the relevant research community for evaluation and summary opinion in an IGM-style poll (see igmchicago.org), which provides a representative view of opinion on the issue at stake within the wider scientific community. Policymakers then receive these two artefacts (map and poll) as their expert advice. Such a process would help overcome the resource limitations of the traditional expert advice process, while also providing greater balance by drawing on the expertise of researchers beyond the leading proponents of particular theories within a field. In addition, the actual evidence base would be transparent. In this paper, we present a pilot project stepping through the map-building component of such a policy advice scheme. We detail the process, products, and issues encountered in implementing, in OVA (the Online Visualisation of Argument tool, ova.arg-tech.org), an argument map with sample evidence from the behavioural literature on communicating probabilities, a central issue within the pandemic.
Submitted 28 December, 2022;
originally announced December 2022.
-
Can counterfactual explanations of AI systems' predictions skew lay users' causal intuitions about the world? If so, can we correct for that?
Authors:
Marko Tesic,
Ulrike Hahn
Abstract:
Counterfactual (CF) explanations have been employed as one of the modes of explainability in explainable AI, both to increase the transparency of AI systems and to provide recourse. Cognitive science and psychology, however, have pointed out that people regularly use CFs to express causal relationships. Most AI systems are only able to capture associations or correlations in data, so interpreting them as causal would not be justified. In this paper, we present two experiments (total N = 364) exploring the effects of CF explanations of AI systems' predictions on lay people's causal beliefs about the real world. In Experiment 1 we found that providing CF explanations of an AI system's predictions does indeed (unjustifiably) affect people's causal beliefs regarding factors/features the AI uses and that people are more likely to view them as causal factors in the real world. Inspired by the literature on misinformation and health warning messaging, Experiment 2 tested whether we can correct for the unjustified change in causal beliefs. We found that pointing out that AI systems capture correlations and not necessarily causal relationships can attenuate the effects of CF explanations on people's causal beliefs.
Submitted 12 December, 2022; v1 submitted 12 May, 2022;
originally announced May 2022.
-
EmoBank: Studying the Impact of Annotation Perspective and Representation Format on Dimensional Emotion Analysis
Authors:
Sven Buechel,
Udo Hahn
Abstract:
We describe EmoBank, a corpus of 10k English sentences balancing multiple genres, which we annotated with dimensional emotion metadata in the Valence-Arousal-Dominance (VAD) representation format. EmoBank excels with a bi-perspectival and bi-representational design. On the one hand, we distinguish between writer's and reader's emotions; on the other hand, a subset of the corpus complements dimensional VAD annotations with categorical ones based on Basic Emotions. We find evidence for the supremacy of the reader's perspective in terms of IAA and rating intensity, and achieve close-to-human performance when mapping between dimensional and categorical formats.
Submitted 4 May, 2022;
originally announced May 2022.
-
Semiparametric point process modeling of blinking artifacts in PALM
Authors:
Louis G. Jensen,
David J. Williamson,
Ute Hahn
Abstract:
Photoactivated localization microscopy (PALM) is a powerful imaging technique for characterization of protein organization in biological cells. Due to the stochastic blinking of fluorescent probes, and camera discretization effects, each protein gives rise to a cluster of artificial observations. These blinking artifacts are an obstacle for quantitative analysis of PALM data, and tools for their correction are in high demand. We develop the Independent Blinking Cluster point process (IBCpp) family of models, which is suited for modeling of data from single-molecule localization microscopy modalities, and we present results on the mark correlation function. We then construct the PALM-IBCpp - a semiparametric IBCpp tailored for PALM data, and we describe a procedure for estimation of parameters, which can be used without parametric assumptions on the spatial organization of proteins. Our model is validated on nuclear pore complex reference data, where the ground truth was accurately recovered, and we demonstrate how the estimated blinking parameters can be used to perform a blinking corrected test for protein clustering in a cell expressing the adaptor protein LAT. Finally, we consider simulations with varying degrees of blinking and protein clustering to shed light on the expected performance in a range of realistic settings.
Submitted 22 October, 2021; v1 submitted 28 January, 2021;
originally announced January 2021.
-
Towards Label-Agnostic Emotion Embeddings
Authors:
Sven Buechel,
Luise Modersohn,
Udo Hahn
Abstract:
Research in emotion analysis is scattered across different label formats (e.g., polarity types, basic emotion categories, and affective dimensions), linguistic levels (word vs. sentence vs. discourse), and, of course, (few well-resourced but much more under-resourced) natural languages and text genres (e.g., product reviews, tweets, news). The resulting heterogeneity makes data and software developed under these conflicting constraints hard to compare and challenging to integrate. To resolve this unsatisfactory state of affairs we here propose a training scheme that learns a shared latent representation of emotion independent from different label formats, natural languages, and even disparate model architectures. Experiments on a wide range of datasets indicate that this approach yields the desired interoperability without penalizing prediction quality. Code and data are archived under DOI 10.5281/zenodo.5466068.
Submitted 6 November, 2021; v1 submitted 30 November, 2020;
originally announced December 2020.
-
GGPONC: A Corpus of German Medical Text with Rich Metadata Based on Clinical Practice Guidelines
Authors:
Florian Borchert,
Christina Lohr,
Luise Modersohn,
Thomas Langer,
Markus Follmann,
Jan Philipp Sachs,
Udo Hahn,
Matthieu-P. Schapranow
Abstract:
The lack of publicly accessible text corpora is a major obstacle for progress in natural language processing. For medical applications, unfortunately, all language communities other than English are low-resourced. In this work, we present GGPONC (German Guideline Program in Oncology NLP Corpus), a freely distributable German language corpus based on clinical practice guidelines for oncology. This corpus is one of the largest ever built from German medical documents. Unlike clinical documents, clinical guidelines do not contain any patient-related information and can therefore be used without data protection restrictions. Moreover, GGPONC is the first corpus for the German language covering diverse conditions in a large medical subfield and provides a variety of metadata, such as literature references and evidence levels. By applying and evaluating existing medical information extraction pipelines for German text, we are able to draw comparisons for the use of medical language to other corpora, medical and non-medical ones.
Submitted 16 November, 2020; v1 submitted 13 July, 2020;
originally announced July 2020.
-
What Makes a Top-Performing Precision Medicine Search Engine? Tracing Main System Features in a Systematic Way
Authors:
Erik Faessler,
Michel Oleynik,
Udo Hahn
Abstract:
From 2017 to 2019 the Text REtrieval Conference (TREC) held a challenge task on precision medicine using documents from medical publications (PubMed) and clinical trials. Despite lots of performance measurements carried out in these evaluation campaigns, the scientific community is still pretty unsure about the impact individual system features and their weights have on the overall system performance. In order to overcome this explanatory gap, we first determined optimal feature configurations using the Sequential Model-based Algorithm Configuration (SMAC) program and applied its output to a BM25-based search engine. We then ran an ablation study to systematically assess the individual contributions of relevant system features: BM25 parameters, query type and weighting schema, query expansion, stop word filtering, and keyword boosting. For evaluation, we employed the gold standard data from the three TREC-PM installments to evaluate the effectiveness of different features using the commonly shared infNDCG metric.
Submitted 5 June, 2020; v1 submitted 4 June, 2020;
originally announced June 2020.
-
Learning and Evaluating Emotion Lexicons for 91 Languages
Authors:
Sven Buechel,
Susanna Rücker,
Udo Hahn
Abstract:
Emotion lexicons describe the affective meaning of words and thus constitute a centerpiece for advanced sentiment and emotion analysis. Yet, manually curated lexicons are only available for a handful of languages, leaving most languages of the world without such a precious resource for downstream applications. Even worse, their coverage is often limited both in terms of the lexical units they contain and the emotional variables they feature. In order to break this bottleneck, we here introduce a methodology for creating almost arbitrarily large emotion lexicons for any target language. Our approach requires nothing but a source language emotion lexicon, a bilingual word translation model, and a target language embedding model. Fulfilling these requirements for 91 languages, we are able to generate representationally rich high-coverage lexicons comprising eight emotional variables with more than 100k lexical entries each. We evaluated the automatically generated lexicons against human judgment from 26 datasets, spanning 12 typologically diverse languages, and found that our approach produces results in line with state-of-the-art monolingual approaches to lexicon creation and even surpasses human reliability for some languages and variables. Code and data are available at https://github.com/JULIELab/MEmoLon archived under DOI https://doi.org/10.5281/zenodo.3779901.
Submitted 12 May, 2020;
originally announced May 2020.
-
BARD: A structured technique for group elicitation of Bayesian networks to support analytic reasoning
Authors:
Ann E. Nicholson,
Kevin B. Korb,
Erik P. Nyberg,
Michael Wybrow,
Ingrid Zukerman,
Steven Mascaro,
Shreshth Thakur,
Abraham Oshni Alvandi,
Jeff Riley,
Ross Pearson,
Shane Morris,
Matthieu Herrmann,
A. K. M. Azad,
Fergus Bolger,
Ulrike Hahn,
David Lagnado
Abstract:
In many complex, real-world situations, problem solving and decision making require effective reasoning about causation and uncertainty. However, human reasoning in these cases is prone to confusion and error. Bayesian networks (BNs) are an artificial intelligence technology that models uncertain situations, supporting probabilistic and causal reasoning and decision making. However, to date, BN methodologies and software require significant upfront training, do not provide much guidance on the model building process, and do not support collaboratively building BNs. BARD (Bayesian ARgumentation via Delphi) is both a methodology and an expert system that utilises (1) BNs as the underlying structured representations for better argument analysis, (2) a multi-user web-based software platform and Delphi-style social processes to assist with collaboration, and (3) short, high-quality e-courses on demand, a highly structured process to guide BN construction, and a variety of helpful tools to assist in building and reasoning with BNs, including an automated explanation tool to assist effective report writing. The result is an end-to-end online platform, with associated online training, for groups without prior BN expertise to understand and analyse a problem, build a model of its underlying probabilistic causal structure, validate and reason with the causal model, and use it to produce a written analytic report. Initial experimental results demonstrate that BARD aids in problem solving, reasoning and collaboration.
Submitted 2 March, 2020;
originally announced March 2020.
-
The Temporal Dynamics of Belief-based Updating of Epistemic Trust: Light at the End of the Tunnel?
Authors:
Momme von Sydow,
Christoph Merdes,
Ulrike Hahn
Abstract:
We start with the distinction of outcome- and belief-based Bayesian models of the sequential update of agents' beliefs and subjective reliability of sources (trust). We then focus on discussing the influential Bayesian model of belief-based trust update by Eric Olsson, which models dichotomic events and explicitly represents anti-reliability. After sketching some disastrous recent results for this perhaps most promising model of belief update, we show new simulation results for the temporal dynamics of learning belief with and without trust update and with and without communication. The results seem to shed at least a somewhat more positive light on the communicating-and-trust-updating agents. This may be a light at the end of the tunnel of belief-based models of trust updating, but the interpretation of the clear findings is much less clear.
Submitted 24 December, 2019;
originally announced December 2019.
-
A Time Series Analysis of Emotional Loading in Central Bank Statements
Authors:
Sven Buechel,
Simon Junker,
Thore Schlaak,
Claus Michelsen,
Udo Hahn
Abstract:
We examine the affective content of central bank press statements using emotion analysis. Our focus is on two major international players, the European Central Bank (ECB) and the US Federal Reserve Bank (Fed), covering a time span from 1998 through 2019. We reveal characteristic patterns in the emotional dimensions of valence, arousal, and dominance and find, despite the commonly established attitude that emotional wording in central bank communication should be avoided, a correlation between the state of the economy and particularly the dominance dimension in the press releases under scrutiny and, overall, an impact of the president in office.
Submitted 26 November, 2019;
originally announced November 2019.
-
The Influence of Down-Sampling Strategies on SVD Word Embedding Stability
Authors:
Johannes Hellrich,
Bernd Kampe,
Udo Hahn
Abstract:
The stability of word embedding algorithms, i.e., the consistency of the word representations they reveal when trained repeatedly on the same data set, has recently raised concerns. We here compare word embedding algorithms on three corpora of different sizes, and evaluate both their stability and accuracy. We find strong evidence that down-sampling strategies (used as part of their training procedures) are particularly influential for the stability of SVDPPMI-type embeddings. This finding seems to explain diverging reports on their stability and leads us to a simple modification which provides superior stability as well as accuracy on par with skip-gram embeddings.
Submitted 8 April, 2019; v1 submitted 21 August, 2018;
originally announced August 2018.
-
JeSemE: A Website for Exploring Diachronic Changes in Word Meaning and Emotion
Authors:
Johannes Hellrich,
Sven Buechel,
Udo Hahn
Abstract:
We here introduce a substantially extended version of JeSemE, an interactive website for visually exploring computationally derived time-variant information on word meanings and lexical emotions assembled from five large diachronic text corpora. JeSemE is designed for scholars in the (digital) humanities as an alternative to consulting manually compiled, printed dictionaries for such information (if available at all). This tool uniquely combines state-of-the-art distributional semantics with a nuanced model of human emotions, two information streams we deem beneficial for a data-driven interpretation of texts in the humanities.
Submitted 18 December, 2020; v1 submitted 11 July, 2018;
originally announced July 2018.
-
Representation Mapping: A Novel Approach to Generate High-Quality Multi-Lingual Emotion Lexicons
Authors:
Sven Buechel,
Udo Hahn
Abstract:
In the past years, sentiment analysis has increasingly shifted attention to representational frameworks more expressive than semantic polarity (being positive, negative or neutral). However, these richer formats (like Basic Emotions or Valence-Arousal-Dominance, and variants therefrom), rooted in psychological research, tend to proliferate the number of representation schemes for emotion encoding. Thus, a large amount of representationally incompatible emotion lexicons has been developed by various research groups adopting one or the other emotion representation format. As a consequence, the reusability of these resources decreases as does the comparability of systems using them. In this paper, we propose to solve this dilemma by methods and tools which map different representation formats onto each other for the sake of mutual compatibility and interoperability of language resources. We present the first large-scale investigation of such representation mappings for four typologically diverse languages and find evidence that our approach produces (near-)gold quality emotion lexicons, even in cross-lingual settings. Finally, we use our models to create new lexicons for eight typologically diverse languages.
Submitted 2 July, 2018;
originally announced July 2018.
-
Emotion Representation Mapping for Automatic Lexicon Construction (Mostly) Performs on Human Level
Authors:
Sven Buechel,
Udo Hahn
Abstract:
Emotion Representation Mapping (ERM) has the goal to convert existing emotion ratings from one representation format into another one, e.g., mapping Valence-Arousal-Dominance annotations for words or sentences into Ekman's Basic Emotions and vice versa. ERM can thus not only be considered as an alternative to Word Emotion Induction (WEI) techniques for automatic emotion lexicon construction but may also help mitigate problems that come from the proliferation of emotion representation formats in recent years. We propose a new neural network approach to ERM that outperforms the previous state of the art. Equally important, we present a refined evaluation methodology and gather strong evidence that our model yields results which are (almost) as reliable as human annotations, even in cross-lingual settings. Based on these results we generate new emotion ratings for 13 typologically diverse languages and claim that they have near-gold quality, at least.
Submitted 22 June, 2018;
originally announced June 2018.
-
Modeling Word Emotion in Historical Language: Quantity Beats Supposed Stability in Seed Word Selection
Authors:
Johannes Hellrich,
Sven Buechel,
Udo Hahn
Abstract:
To understand historical texts, we must be aware that language -- including the emotional connotation attached to words -- changes over time. In this paper, we aim at estimating the emotion which is associated with a given word in former language stages of English and German. Emotion is represented following the popular Valence-Arousal-Dominance (VAD) annotation scheme. While being more expressive than polarity alone, existing word emotion induction methods are typically not suited for addressing it. To overcome this limitation, we present adaptations of two popular algorithms to VAD. To measure their effectiveness in diachronic settings, we present the first gold standard for historical word emotions, which was created by scholars with proficiency in the respective language stages and covers both English and German. In contrast to claims in previous work, our findings indicate that hand-selecting small sets of seed words with supposedly stable emotional meaning is actually harmful rather than helpful.
Submitted 8 April, 2019; v1 submitted 21 June, 2018;
originally announced June 2018.
-
A one-way ANOVA test for functional data with graphical interpretation
Authors:
Tomas Mrkvicka,
Mari Myllymaki,
Milan Jilek,
Ute Hahn
Abstract:
A new functional ANOVA test, with a graphical interpretation of the result, is presented. The test is an extension of the global envelope test introduced by Myllymaki et al. (2017, Global envelope tests for spatial processes, J. R. Statist. Soc. B 79, 381--404, doi: 10.1111/rssb.12172). The graphical interpretation is realized by a global envelope which is drawn jointly for all samples of functions. If a mean function computed from the empirical data is out of the given envelope, the null hypothesis is rejected with the predetermined significance level $α$. The advantage of the proposed one-way functional ANOVA is that it identifies the domains of the functions which are responsible for the potential rejection. We introduce two versions of this test: the first gives a graphical interpretation of the test results in the original space of the functions and the second immediately offers a post-hoc test by identifying the significant pair-wise differences between groups. The proposed tests rely on discretization of the functions, therefore the tests are also applicable in the multidimensional ANOVA problem. In the empirical part of the article, we demonstrate the use of the method by analyzing fiscal decentralization in European countries. The aim of the empirical analysis is to capture differences between the levels of government expenditure decentralization ratio among different groups of European countries. The idea behind it, based on the existing literature, is straightforward: countries with a longer European integration history are supposed to decentralize more of their government expenditure. We use the government expenditure centralization ratios of 29 European Union and EFTA countries in the period from 1995 to 2016, sorted into three groups according to the presumed level of European economic and political integration.
Submitted 22 May, 2020; v1 submitted 12 December, 2016;
originally announced December 2016.
-
Multiple Monte Carlo Testing with Applications in Spatial Point Processes
Authors:
Tomáš Mrkvička,
Mari Myllymäki,
Ute Hahn
Abstract:
The rank envelope test (Myllymäki et al., Global envelope tests for spatial processes, arXiv:1307.0239 [stat.ME]) is proposed as a solution to the multiple testing problem for Monte Carlo tests. Three different situations are recognized: 1) a few univariate Monte Carlo tests, 2) a Monte Carlo test with a function as the test statistic, 3) several Monte Carlo tests with functions as test statistics. The rank test has correct (global) type I error in each case and it is accompanied by a $p$-value and by a graphical interpretation which shows which subtest or which distances of the used test function(s) lead to the rejection at the prescribed significance level of the test. Examples of null hypotheses from point process and random set statistics are used to demonstrate the strength of the rank envelope test. The examples include goodness-of-fit test with several test functions, goodness-of-fit test for one group of point patterns, comparison of several groups of point patterns, test of dependence of components in a multi-type point pattern, and test of Boolean assumption for random closed sets.
Submitted 4 June, 2015;
originally announced June 2015.
-
Global envelope tests for spatial processes
Authors:
Mari Myllymäki,
Tomás Mrkvicka,
Pavel Grabarnik,
Henri Seijo,
Ute Hahn
Abstract:
Envelope tests are a popular tool in spatial statistics, where they are used in goodness-of-fit testing. These tests graphically compare an empirical function $T(r)$ with its simulated counterparts from the null model. However, the type I error probability $α$ is conventionally controlled for a fixed distance $r$ only, whereas the functions are inspected on an interval of distances $I$. In this study, we propose two approaches related to Barnard's Monte Carlo test for building global envelope tests on $I$: (1) ordering the empirical and simulated functions based on their $r$-wise ranks among each other, and (2) the construction of envelopes for a deviation test. These new tests allow the a priori selection of the global $α$ and they yield $p$-values. We illustrate these tests using simulated and real point pattern data.
Submitted 3 November, 2015; v1 submitted 30 June, 2013;
originally announced July 2013.
-
Efficient calculation of local dose distribution for response modelling in proton and ion beams
Authors:
S. Greilich,
U. Hahn,
M. Kiderlen,
C. E. Andersen,
N. Bassler
Abstract:
We present an algorithm for fast and accurate computation of the local dose distribution in MeV beams of protons, carbon ions or other heavy-charged particles. It uses compound Poisson-process modelling of track interaction and successive convolutions for fast computation. It can handle mixed particle fields over a wide range of fluences. Since the local dose distribution is the essential part of several approaches to model detector efficiency or cellular response, it has potential use in ion-beam dosimetry and radiotherapy.
Submitted 2 June, 2013;
originally announced June 2013.
-
The Interaction between PEDOT/PSS Gate and sub-$μ$ low-k Insulator for All Printed OFETs
Authors:
Z. Shalabutov,
M. Friedrich,
I. Thurzo,
A. Hübler,
D. R. T. Zahn,
U. Hahn
Abstract:
In all printed OFETs with a poly(3,4-ethylenedioxythiophene)/poly(styrenesulfonate) (PEDOT/PSS) gate, offset printing and gravure of electrically dense sub-$μ$ insulators from polyvinylphenol (PVPh), polyvinyl alcohol (PVOH) and poly(methyl methacrylate) (PMMA) as well as other organic and inorganic materials turned out to be problematic due to the most reactive part of the gate material: the negative sulfonate ions from PSS. The present paper investigates the nature of the interaction between PSS and the PVOH sub-$μ$ insulator, explored by infrared spectroscopy and electrical methods. Some evidence is obtained that most probably OH- and not sulfonate ions are responsible for creating channels, penetrated by PEDOT/PSS nano dispersion applied as gate.
Submitted 15 November, 2004;
originally announced November 2004.
-
Message-Passing Protocols for Real-World Parsing -- An Object-Oriented Model and its Preliminary Evaluation
Authors:
Udo Hahn,
Peter Neuhaus,
Norbert Broeker
Abstract:
We argue for a performance-based design of natural language grammars and their associated parsers in order to meet the constraints imposed by real-world NLP. Our approach incorporates declarative and procedural knowledge about language and language use within an object-oriented specification framework. We discuss several message-passing protocols for parsing and provide reasons for sacrificing completeness of the parse in favor of efficiency based on a preliminary empirical evaluation.
Submitted 23 September, 1997;
originally announced September 1997.
-
Centering in-the-large: Computing referential discourse segments
Authors:
Udo Hahn,
Michael Strube
Abstract:
We specify an algorithm that builds up a hierarchy of referential discourse segments from local centering data. The spatial extension and nesting of these discourse segments constrain the reachability of potential antecedents of an anaphoric expression beyond the local level of adjacent center pairs. Thus, the centering model is scaled up to the level of the global referential structure of discourse. An empirical evaluation of the algorithm is supplied.
Submitted 30 April, 1997;
originally announced April 1997.
-
Incremental Centering and Center Ambiguity
Authors:
Udo Hahn,
Michael Strube
Abstract:
In this paper, we present a model of anaphor resolution within the framework of the centering model. The consideration of an incremental processing mode introduces the need to manage structural ambiguity at the center level. Hence, the centering framework is further refined to account for local and global parsing ambiguities which propagate up to the level of center representations, yielding moderately adapted data structures for the centering algorithm.
Submitted 16 May, 1996;
originally announced May 1996.
-
Restricted Parallelism in Object-Oriented Lexical Parsing
Authors:
Peter Neuhaus,
Udo Hahn
Abstract:
We present an approach to parallel natural language parsing which is based on a concurrent, object-oriented model of computation. A depth-first, yet incomplete parsing algorithm for a dependency grammar is specified and several restrictions on the degree of its parallelization are discussed.
Submitted 15 May, 1996;
originally announced May 1996.
-
Trading off Completeness for Efficiency: The ParseTalk Performance Grammar Approach to Real-World Text Parsing
Authors:
Peter Neuhaus,
Udo Hahn
Abstract:
We argue for a performance-based design of natural language grammars and their associated parsers in order to meet the constraints posed by real-world natural language understanding. This approach incorporates declarative and procedural knowledge about language and language use within an object-oriented specification framework. We discuss several message passing protocols for real-world text parsing and provide reasons for sacrificing completeness of the parse in favor of efficiency.
Submitted 15 May, 1996;
originally announced May 1996.
-
A Conceptual Reasoning Approach to Textual Ellipsis
Authors:
Udo Hahn,
Katja Markert,
Michael Strube
Abstract:
We present a hybrid text understanding methodology for the resolution of textual ellipsis. It integrates conceptual criteria (based on the well-formedness and conceptual strength of role chains in a terminological knowledge base) and functional constraints reflecting the utterances' information structure (based on the distinction between context-bound and unbound discourse elements). The methodological framework for text ellipsis resolution is the centering model that has been adapted to these constraints.
Submitted 15 May, 1996;
originally announced May 1996.
-
Functional Centering
Authors:
Michael Strube,
Udo Hahn
Abstract:
Based on empirical evidence from a free word order language (German) we propose a fundamental revision of the principles guiding the ordering of discourse entities in the forward-looking centers within the centering model. We claim that grammatical role criteria should be replaced by indicators of the functional information structure of the utterances, i.e., the distinction between context-bound and unbound discourse elements. This claim is backed up by an empirical evaluation of functional centering.
Submitted 14 May, 1996;
originally announced May 1996.
-
Where Defaults Don't Help: the Case of the German Plural System
Authors:
Ramin Charles Nakisa,
Ulrike Hahn
Abstract:
The German plural system has become a focal point for conflicting theories of language, both linguistic and cognitive. We present simulation results with three simple classifiers - an ordinary nearest neighbour algorithm, Nosofsky's `Generalized Context Model' (GCM) and a standard, three-layer backprop network - predicting the plural class from a phonological representation of the singular in German. Though these are absolutely `minimal' models, in terms of architecture and input information, they nevertheless do remarkably well. The nearest neighbour predicts the correct plural class with an accuracy of 72% for a set of 24,640 nouns from the CELEX database. With a subset of 8,598 (non-compound) nouns, the nearest neighbour, the GCM and the network score 71.0%, 75.0% and 83.5%, respectively, on novel items. Furthermore, they outperform a hybrid, `pattern-associator + default rule', model, as proposed by Marcus et al. (1995), on this data set.
Submitted 13 May, 1996;
originally announced May 1996.
-
ParseTalk about Textual Ellipsis
Authors:
Michael Strube,
Udo Hahn
Abstract:
A hybrid methodology for the resolution of text-level ellipsis is presented in this paper. It incorporates conceptual proximity criteria applied to ontologically well-engineered domain knowledge bases and an approach to centering based on functional topic/comment patterns. We state text grammatical predicates for ellipsis and then turn to the procedural aspects of their evaluation within the framework of an actor-based implementation of a lexically distributed parser.
Submitted 28 September, 1995;
originally announced September 1995.
-
ParseTalk about Sentence- and Text-Level Anaphora
Authors:
Michael Strube,
Udo Hahn
Abstract:
We provide a unified account of sentence-level and text-level anaphora within the framework of a dependency-based grammar model. Criteria for anaphora resolution within sentence boundaries rephrase major concepts from GB's binding theory, while those for text-level anaphora incorporate an adapted version of a Grosz-Sidner-style focus model.
Submitted 3 March, 1995;
originally announced March 1995.
-
Concurrent Lexicalized Dependency Parsing: A Behavioral View on ParseTalk Events
Authors:
Susanne Schacht,
Udo Hahn,
Norbert Broeker
Abstract:
The behavioral specification of an object-oriented grammar model is considered. The model is based on full lexicalization, head-orientation via valency constraints and dependency relations, inheritance as a means for non-redundant lexicon specification, and concurrency of computation. The computation model relies upon the actor paradigm, with concurrency entering through asynchronous message passing between actors. In particular, we here elaborate on principles of how the global behavior of a lexically distributed grammar and its corresponding parser can be specified in terms of event type networks and event networks, resp.
Submitted 24 October, 1994;
originally announced October 1994.
-
Concurrent Lexicalized Dependency Parsing: The ParseTalk Model
Authors:
Norbert Broeker,
Udo Hahn,
Susanne Schacht
Abstract:
A grammar model for concurrent, object-oriented natural language parsing is introduced. Complete lexical distribution of grammatical knowledge is achieved building upon the head-oriented notions of valency and dependency, while inheritance mechanisms are used to capture lexical generalizations. The underlying concurrent computation model relies upon the actor paradigm. We consider message passing protocols for establishing dependency relations and ambiguity handling.
Submitted 24 October, 1994;
originally announced October 1994.