Search | arXiv e-print repository

Enabling the Digital Democratic Revival: A Research Program for Digital Democracy

Authors: Davide Grossi, Ulrike Hahn, Michael Mäs, Andreas Nitsche, Jan Behrens, Niclas Boehmer, Markus Brill, Ulle Endriss, Umberto Grandi, Adrian Haret, Jobst Heitzig, Nicolien Janssens, Catholijn M. Jonker, Marijn A. Keijzer, Axel Kistner, Martin Lackner, Alexandra Lieben, Anna Mikhaylovskaya, Pradeep K. Murukannaiah, Carlo Proietti, Manon Revel, Élise Rouméas, Ehud Shapiro, Gogulapati Sreedurga, Björn Swierczek, et al. (4 additional authors not shown)

Abstract: This white paper outlines a long-term scientific vision for the development of digital-democracy technology. We contend that if digital democracy is to meet the ambition of enabling a participatory renewal in our societies, then a comprehensive multi-methods research effort is required that could, over the years, support its development in a democratically principled, empirically and computationally informed way. The paper is co-authored by an international and interdisciplinary team of researchers and arose from the Lorentz Center Workshop on "Algorithmic Technology for Democracy" (Leiden, October 2022).

Submitted 30 January, 2024; originally announced January 2024.

arXiv:2311.09254 [pdf, other]

A Bayesian Agent-Based Framework for Argument Exchange Across Networks

Authors: Leon Assaad, Rafael Fuchs, Ammar Jalalimanesh, Kirsty Phillips, Leon Schoeppl, Ulrike Hahn

Abstract: In this paper, we introduce a new framework for modelling the exchange of multiple arguments across agents in a social network. To date, most modelling work concerned with opinion dynamics, testimony, or communication across social networks has involved only the simulated exchange of a single opinion or single claim. By contrast, real-world debate involves the provision of numerous individual arguments relevant to such an opinion. This may include arguments both for and against, and arguments varying in strength. This prompts the need for appropriate aggregation rules for combining diverse evidence as well as rules for communication. Here, we draw on the Bayesian framework to create an agent-based modelling environment that allows the study of belief dynamics across complex domains characterised by Bayesian Networks. Initial case studies illustrate the scope of the framework.

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2308.07871 [pdf, other]

Emotion Embeddings - Learning Stable and Homogeneous Abstractions from Heterogeneous Affective Datasets

Authors: Sven Buechel, Udo Hahn

Abstract: Human emotion is expressed in many communication modalities and media formats and so its computational study is equally diversified into natural language processing, audio signal analysis, computer vision, etc. Similarly, the large variety of representation formats used in previous research to describe emotions (polarity scales, basic emotion categories, dimensional approaches, appraisal theory, etc.) has led to an ever proliferating diversity of datasets, predictive models, and software tools for emotion analysis. Because of these two distinct types of heterogeneity, at the expressional and representational level, there is a dire need to unify previous work on increasingly diverging data and label types. This article presents such a unifying computational model. We propose a training procedure that learns a shared latent representation for emotions, so-called emotion embeddings, independent of different natural languages, communication modalities, media or representation label formats, and even disparate model architectures. Experiments on a wide range of heterogeneous affective datasets indicate that this approach yields the desired interoperability for the sake of reusability, interpretability and flexibility, without penalizing prediction quality. Code and data are archived under https://doi.org/10.5281/zenodo.7405327.

Submitted 15 August, 2023; originally announced August 2023.

Comments: 18 pages, 6 figures

arXiv:2212.14714 [pdf]

Managing Expert Disagreement for the Policy Process and Beyond

Authors: Ulrike Hahn, Jens Koed Madsen, Chris Reed

Abstract: In this paper, we outline a new proposal for communicating scientific debate to policymakers and other stakeholders in circumstances where there is substantial disagreement within the scientific literature. In those circumstances, it seems important to provide policymakers both with a useful, balanced summary that is representative of opinion in the field at large, and to transparently communicate the actual evidence base. To this end, we propose the compilation of argument maps through a collective intelligence process; these maps are then given to a wide sample of the relevant research community for evaluation and summary opinion in an IGM-style poll (see igmchicago.org), which provides a representative view of opinion on the issue at stake within the wider scientific community. Policymakers then receive these two artefacts (map and poll) as their expert advice. Such a process would help overcome the resource limitations of the traditional expert advice process, while also providing greater balance by drawing on the expertise of researchers beyond the leading proponents of particular theories within a field. And, the actual evidence base would be transparent. In this paper, we present a pilot project stepping through the map building component of such a policy advice scheme. We detail process, products, and issues encountered by implementing in the OVA (Online Visualisation of Argument tool, ova.arg-tech.org) an argument map with sample evidence from the behavioural literature on communicating probabilities, as a central issue within the pandemic.

Submitted 28 December, 2022; originally announced December 2022.

Comments: 12 pages, 1 Figure

arXiv:2205.06241 [pdf, other]

doi 10.1016/j.patter.2022.100635

Can counterfactual explanations of AI systems' predictions skew lay users' causal intuitions about the world? If so, can we correct for that?

Authors: Marko Tesic, Ulrike Hahn

Abstract: Counterfactual (CF) explanations have been employed as one of the modes of explainability in explainable AI, both to increase the transparency of AI systems and to provide recourse. Cognitive science and psychology, however, have pointed out that people regularly use CFs to express causal relationships. Most AI systems are only able to capture associations or correlations in data, so interpreting them as causal would not be justified. In this paper, we present two experiments (total N = 364) exploring the effects of CF explanations of AI systems' predictions on lay people's causal beliefs about the real world. In Experiment 1 we found that providing CF explanations of an AI system's predictions does indeed (unjustifiably) affect people's causal beliefs regarding factors/features the AI uses and that people are more likely to view them as causal factors in the real world. Inspired by the literature on misinformation and health warning messaging, Experiment 2 tested whether we can correct for the unjustified change in causal beliefs. We found that pointing out that AI systems capture correlations and not necessarily causal relationships can attenuate the effects of CF explanations on people's causal beliefs.

Submitted 12 December, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

Journal ref: Patterns, 3(12), 2022

arXiv:2205.01996 [pdf, other]

EmoBank: Studying the Impact of Annotation Perspective and Representation Format on Dimensional Emotion Analysis

Authors: Sven Buechel, Udo Hahn

Abstract: We describe EmoBank, a corpus of 10k English sentences balancing multiple genres, which we annotated with dimensional emotion metadata in the Valence-Arousal-Dominance (VAD) representation format. EmoBank excels with a bi-perspectival and bi-representational design. On the one hand, we distinguish between writer's and reader's emotions; on the other hand, a subset of the corpus complements dimensional VAD annotations with categorical ones based on Basic Emotions. We find evidence for the supremacy of the reader's perspective in terms of IAA and rating intensity, and achieve close-to-human performance when mapping between dimensional and categorical formats.

Submitted 4 May, 2022; originally announced May 2022.

Comments: Originally published at EACL 2017. Revised version with fixed typos and an additional appendix featuring the original instructions of the dataset

arXiv:2101.12285 [pdf, other]

Semiparametric point process modeling of blinking artifacts in PALM

Authors: Louis G. Jensen, David J. Williamson, Ute Hahn

Abstract: Photoactivated localization microscopy (PALM) is a powerful imaging technique for characterization of protein organization in biological cells. Due to the stochastic blinking of fluorescent probes, and camera discretization effects, each protein gives rise to a cluster of artificial observations. These blinking artifacts are an obstacle for quantitative analysis of PALM data, and tools for their correction are in high demand. We develop the Independent Blinking Cluster point process (IBCpp) family of models, which is suited for modeling of data from single-molecule localization microscopy modalities, and we present results on the mark correlation function. We then construct the PALM-IBCpp, a semiparametric IBCpp tailored for PALM data, and we describe a procedure for estimation of parameters, which can be used without parametric assumptions on the spatial organization of proteins. Our model is validated on nuclear pore complex reference data, where the ground truth was accurately recovered, and we demonstrate how the estimated blinking parameters can be used to perform a blinking corrected test for protein clustering in a cell expressing the adaptor protein LAT. Finally, we consider simulations with varying degrees of blinking and protein clustering to shed light on the expected performance in a range of realistic settings.

Submitted 22 October, 2021; v1 submitted 28 January, 2021; originally announced January 2021.

Comments: 36 pages, 8 figures

arXiv:2012.00190 [pdf, other]

Towards Label-Agnostic Emotion Embeddings

Authors: Sven Buechel, Luise Modersohn, Udo Hahn

Abstract: Research in emotion analysis is scattered across different label formats (e.g., polarity types, basic emotion categories, and affective dimensions), linguistic levels (word vs. sentence vs. discourse), and, of course, (few well-resourced but much more under-resourced) natural languages and text genres (e.g., product reviews, tweets, news). The resulting heterogeneity makes data and software developed under these conflicting constraints hard to compare and challenging to integrate. To resolve this unsatisfactory state of affairs we here propose a training scheme that learns a shared latent representation of emotion independent from different label formats, natural languages, and even disparate model architectures. Experiments on a wide range of datasets indicate that this approach yields the desired interoperability without penalizing prediction quality. Code and data are archived under DOI 10.5281/zenodo.5466068.

Submitted 6 November, 2021; v1 submitted 30 November, 2020; originally announced December 2020.

Comments: EMNLP 2021 camera-ready version

arXiv:2007.06400 [pdf, other]

GGPONC: A Corpus of German Medical Text with Rich Metadata Based on Clinical Practice Guidelines

Authors: Florian Borchert, Christina Lohr, Luise Modersohn, Thomas Langer, Markus Follmann, Jan Philipp Sachs, Udo Hahn, Matthieu-P. Schapranow

Abstract: The lack of publicly accessible text corpora is a major obstacle for progress in natural language processing. For medical applications, unfortunately, all language communities other than English are low-resourced. In this work, we present GGPONC (German Guideline Program in Oncology NLP Corpus), a freely distributable German language corpus based on clinical practice guidelines for oncology. This corpus is one of the largest ever built from German medical documents. Unlike clinical documents, clinical guidelines do not contain any patient-related information and can therefore be used without data protection restrictions. Moreover, GGPONC is the first corpus for the German language covering diverse conditions in a large medical subfield and provides a variety of metadata, such as literature references and evidence levels. By applying and evaluating existing medical information extraction pipelines for German text, we are able to draw comparisons for the use of medical language to other corpora, medical and non-medical ones.

Submitted 16 November, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

Comments: LOUHI Workshop @ EMNLP '20. Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis. 2020

arXiv:2006.02785 [pdf, other]

doi 10.1145/3397271.3401048

What Makes a Top-Performing Precision Medicine Search Engine? Tracing Main System Features in a Systematic Way

Authors: Erik Faessler, Michel Oleynik, Udo Hahn

Abstract: From 2017 to 2019 the Text REtrieval Conference (TREC) held a challenge task on precision medicine using documents from medical publications (PubMed) and clinical trials. Despite lots of performance measurements carried out in these evaluation campaigns, the scientific community is still pretty unsure about the impact individual system features and their weights have on the overall system performance. In order to overcome this explanatory gap, we first determined optimal feature configurations using the Sequential Model-based Algorithm Configuration (SMAC) program and applied its output to a BM25-based search engine. We then ran an ablation study to systematically assess the individual contributions of relevant system features: BM25 parameters, query type and weighting schema, query expansion, stop word filtering, and keyword boosting. For evaluation, we employed the gold standard data from the three TREC-PM installments to evaluate the effectiveness of different features using the commonly shared infNDCG metric.

Submitted 5 June, 2020; v1 submitted 4 June, 2020; originally announced June 2020.

Comments: Accepted for SIGIR2020, 10 pages

MSC Class: 68P20 (Primary); 92C50; 92Dxx (Secondary)

ACM Class: H.3.3; H.3.1; J.3

arXiv:2005.05672 [pdf, other]

Learning and Evaluating Emotion Lexicons for 91 Languages

Authors: Sven Buechel, Susanna Rücker, Udo Hahn

Abstract: Emotion lexicons describe the affective meaning of words and thus constitute a centerpiece for advanced sentiment and emotion analysis. Yet, manually curated lexicons are only available for a handful of languages, leaving most languages of the world without such a precious resource for downstream applications. Even worse, their coverage is often limited both in terms of the lexical units they contain and the emotional variables they feature. In order to break this bottleneck, we here introduce a methodology for creating almost arbitrarily large emotion lexicons for any target language. Our approach requires nothing but a source language emotion lexicon, a bilingual word translation model, and a target language embedding model. Fulfilling these requirements for 91 languages, we are able to generate representationally rich high-coverage lexicons comprising eight emotional variables with more than 100k lexical entries each. We evaluated the automatically generated lexicons against human judgment from 26 datasets, spanning 12 typologically diverse languages, and found that our approach produces results in line with state-of-the-art monolingual approaches to lexicon creation and even surpasses human reliability for some languages and variables. Code and data are available at https://github.com/JULIELab/MEmoLon archived under DOI https://doi.org/10.5281/zenodo.3779901.

Submitted 12 May, 2020; originally announced May 2020.

Comments: ACL 2020 Camera-Ready

arXiv:2003.01207 [pdf, other]

doi 10.1111/risa.13759

BARD: A structured technique for group elicitation of Bayesian networks to support analytic reasoning

Authors: Ann E. Nicholson, Kevin B. Korb, Erik P. Nyberg, Michael Wybrow, Ingrid Zukerman, Steven Mascaro, Shreshth Thakur, Abraham Oshni Alvandi, Jeff Riley, Ross Pearson, Shane Morris, Matthieu Herrmann, A. K. M. Azad, Fergus Bolger, Ulrike Hahn, David Lagnado

Abstract: In many complex, real-world situations, problem solving and decision making require effective reasoning about causation and uncertainty. However, human reasoning in these cases is prone to confusion and error. Bayesian networks (BNs) are an artificial intelligence technology that models uncertain situations, supporting probabilistic and causal reasoning and decision making. However, to date, BN methodologies and software require significant upfront training, do not provide much guidance on the model building process, and do not support collaboratively building BNs. BARD (Bayesian ARgumentation via Delphi) is both a methodology and an expert system that utilises (1) BNs as the underlying structured representations for better argument analysis, (2) a multi-user web-based software platform and Delphi-style social processes to assist with collaboration, and (3) short, high-quality e-courses on demand, a highly structured process to guide BN construction, and a variety of helpful tools to assist in building and reasoning with BNs, including an automated explanation tool to assist effective report writing. The result is an end-to-end online platform, with associated online training, for groups without prior BN expertise to understand and analyse a problem, build a model of its underlying probabilistic causal structure, validate and reason with the causal model, and use it to produce a written analytic report. Initial experimental results demonstrate that BARD aids in problem solving, reasoning and collaboration.

Submitted 2 March, 2020; originally announced March 2020.

arXiv:1912.13380 [pdf]

The Temporal Dynamics of Belief-based Updating of Epistemic Trust: Light at the End of the Tunnel?

Authors: Momme von Sydow, Christoph Merdes, Ulrike Hahn

Abstract: We start with the distinction of outcome- and belief-based Bayesian models of the sequential update of agents' beliefs and subjective reliability of sources (trust). We then focus on discussing the influential Bayesian model of belief-based trust update by Eric Olsson, which models dichotomic events and explicitly represents anti-reliability. After sketching some disastrous recent results for this perhaps most promising model of belief update, we show new simulation results for the temporal dynamics of learning belief with and without trust update and with and without communication. The results seem to shed at least a somewhat more positive light on the communicating-and-trust-updating agents. This may be a light at the end of the tunnel of belief-based models of trust updating, but the interpretation of the clear findings is much less clear.

Submitted 24 December, 2019; originally announced December 2019.

Comments: 7 pages, 6 figures. The paper was presented 2019 at the TeaP in London and at the 41st Annual Meeting of the Cognitive Science Society in Montreal (Canada). We intend to submit an extended and improved version for publication in a journal

MSC Class: G.3; I.6.3; I.6.4; J.4; K.4.3

ACM Class: G.3; I.6.3; I.6.4; J.4; K.4.3

arXiv:1911.11522 [pdf, other]

doi 10.18653/v1/D19-5103

A Time Series Analysis of Emotional Loading in Central Bank Statements

Authors: Sven Buechel, Simon Junker, Thore Schlaak, Claus Michelsen, Udo Hahn

Abstract: We examine the affective content of central bank press statements using emotion analysis. Our focus is on two major international players, the European Central Bank (ECB) and the US Federal Reserve Bank (Fed), covering a time span from 1998 through 2019. We reveal characteristic patterns in the emotional dimensions of valence, arousal, and dominance and find, despite the commonly established attitude that emotional wording in central bank communication should be avoided, a correlation between the state of the economy and particularly the dominance dimension in the press releases under scrutiny and, overall, an impact of the president in office.

Submitted 26 November, 2019; originally announced November 2019.

Comments: Published at ECONLP 2019

Journal ref: Proceedings of the Second Workshop on Economics and Natural Language Processing @ EMNLP 2019, Hong Kong, November 4, 2019, pages 16-21

arXiv:1808.06810 [pdf, other]

The Influence of Down-Sampling Strategies on SVD Word Embedding Stability

Authors: Johannes Hellrich, Bernd Kampe, Udo Hahn

Abstract: The stability of word embedding algorithms, i.e., the consistency of the word representations they reveal when trained repeatedly on the same data set, has recently raised concerns. We here compare word embedding algorithms on three corpora of different sizes, and evaluate both their stability and accuracy. We find strong evidence that down-sampling strategies (used as part of their training procedures) are particularly influential for the stability of SVDPPMI-type embeddings. This finding seems to explain diverging reports on their stability and leads us to a simple modification which provides superior stability as well as accuracy on par with skip-gram embeddings.

Submitted 8 April, 2019; v1 submitted 21 August, 2018; originally announced August 2018.

Comments: Updated for RepEval 2019

arXiv:1807.04148 [pdf, other]

JeSemE: A Website for Exploring Diachronic Changes in Word Meaning and Emotion

Authors: Johannes Hellrich, Sven Buechel, Udo Hahn

Abstract: We here introduce a substantially extended version of JeSemE, an interactive website for visually exploring computationally derived time-variant information on word meanings and lexical emotions assembled from five large diachronic text corpora. JeSemE is designed for scholars in the (digital) humanities as an alternative to consulting manually compiled, printed dictionaries for such information (if available at all). This tool uniquely combines state-of-the-art distributional semantics with a nuanced model of human emotions, two information streams we deem beneficial for a data-driven interpretation of texts in the humanities.

Submitted 18 December, 2020; v1 submitted 11 July, 2018; originally announced July 2018.

Comments: COLING 2018 System Demonstrations (camera-ready version)

arXiv:1807.00775 [pdf, other]

Representation Mapping: A Novel Approach to Generate High-Quality Multi-Lingual Emotion Lexicons

Authors: Sven Buechel, Udo Hahn

Abstract: In the past years, sentiment analysis has increasingly shifted attention to representational frameworks more expressive than semantic polarity (being positive, negative or neutral). However, these richer formats (like Basic Emotions or Valence-Arousal-Dominance, and variants therefrom), rooted in psychological research, tend to proliferate the number of representation schemes for emotion encoding. Thus, a large amount of representationally incompatible emotion lexicons has been developed by various research groups adopting one or the other emotion representation format. As a consequence, the reusability of these resources decreases as does the comparability of systems using them. In this paper, we propose to solve this dilemma by methods and tools which map different representation formats onto each other for the sake of mutual compatibility and interoperability of language resources. We present the first large-scale investigation of such representation mappings for four typologically diverse languages and find evidence that our approach produces (near-)gold quality emotion lexicons, even in cross-lingual settings. Finally, we use our models to create new lexicons for eight typologically diverse languages.

Submitted 2 July, 2018; originally announced July 2018.

Comments: Published in LREC 2018

arXiv:1806.08890 [pdf, other]

Emotion Representation Mapping for Automatic Lexicon Construction (Mostly) Performs on Human Level

Authors: Sven Buechel, Udo Hahn

Abstract: Emotion Representation Mapping (ERM) has the goal to convert existing emotion ratings from one representation format into another one, e.g., mapping Valence-Arousal-Dominance annotations for words or sentences into Ekman's Basic Emotions and vice versa. ERM can thus not only be considered as an alternative to Word Emotion Induction (WEI) techniques for automatic emotion lexicon construction but may also help mitigate problems that come from the proliferation of emotion representation formats in recent years. We propose a new neural network approach to ERM that outperforms the previous state of the art. Equally important, we present a refined evaluation methodology and gather strong evidence that our model yields results which are (almost) as reliable as human annotations, even in cross-lingual settings. Based on these results we generate new emotion ratings for 13 typologically diverse languages and claim that they have near-gold quality, at least.

Submitted 22 June, 2018; originally announced June 2018.

arXiv:1806.08115 [pdf, other]

Modeling Word Emotion in Historical Language: Quantity Beats Supposed Stability in Seed Word Selection

Authors: Johannes Hellrich, Sven Buechel, Udo Hahn

Abstract: To understand historical texts, we must be aware that language -- including the emotional connotation attached to words -- changes over time. In this paper, we aim at estimating the emotion which is associated with a given word in former language stages of English and German. Emotion is represented following the popular Valence-Arousal-Dominance (VAD) annotation scheme. While being more expressive than polarity alone, existing word emotion induction methods are typically not suited for addressing it. To overcome this limitation, we present adaptations of two popular algorithms to VAD. To measure their effectiveness in diachronic settings, we present the first gold standard for historical word emotions, which was created by scholars with proficiency in the respective language stages and covers both English and German. In contrast to claims in previous work, our findings indicate that hand-selecting small sets of seed words with supposedly stable emotional meaning is actually harmful rather than helpful.

Submitted 8 April, 2019; v1 submitted 21 June, 2018; originally announced June 2018.

Comments: Updated for LaTeCH-CLfL 2019

arXiv:1612.03608 [pdf, other]

doi 10.14736/kyb-2020-3-0432

A one-way ANOVA test for functional data with graphical interpretation

Authors: Tomas Mrkvicka, Mari Myllymaki, Milan Jilek, Ute Hahn

Abstract: A new functional ANOVA test, with a graphical interpretation of the result, is presented. The test is an extension of the global envelope test introduced by Myllymaki et al. (2017, Global envelope tests for spatial processes, J. R. Statist. Soc. B 79, 381--404, doi: 10.1111/rssb.12172). The graphical interpretation is realized by a global envelope which is drawn jointly for all samples of functions. If a mean function computed from the empirical data is out of the given envelope, the null hypothesis is rejected with the predetermined significance level $α$. The advantages of the proposed one-way functional ANOVA are that it identifies the domains of the functions which are responsible for the potential rejection. We introduce two versions of this test: the first gives a graphical interpretation of the test results in the original space of the functions and the second immediately offers a post-hoc test by identifying the significant pair-wise differences between groups. The proposed tests rely on discretization of the functions, therefore the tests are also applicable in the multidimensional ANOVA problem. In the empirical part of the article, we demonstrate the use of the method by analyzing fiscal decentralization in European countries. The aim of the empirical analysis is to capture differences between the levels of government expenditure decentralization ratio among different groups of European countries.
The idea behind, based on the existing literature, is straightforward: countries with a longer European integration history are supposed to decentralize more of their government expenditure. We use the government expenditure centralization ratios of 29 European Union and EFTA countries in period from 1995 to 2016 sorted into three groups according to the presumed level of European economic and political integration.

Submitted 22 May, 2020; v1 submitted 12 December, 2016; originally announced December 2016.

Comments: arXiv admin note: text overlap with arXiv:1506.01646

Journal ref: Kybernetika 56 no. 3, 432-458, 2020

arXiv:1506.01646 [pdf, other]

Multiple Monte Carlo Testing with Applications in Spatial Point Processes

Authors: Tomáš Mrkvička, Mari Myllymäki, Ute Hahn

Abstract: The rank envelope test (Myllymäki et al., Global envelope tests for spatial processes, arXiv:1307.0239 [stat.ME]) is proposed as a solution to multiple testing problem for Monte Carlo tests. Three different situations are recognized: 1) a few univariate Monte Carlo tests, 2) a Monte Carlo test with a function as the test statistic, 3) several Monte Carlo tests with functions as test statistics. The rank test has correct (global) type I error in each case and it is accompanied with a $p$-value and with a graphical interpretation which shows which subtest or which distances of the used test function(s) lead to the rejection at the prescribed significance level of the test. Examples of null hypothesis from point process and random set statistics are used to demonstrate the strength of the rank envelope test. The examples include goodness-of-fit test with several test functions, goodness-of-fit test for one group of point patterns, comparison of several groups of point patterns, test of dependence of components in a multi-type point pattern, and test of Boolean assumption for random closed sets.

Submitted 4 June, 2015; originally announced June 2015.

arXiv:1307.0239 [pdf, other]

doi 10.1111/rssb.12172

Global envelope tests for spatial processes

Authors: Mari Myllymäki, Tomás Mrkvicka, Pavel Grabarnik, Henri Seijo, Ute Hahn

Abstract: Envelope tests are a popular tool in spatial statistics, where they are used in goodness-of-fit testing. These tests graphically compare an empirical function $T(r)$ with its simulated counterparts from the null model. However, the type I error probability $α$ is conventionally controlled for a fixed distance $r$ only, whereas the functions are inspected on an interval of distances $I$. In this study, we propose two approaches related to Barnard's Monte Carlo test for building global envelope tests on $I$: (1) ordering the empirical and simulated functions based on their $r$-wise ranks among each other, and (2) the construction of envelopes for a deviation test. These new tests allow the a priori selection of the global $α$ and they yield $p$-values. We illustrate these tests using simulated and real point pattern data.

Submitted 3 November, 2015; v1 submitted 30 June, 2013; originally announced July 2013.

Journal ref: Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79 (2017): 381-404

arXiv:1306.0185 [pdf, other]

doi 10.1140/epjd/e2014-40825-0

Efficient calculation of local dose distribution for response modelling in proton and ion beams

Authors: S. Greilich, U. Hahn, M. Kiderlen, C. E. Andersen, N. Bassler

Abstract: We present an algorithm for fast and accurate computation of the local dose distribution in MeV beams of protons, carbon ions or other heavy-charged particles. It uses compound Poisson-process modelling of track interaction and successive convolutions for fast computation. It can handle mixed particle fields over a wide range of fluences. Since the local dose distribution is the essential part of several approaches to model detector efficiency or cellular response it has potential use in ion-beam dosimetry and radiotherapy.

Submitted 2 June, 2013; originally announced June 2013.

Comments: 9 pages, 3 figures

Journal ref: The European Physical Journal D (2014) 68:327

arXiv:cond-mat/0411372 [pdf]

The Interaction between PEDOT/PSS Gate and sub-$μ$ low-k Insulator for All Printed OFETs

Authors: Z. Shalabutov, M. Friedrich, I. Thurzo, A. Hübler, D. R. T. Zahn, U. Hahn

Abstract: In all printed OFETs with Poly(3,4-ethylenedioxythiophene)/poly(styrenesulfonate) (PEDOT/PSS) gate, offset printing and gravure of electrically dense sub-$μ$ insulators from polyvinylphenol (PVPh), polyvinyl alcohol (PVOH) and poly(methyl methacrylate) (PMMA) as well as other organic and inorganic materials turned out to be problematic due to the most reactive part of the gate material: the negative sulfonate ions from PSS. The present paper investigates the nature of the interaction between PSS and PVOH sub-$μ$ insulator explored by infrared spectroscopy and electrical methods. Some evidence is obtained that most probably OH- and not sulfonate ions are responsible for creating channels, penetrated by PEDOT/PSS nano dispersion applied as gate.

Submitted 15 November, 2004; originally announced November 2004.

Comments: submitted to J. Appl. Phys

arXiv:cmp-lg/9709010 [pdf, ps, other]

Message-Passing Protocols for Real-World Parsing -- An Object-Oriented Model and its Preliminary Evaluation

Authors: Udo Hahn, Peter Neuhaus, Norbert Broeker

Abstract: We argue for a performance-based design of natural language grammars and their associated parsers in order to meet the constraints imposed by real-world NLP. Our approach incorporates declarative and procedural knowledge about language and language use within an object-oriented specification framework. We discuss several message-passing protocols for parsing and provide reasons for sacrificing completeness of the parse in favor of efficiency based on a preliminary empirical evaluation.

Submitted 23 September, 1997; originally announced September 1997.

Comments: 12 pages, uses epsfig.sty

Journal ref: Proc. Int'l Workshop on Parsing Technologies, 1997, Boston/MA: MIT, pp 101-112

arXiv:cmp-lg/9704014 [pdf, ps, other]

Centering in-the-large: Computing referential discourse segments

Authors: Udo Hahn, Michael Strube

Abstract: We specify an algorithm that builds up a hierarchy of referential discourse segments from local centering data. The spatial extension and nesting of these discourse segments constrain the reachability of potential antecedents of an anaphoric expression beyond the local level of adjacent center pairs. Thus, the centering model is scaled up to the level of the global referential structure of discourse. An empirical evaluation of the algorithm is supplied.

Submitted 30 April, 1997; originally announced April 1997.

Comments: LaTeX, 8 pages

Journal ref: Proceedings of ACL 97 / EACL 97

arXiv:cmp-lg/9605030 [pdf, ps]

Incremental Centering and Center Ambiguity

Authors: Udo Hahn, Michael Strube

Abstract: In this paper, we present a model of anaphor resolution within the framework of the centering model. The consideration of an incremental processing mode introduces the need to manage structural ambiguity at the center level. Hence, the centering framework is further refined to account for local and global parsing ambiguities which propagate up to the level of center representations, yielding moderately adapted data structures for the centering algorithm.

Submitted 16 May, 1996; originally announced May 1996.

Comments: 6 pages, uuencoded gzipped PS file (see also Technical Report at: http://www.coling.uni-freiburg.de/public/papers/cogsci96-center.ps.gz)

Journal ref: CogSci '96: Proc. of 18th Annual Conference of the Cognitive Science Society. La Jolla, Ca., Jul 12-15 1996.

arXiv:cmp-lg/9605027 [pdf, ps]

Restricted Parallelism in Object-Oriented Lexical Parsing

Authors: Peter Neuhaus, Udo Hahn

Abstract: We present an approach to parallel natural language parsing which is based on a concurrent, object-oriented model of computation. A depth-first, yet incomplete parsing algorithm for a dependency grammar is specified and several restrictions on the degree of its parallelization are discussed.

Submitted 15 May, 1996; originally announced May 1996.

Comments: 6 pages, uuencoded gzipped PS file

Journal ref: Proceedings of COLING '96 (Copenhagen)

arXiv:cmp-lg/9605026 [pdf, ps]

Trading off Completeness for Efficiency --- The \textsc{ParseTalk} Performance Grammar Approach to Real-World Text Parsing

Authors: Peter Neuhaus, Udo Hahn

Abstract: We argue for a performance-based design of natural language grammars and their associated parsers in order to meet the constraints posed by real-world natural language understanding. This approach incorporates declarative and procedural knowledge about language and language use within an object-oriented specification framework. We discuss several message passing protocols for real-world text parsing and provide reasons for sacrificing completeness of the parse in favor of efficiency.

Submitted 15 May, 1996; originally announced May 1996.

Comments: 6 pages, uuencoded gzipped PS file

Journal ref: Proceedings of FLAIRS '96 (Key West)

arXiv:cmp-lg/9605025 [pdf, ps]

A Conceptual Reasoning Approach to Textual Ellipsis

Authors: Udo Hahn, Katja Markert, Michael Strube

Abstract: We present a hybrid text understanding methodology for the resolution of textual ellipsis. It integrates conceptual criteria (based on the well-formedness and conceptual strength of role chains in a terminological knowledge base) and functional constraints reflecting the utterances' information structure (based on the distinction between context-bound and unbound discourse elements). The methodological framework for text ellipsis resolution is the centering model that has been adapted to these constraints.

Submitted 15 May, 1996; originally announced May 1996.

Comments: 5 pages, uuencoded gzipped PS file (see also Technical Report at: http://www.coling.uni-freiburg.de/public/papers/ecai96.ps.gz)

Journal ref: ECAI '96: Proc. of 12th European Conference on Artificial Intelligence. Budapest, Aug 12-16 1996, pp.572-576

arXiv:cmp-lg/9605021 [pdf, ps]

Functional Centering

Authors: Michael Strube, Udo Hahn

Abstract: Based on empirical evidence from a free word order language (German) we propose a fundamental revision of the principles guiding the ordering of discourse entities in the forward-looking centers within the centering model. We claim that grammatical role criteria should be replaced by indicators of the functional information structure of the utterances, i.e., the distinction between context-bound and unbound discourse elements. This claim is backed up by an empirical evaluation of functional centering.

Submitted 14 May, 1996; originally announced May 1996.

Comments: 8 pages, uuencoded compressed PS file (see also Technical Report at: http://www.coling.uni-freiburg.de/public/papers/acl96.ps.gz)

Journal ref: Proceedings of ACL '96 (Santa Cruz)

arXiv:cmp-lg/9605020 [pdf, ps, other]

Where Defaults Don't Help: the Case of the German Plural System

Authors: Ramin Charles Nakisa, Ulrike Hahn

Abstract: The German plural system has become a focal point for conflicting theories of language, both linguistic and cognitive. We present simulation results with three simple classifiers - an ordinary nearest neighbour algorithm, Nosofsky's `Generalized Context Model' (GCM) and a standard, three-layer backprop network - predicting the plural class from a phonological representation of the singular in German. Though these are absolutely `minimal' models, in terms of architecture and input information, they nevertheless do remarkably well. The nearest neighbour predicts the correct plural class with an accuracy of 72% for a set of 24,640 nouns from the CELEX database. With a subset of 8,598 (non-compound) nouns, the nearest neighbour, the GCM and the network score 71.0%, 75.0% and 83.5%, respectively, on novel items. Furthermore, they outperform a hybrid, `pattern-associator + default rule', model, as proposed by Marcus et al. (1995), on this data set.

Submitted 13 May, 1996; originally announced May 1996.

Comments: In proceedings of the 18th annual meeting of the Cognitive Science Society. 6 pages, 4 postscript figures. cmp-lg has trouble handling PS fonts under NFSS, so the postscript version is 6 pages in postscript Times Roman and the LaTeX source produces 7 pages of Computer Modern. Just add times to the \usepackage command if your system can handle it

arXiv:cmp-lg/9509005 [pdf, ps]

ParseTalk about Textual Ellipsis

Authors: Michael Strube, Udo Hahn

Abstract: A hybrid methodology for the resolution of text-level ellipsis is presented in this paper. It incorporates conceptual proximity criteria applied to ontologically well-engineered domain knowledge bases and an approach to centering based on functional topic/comment patterns. We state text grammatical predicates for ellipsis and then turn to the procedural aspects of their evaluation within the framework of an actor-based implementation of a lexically distributed parser.

Submitted 28 September, 1995; originally announced September 1995.

Comments: 11 pages, uuencoded compressed PS file (see also Technical Report at: http://www.coling.uni-freiburg.de/public/papers/ranlp95.ps)

Journal ref: RANLP 95: Proc. of the Intl. Conf. on Recent Advances in Natural Language Processing. Tzigov Chark, Bulgaria, Sep. 14-16 1995, pp.62-72.

arXiv:cmp-lg/9503006 [pdf, ps]

ParseTalk about Sentence- and Text-Level Anaphora

Authors: Michael Strube, Udo Hahn

Abstract: We provide a unified account of sentence-level and text-level anaphora within the framework of a dependency-based grammar model. Criteria for anaphora resolution within sentence boundaries rephrase major concepts from GB's binding theory, while those for text-level anaphora incorporate an adapted version of a Grosz-Sidner-style focus model.

Submitted 3 March, 1995; originally announced March 1995.

Comments: in Proceedings of EACL-95, uuencoded and gzipped postscript (see also technical Report at http://www.coling.uni-freiburg.de:80/forschung/papers/eacl95.ps)

arXiv:cmp-lg/9410019 [pdf, ps]

Concurrent Lexicalized Dependency Parsing: A Behavioral View on ParseTalk Events

Authors: Susanne Schacht, Udo Hahn, Norbert Broeker

Abstract: The behavioral specification of an object-oriented grammar model is considered. The model is based on full lexicalization, head-orientation via valency constraints and dependency relations, inheritance as a means for non-redundant lexicon specification, and concurrency of computation. The computation model relies upon the actor paradigm, with concurrency entering through asynchronous message passing between actors. In particular, we here elaborate on principles of how the global behavior of a lexically distributed grammar and its corresponding parser can be specified in terms of event type networks and event networks, resp.

Submitted 24 October, 1994; originally announced October 1994.

Comments: 68kB, 5pages Postscript

Report number: CLIF Report 9/94

Journal ref: Proc.15th Intl Conference on Computational Linguistics, Kyoto, Japan, August 1994, pp.498-493

arXiv:cmp-lg/9410017 [pdf, ps]

Concurrent Lexicalized Dependency Parsing: The ParseTalk Model

Authors: Norbert Broeker, Udo Hahn, Susanne Schacht

Abstract: A grammar model for concurrent, object-oriented natural language parsing is introduced. Complete lexical distribution of grammatical knowledge is achieved building upon the head-oriented notions of valency and dependency, while inheritance mechanisms are used to capture lexical generalizations. The underlying concurrent computation model relies upon the actor paradigm. We consider message passing protocols for establishing dependency relations and ambiguity handling.

Submitted 24 October, 1994; originally announced October 1994.

Comments: 90kB, 7pages Postscript

Report number: CLIF Report 9/94

Journal ref: Proc.15th Intl Conference on Computational Linguistics, Kyoto, Japan, August 1994, pp.379-385

Showing 1–36 of 36 results for author: Hahn, U