Search | arXiv e-print repository

doi 10.1109/RBME.2024.3408456

Automated Radiology Report Generation: A Review of Recent Advances

Authors: Phillip Sloan, Philip Clatworthy, Edwin Simpson, Majid Mirmehdi

Abstract: Increasing demands on medical imaging departments are taking a toll on the radiologist's ability to deliver timely and accurate reports. Recent technological advances in artificial intelligence have demonstrated great potential for automatic radiology report generation (ARRG), sparking an explosion of research. This survey paper conducts a methodological review of contemporary ARRG approaches by way of (i) assessing datasets based on characteristics, such as availability, size, and adoption rate, (ii) examining deep learning training methods, such as contrastive learning and reinforcement learning, (iii) exploring state-of-the-art model architectures, including variations of CNN and transformer models, (iv) outlining techniques integrating clinical knowledge through multimodal inputs and knowledge graphs, and (v) scrutinising current model evaluation techniques, including commonly applied NLP metrics and qualitative clinical reviews. Furthermore, the quantitative results of the reviewed models are analysed, where the top performing models are examined to seek further insights. Finally, potential new directions are highlighted, with the adoption of additional datasets from other radiological modalities and improved evaluation methods predicted as important areas of future development.

Submitted 29 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

Comments: 24 pages, 8 figures, 6 tables. Accepted by IEEE Reviews in Biomedical Engineering

MSC Class: 68T99

ACM Class: I.2; I.4; J.3

arXiv:2402.07864 [pdf, other]

doi 10.1145/3613904.3642494

Cruising Queer HCI on the DL: A Literature Review of LGBTQ+ People in HCI

Authors: Jordan Taylor, Ellen Simpson, Anh-Ton Tran, Jed Brubaker, Sarah Fox, Haiyi Zhu

Abstract: LGBTQ+ people have received increased attention in HCI research, paralleling a greater emphasis on social justice in recent years. However, there has not been a systematic review of how LGBTQ+ people are researched or discussed in HCI. In this work, we review all research mentioning LGBTQ+ people across the HCI venues of CHI, CSCW, DIS, and TOCHI. Since 2014, we find a linear growth in the number of papers substantially about LGBTQ+ people and an exponential increase in the number of mentions. Research about LGBTQ+ people tends to center experiences of being politicized, outside the norm, stigmatized, or highly vulnerable. LGBTQ+ people are typically mentioned as a marginalized group or an area of future research. We identify gaps and opportunities for (1) research about and (2) the discussion of LGBTQ+ in HCI and provide a dataset to facilitate future Queer HCI research.

Submitted 12 February, 2024; originally announced February 2024.

Journal ref: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI '24)

arXiv:2212.06204 [pdf, ps, other]

"Hey, Can You Add Captions?": The Critical Infrastructuring Practices of Neurodiverse People on TikTok

Authors: Ellen Simpson, Samantha Dalal, Bryan Semaan

Abstract: Accessibility efforts, how we can make the world usable and useful to as many people as possible, have explicitly focused on how we can support and allow for the autonomy and independence of people with disabilities, neurotypes, chronic conditions, and older adults. Despite these efforts, not all technology is designed or implemented to support everyone's needs. Recently, a community-organized push by creators and general users of TikTok urged the platform to add accessibility features, such as closed captioning to user-generated content, allowing more people to use the platform with greater ease. Our work focuses on an understudied population (people with ADHD and those who experience similar challenges), exploring the creative practices people from this community engage in, focusing on the kinds of accessibility they create through their creative work. Through an interview study exploring the experiences of creatives on TikTok, we find that creatives engage in critical infrastructuring, a process of bottom-up (re)design, to make the platform more accessible despite the challenges the platform presents to them as creators. We present these critical infrastructuring practices through the themes of: creating and augmenting video editing infrastructures and creating and augmenting video captioning infrastructures. We reflect on how the introduction of a top-down infrastructure (the implementation of an auto-captioning feature) shifts the critical infrastructuring practices of content creators.
Through their infrastructuring, creatives revised sociotechnical capabilities of TikTok to support their own needs as well as the broader needs of the TikTok community. We discuss how the routine of infrastructuring accessibility is actually best conceptualized as incidental care work. We further highlight how accessibility is an evolving sociotechnical construct, and forward the concept of contextual accessibility.

Submitted 12 December, 2022; originally announced December 2022.

Comments: To be published in: Proc. ACM Hum.-Comput. Interact. CSCW '23

arXiv:2211.07596 [pdf, other]

Towards Abstractive Timeline Summarisation using Preference-based Reinforcement Learning

Authors: Yuxuan Ye, Edwin Simpson

Abstract: This paper introduces a novel pipeline for summarising timelines of events reported by multiple news sources. Transformer-based models for abstractive summarisation generate coherent and concise summaries of long documents but can fail to outperform established extractive methods on specialised tasks such as timeline summarisation (TLS). While extractive summaries are more faithful to their sources, they may be less readable and contain redundant or unnecessary information. This paper proposes a preference-based reinforcement learning (PBRL) method for adapting pretrained abstractive summarisers to TLS, which can overcome the drawbacks of extractive timeline summaries. We define a compound reward function that learns from keywords of interest and pairwise preference labels, which we use to fine-tune a pretrained abstractive summariser via offline reinforcement learning. We carry out both automated and human evaluation on three datasets, finding that our method outperforms a comparable extractive TLS method on two of the three benchmark datasets, and participants prefer our method's summaries to those of both the extractive TLS method and the pretrained abstractive model. The method does not require expensive reference summaries and needs only a small number of preferences to align the generated summaries with human preferences.

Submitted 2 November, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

Comments: ECAI 2023

arXiv:2209.00099 [pdf, other]

Efficient Methods for Natural Language Processing: A Survey

Authors: Marcos Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, Manuel R. Ciosici, Michael Hassid, Kenneth Heafield, Sara Hooker, Colin Raffel, Pedro H. Martins, André F. T. Martins, Jessica Zosa Forde, Peter Milder, Edwin Simpson, Noam Slonim, Jesse Dodge, Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz

Abstract: Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows. Such resources include data, time, storage, or energy, all of which are naturally limited and unevenly distributed. This motivates research into efficient methods that require fewer resources to achieve similar results. This survey synthesizes and relates current methods and findings in efficient NLP. We aim to provide both guidance for conducting NLP under limited resources, and point towards promising research directions for developing more efficient methods.

Submitted 24 March, 2023; v1 submitted 31 August, 2022; originally announced September 2022.

Comments: Accepted at TACL, pre publication version

arXiv:2109.01190 [pdf, other]

Assisting Decision Making in Scholarly Peer Review: A Preference Learning Perspective

Authors: Nils Dycke, Edwin Simpson, Ilia Kuznetsov, Iryna Gurevych

Abstract: Peer review is the primary means of quality control in academia; as an outcome of a peer review process, program and area chairs make acceptance decisions for each paper based on the review reports and scores they received. Quality of scientific work is multi-faceted; coupled with the subjectivity of reviewing, this makes final decision making difficult and time-consuming. To support this final step of peer review, we formalize it as a paper ranking problem. We introduce a novel, multi-faceted generic evaluation framework for ranking submissions based on peer reviews that takes into account effectiveness, efficiency and fairness. We propose a preference learning perspective on the task that considers both review texts and scores to alleviate the inevitable bias and noise in reviews. Our experiments on peer review data from the ACL 2018 conference demonstrate the superiority of our preference-learning-based approach over baselines and prior work, while highlighting the importance of using both review texts and scores to rank submissions.

Submitted 27 May, 2022; v1 submitted 2 September, 2021; originally announced September 2021.

arXiv:2102.10398 [pdf]

All-Chalcogenide Programmable All-Optical Deep Neural Networks

Authors: Ting Yu, Xiaoxuan Ma, Ernest Pastor, Jonathan K. George, Simon Wall, Mario Miscuglio, Robert E. Simpson, Volker J. Sorger

Abstract: Deep learning algorithms are revolutionising many aspects of modern life. Typically, they are implemented in CMOS-based hardware with severely limited memory access times and inefficient data-routing. All-optical neural networks without any electro-optic conversions could alleviate these shortcomings. However, an all-optical nonlinear activation function, which is a vital building block for optical neural networks, needs to be developed efficiently on-chip. Here, we introduce and demonstrate both optical synapse weighting and all-optical nonlinear thresholding using two different effects in a chalcogenide material photonic platform. We show how the structural phase transitions in a wide-bandgap phase-change material enable storing the neural network weights via non-volatile photonic memory, whilst resonant bond destabilisation is used as a nonlinear activation threshold without changing the material. These two different transitions within chalcogenides enable programmable neural networks with near-zero static power consumption once trained, in addition to picosecond delays in performing inference tasks that are not limited by the wire charging that limits electrical circuits; for instance, we show that nanosecond-order weight programming and near-instantaneous weight updates enable accurate inference tasks within 20 picoseconds in a 3-layer all-optical neural network.
Optical neural networks that bypass electro-optic conversion altogether hold promise for network-edge machine learning applications where decision-making in real time is critical, such as for autonomous vehicles or navigation systems, for example signal pre-processing in LIDAR systems.

Submitted 27 February, 2021; v1 submitted 20 February, 2021; originally announced February 2021.

arXiv:2010.12613 [pdf, other]

Ranking Creative Language Characteristics in Small Data Scenarios

Authors: Julia Siekiera, Marius Köppel, Edwin Simpson, Kevin Stowe, Iryna Gurevych, Stefan Kramer

Abstract: The ability to rank creative natural language provides an important general tool for downstream language understanding and generation. However, current deep ranking models require substantial amounts of labeled data that are difficult and expensive to obtain for different domains, languages and creative characteristics. A recent neural approach, the DirectRanker, promises to reduce the amount of training data needed but its application to text isn't fully explored. We therefore adapt the DirectRanker to provide a new deep model for ranking creative language with small data. We compare DirectRanker with a Bayesian approach, Gaussian process preference learning (GPPL), which has previously been shown to work well with sparse data. Our experiments with sparse training data show that while the performance of standard neural ranking approaches collapses with small training datasets, DirectRanker remains effective. We find that combining DirectRanker with GPPL increases performance across different settings by leveraging the complementary benefits of both models. Our combined approach outperforms the previous state-of-the-art on humor and metaphor novelty tasks, increasing Spearman's ρ by 14% and 16% on average.

Submitted 23 October, 2020; originally announced October 2020.

Comments: 10 pages, 3 figures

arXiv:2008.00853 [pdf, other]

doi 10.26342/2020-64-4

Predicting the Humorousness of Tweets Using Gaussian Process Preference Learning

Authors: Tristan Miller, Erik-Lân Do Dinh, Edwin Simpson, Iryna Gurevych

Abstract: Most humour processing systems to date make at best discrete, coarse-grained distinctions between the comical and the conventional, yet such notions are better conceptualized as a broad spectrum. In this paper, we present a probabilistic approach, a variant of Gaussian process preference learning (GPPL), that learns to rank and rate the humorousness of short texts by exploiting human preference judgments and automatically sourced linguistic annotations. We apply our system, which is similar to one that had previously shown good performance on English-language one-liners annotated with pairwise humorousness annotations, to the Spanish-language data set of the HAHA@IberLEF2019 evaluation campaign. We report system performance for the campaign's two subtasks, humour detection and funniness score prediction, and discuss some issues arising from the conversion between the numeric scores used in the HAHA@IberLEF2019 data and the pairwise judgment annotations required for our method.

Submitted 26 March, 2021; v1 submitted 3 August, 2020; originally announced August 2020.

Comments: 8 pages, 1 figure. A previous version of this paper was published as "OFAI-UKP at HAHA@IberLEF2019: Predicting the Humorousness of Tweets Using Gaussian Process Preference Learning" in the Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), volume 2421 of CEUR Workshop Proceedings, pages 180-190, 2019

ACM Class: I.2.7

Journal ref: Procesamiento del Lenguaje Natural, 64:37-44, March 2020

arXiv:2005.00250 [pdf, other]

Low Resource Multi-Task Sequence Tagging -- Revisiting Dynamic Conditional Random Fields

Authors: Jonas Pfeiffer, Edwin Simpson, Iryna Gurevych

Abstract: We compare different models for low resource multi-task sequence tagging that leverage dependencies between label sequences for different tasks. Our analysis is aimed at datasets where each example has labels for multiple tasks. Current approaches use either a separate model for each task or standard multi-task learning to learn shared feature representations. However, these approaches ignore correlations between label sequences, which can provide important information in settings with small training datasets. To analyze which scenarios can profit from modeling dependencies between labels in different tasks, we revisit dynamic conditional random fields (CRFs) and combine them with deep neural networks. We compare single-task, multi-task and dynamic CRF setups for three diverse datasets at both sentence and document levels in English and German low resource scenarios. We show that including silver labels from pretrained part-of-speech taggers as auxiliary tasks can improve performance on downstream tasks. We find that especially in low-resource scenarios, the explicit modeling of inter-dependencies between task predictions outperforms single-task as well as standard multi-task models.

Submitted 1 May, 2020; originally announced May 2020.

arXiv:2005.00036 [pdf, other]

Improving Factual Consistency Between a Response and Persona Facts

Authors: Mohsen Mesgar, Edwin Simpson, Iryna Gurevych

Abstract: Neural models for response generation produce responses that are semantically plausible but not necessarily factually consistent with facts describing the speaker's persona. These models are trained with fully supervised learning where the objective function barely captures factual consistency. We propose to fine-tune these models by reinforcement learning and an efficient reward function that explicitly captures the consistency between a response and persona facts as well as semantic plausibility. Our automatic and human evaluations on the PersonaChat corpus confirm that our approach increases the rate of responses that are factually consistent with persona facts over its supervised counterpart while retaining the language quality of responses.

Submitted 14 February, 2021; v1 submitted 30 April, 2020; originally announced May 2020.

Comments: Accepted in EACL'21 (https://www.aclweb.org/anthology/)

arXiv:1912.01987 [pdf, other]

Scalable Bayesian Preference Learning for Crowds

Authors: Edwin Simpson, Iryna Gurevych

Abstract: We propose a scalable Bayesian preference learning method for jointly predicting the preferences of individuals as well as the consensus of a crowd from pairwise labels. People's opinions often differ greatly, making it difficult to predict their preferences from small amounts of personal data. Individual biases also make it harder to infer the consensus of a crowd when there are few labels per item. We address these challenges by combining matrix factorisation with Gaussian processes, using a Bayesian approach to account for uncertainty arising from noisy and sparse data. Our method exploits input features, such as text embeddings and user metadata, to predict preferences for new items and users that are not in the training set. As previous solutions based on Gaussian processes do not scale to large numbers of users, items or pairwise labels, we propose a stochastic variational inference approach that limits computational and memory costs. Our experiments on a recommendation task show that our method is competitive with previous approaches despite our scalable inference approximation. We demonstrate the method's scalability on a natural language processing task with thousands of users and items, and show improvements over the state of the art on this task. We make our software publicly available for future work.

Submitted 11 December, 2019; v1 submitted 4 December, 2019; originally announced December 2019.

arXiv:1911.10183 [pdf, other]

Interactive Text Ranking with Bayesian Optimisation: A Case Study on Community QA and Summarisation

Authors: Edwin Simpson, Yang Gao, Iryna Gurevych

Abstract: For many NLP applications, such as question answering and summarisation, the goal is to select the best solution from a large space of candidates to meet a particular user's needs. To address the lack of user-specific training data, we propose an interactive text ranking approach that actively selects pairs of candidates, from which the user selects the best. Unlike previous strategies, which attempt to learn a ranking across the whole candidate space, our method employs Bayesian optimisation to focus the user's labelling effort on high quality candidates and integrates prior knowledge in a Bayesian manner to cope better with small data scenarios. We apply our method to community question answering (cQA) and extractive summarisation, finding that it significantly outperforms existing interactive approaches. We also show that the ranking function learned by our method is an effective reward function for reinforcement learning, which improves the state of the art for interactive summarisation.

Submitted 11 September, 2020; v1 submitted 22 November, 2019; originally announced November 2019.

Comments: Accepted to Transactions of the ACL

arXiv:1904.03063 [pdf, other]

Bayesian Heatmaps: Probabilistic Classification with Multiple Unreliable Information Sources

Authors: Edwin Simpson, Steven Reece, Stephen J. Roberts

Abstract: Unstructured data from diverse sources, such as social media and aerial imagery, can provide valuable up-to-date information for intelligent situation assessment. Mining these different information sources could bring major benefits to applications such as situation awareness in disaster zones and mapping the spread of diseases. Such applications depend on classifying the situation across a region of interest, which can be depicted as a spatial "heatmap". Annotating unstructured data using crowdsourcing or automated classifiers produces individual classifications at sparse locations that typically contain many errors. We propose a novel Bayesian approach that models the relevance, error rates and bias of each information source, enabling us to learn a spatial Gaussian Process classifier by aggregating data from multiple sources with varying reliability and relevance. Our method does not require gold-labelled data and can make predictions at any location in an area of interest given only sparse observations. We show empirically that our approach can handle noisy and biased data sources, and that simultaneously inferring reliability and transferring information between neighbouring reports leads to more accurate predictions. We demonstrate our method on two real-world problems from disaster response, showing how our approach reduces the amount of crowdsourced data required and can be used to generate valuable heatmap visualisations from SMS messages and satellite images.

Submitted 5 April, 2019; originally announced April 2019.

Journal ref: Joint European Conference on Machine Learning and Knowledge Discovery in Databases (2017), pp. 109-125, Springer, Cham

arXiv:1903.11508 [pdf, other]

Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems

Authors: Steffen Eger, Gözde Gül Şahin, Andreas Rücklé, Ji-Ung Lee, Claudia Schulz, Mohsen Mesgar, Krishnkant Swarnkar, Edwin Simpson, Iryna Gurevych

Abstract: Visual modifications to text are often used to obfuscate offensive comments in social media (e.g., "!d10t") or as a writing style ("1337" in "leet speak"), among other scenarios. We consider this as a new type of adversarial attack in NLP, a setting to which humans are very robust, as our experiments with both simple and more difficult visual input perturbations demonstrate. We then investigate the impact of visual adversarial attacks on current NLP systems on character-, word-, and sentence-level tasks, showing that both neural and non-neural models are, in contrast to humans, extremely sensitive to such attacks, suffering performance decreases of up to 82%. We then explore three shielding methods (visual character embeddings, adversarial training, and rule-based recovery), which substantially improve the robustness of the models. However, the shielding methods still fall behind performances achieved in non-attack scenarios, which demonstrates the difficulty of dealing with visual attacks.

Submitted 10 June, 2020; v1 submitted 27 March, 2019; originally announced March 2019.

Comments: Accepted as long paper at NAACL-2019; fixed one ungrammatical sentence

arXiv:1811.00780 [pdf, other]

A Bayesian Approach for Sequence Tagging with Crowds

Authors: Edwin Simpson, Iryna Gurevych

Abstract: Current methods for sequence tagging, a core task in NLP, are data hungry, which motivates the use of crowdsourcing as a cheap way to obtain labelled data. However, annotators are often unreliable and current aggregation methods cannot capture common types of span annotation errors. To address this, we propose a Bayesian method for aggregating sequence tags that reduces errors by modelling sequential dependencies between the annotations as well as the ground-truth labels. By taking a Bayesian approach, we account for uncertainty in the model due to both annotator errors and the lack of data for modelling annotators who complete few tasks. We evaluate our model on crowdsourced data for named entity recognition, information extraction and argument mining, showing that our sequential model outperforms the previous state of the art. We also find that our approach can reduce crowdsourcing costs through more effective active learning, as it better captures uncertainty in the sequence labels when there are few annotations.

Submitted 6 September, 2019; v1 submitted 2 November, 2018; originally announced November 2018.

Comments: Accepted for EMNLP 2019

arXiv:1806.02418 [pdf, other]

Finding Convincing Arguments Using Scalable Bayesian Preference Learning

Authors: Edwin Simpson, Iryna Gurevych

Abstract: We introduce a scalable Bayesian preference learning method for identifying convincing arguments in the absence of gold-standard ratings or rankings. In contrast to previous work, we avoid the need for separate methods to perform quality control on training data, predict rankings and perform pairwise classification. Bayesian approaches are an effective solution when faced with sparse or noisy training data, but have not previously been used to identify convincing arguments. One issue is scalability, which we address by developing a stochastic variational inference method for Gaussian process (GP) preference learning. We show how our method can be applied to predict argument convincingness from crowdsourced data, outperforming the previous state-of-the-art, particularly when trained with small amounts of unreliable data. We demonstrate how the Bayesian approach enables more effective active learning, thereby reducing the amount of data required to identify convincing arguments for new users and domains. While word embeddings are principally used with neural networks, our results show that word embeddings in combination with linguistic features also benefit GPs when predicting argument convincingness.

Submitted 6 June, 2018; originally announced June 2018.

Comments: Accepted for publication in TACL. To be presented at ACL 2018

Showing 1–17 of 17 results for author: Simpson, E