-
Automated Radiology Report Generation: A Review of Recent Advances
Authors:
Phillip Sloan,
Philip Clatworthy,
Edwin Simpson,
Majid Mirmehdi
Abstract:
Increasing demands on medical imaging departments are taking a toll on the radiologist's ability to deliver timely and accurate reports. Recent technological advances in artificial intelligence have demonstrated great potential for automatic radiology report generation (ARRG), sparking an explosion of research. This survey paper conducts a methodological review of contemporary ARRG approaches by way of (i) assessing datasets based on characteristics, such as availability, size, and adoption rate, (ii) examining deep learning training methods, such as contrastive learning and reinforcement learning, (iii) exploring state-of-the-art model architectures, including variations of CNN and transformer models, (iv) outlining techniques integrating clinical knowledge through multimodal inputs and knowledge graphs, and (v) scrutinising current model evaluation techniques, including commonly applied NLP metrics and qualitative clinical reviews. Furthermore, the quantitative results of the reviewed models are analysed, and the top-performing models are examined for further insights. Finally, potential new directions are highlighted, with the adoption of additional datasets from other radiological modalities and improved evaluation methods predicted as important areas of future development.
Submitted 29 May, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
Cruising Queer HCI on the DL: A Literature Review of LGBTQ+ People in HCI
Authors:
Jordan Taylor,
Ellen Simpson,
Anh-Ton Tran,
Jed Brubaker,
Sarah Fox,
Haiyi Zhu
Abstract:
LGBTQ+ people have received increased attention in HCI research, paralleling a greater emphasis on social justice in recent years. However, there has not been a systematic review of how LGBTQ+ people are researched or discussed in HCI. In this work, we review all research mentioning LGBTQ+ people across the HCI venues of CHI, CSCW, DIS, and TOCHI. Since 2014, we find linear growth in the number of papers substantially about LGBTQ+ people and an exponential increase in the number of mentions. Research about LGBTQ+ people tends to center experiences of being politicized, outside the norm, stigmatized, or highly vulnerable. LGBTQ+ people are typically mentioned as a marginalized group or an area of future research. We identify gaps and opportunities for (1) research about and (2) the discussion of LGBTQ+ people in HCI and provide a dataset to facilitate future Queer HCI research.
Submitted 12 February, 2024;
originally announced February 2024.
-
"Hey, Can You Add Captions?": The Critical Infrastructuring Practices of Neurodiverse People on TikTok
Authors:
Ellen Simpson,
Samantha Dalal,
Bryan Semaan
Abstract:
Accessibility efforts, which aim to make the world usable and useful to as many people as possible, have explicitly focused on supporting the autonomy and independence of people with disabilities, diverse neurotypes, or chronic conditions, and of older adults. Despite these efforts, not all technology is designed or implemented to support everyone's needs. Recently, a community-organized push by creators and general users of TikTok urged the platform to add accessibility features, such as closed captioning for user-generated content, allowing more people to use the platform with greater ease. Our work focuses on an understudied population (people with ADHD and those who experience similar challenges), exploring the creative practices people from this community engage in and the kinds of accessibility they create through their creative work. Through an interview study exploring the experiences of creatives on TikTok, we find that creatives engage in critical infrastructuring, a process of bottom-up (re)design, to make the platform more accessible despite the challenges the platform presents to them as creators. We present these critical infrastructuring practices through two themes: creating and augmenting video editing infrastructures, and creating and augmenting video captioning infrastructures. We reflect on how the introduction of a top-down infrastructure, the implementation of an auto-captioning feature, shifts the critical infrastructuring practices of content creators. Through their infrastructuring, creatives revised the sociotechnical capabilities of TikTok to support their own needs as well as the broader needs of the TikTok community. We discuss how the routine of infrastructuring accessibility is best conceptualized as incidental care work. We further highlight how accessibility is an evolving sociotechnical construct, and put forward the concept of contextual accessibility.
Submitted 12 December, 2022;
originally announced December 2022.
-
Towards Abstractive Timeline Summarisation using Preference-based Reinforcement Learning
Authors:
Yuxuan Ye,
Edwin Simpson
Abstract:
This paper introduces a novel pipeline for summarising timelines of events reported by multiple news sources. Transformer-based models for abstractive summarisation generate coherent and concise summaries of long documents but can fail to outperform established extractive methods on specialised tasks such as timeline summarisation (TLS). While extractive summaries are more faithful to their sources, they may be less readable and contain redundant or unnecessary information. This paper proposes a preference-based reinforcement learning (PBRL) method for adapting pretrained abstractive summarisers to TLS, which can overcome the drawbacks of extractive timeline summaries. We define a compound reward function that learns from keywords of interest and pairwise preference labels, which we use to fine-tune a pretrained abstractive summariser via offline reinforcement learning. We carry out both automated and human evaluation on three datasets, finding that our method outperforms a comparable extractive TLS method on two of the three benchmark datasets, and participants prefer our method's summaries to those of both the extractive TLS method and the pretrained abstractive model. The method does not require expensive reference summaries and needs only a small number of preferences to align the generated summaries with human preferences.
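The abstract describes a reward learned from pairwise preference labels. One common way to turn such labels into a scalar reward is the Bradley-Terry model, where P(a preferred over b) = σ(r(a) - r(b)). The sketch below fits a linear reward over hand-picked features with plain gradient ascent. The feature names, data, and learning rate are hypothetical, and this is not the paper's compound reward function or its offline reinforcement learning step.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def reward(w, phi):
    """Linear reward r(x) = w . phi(x) over a feature vector phi(x)."""
    return sum(wi * xi for wi, xi in zip(w, phi))

def fit_bradley_terry(features, prefs, lr=0.5, epochs=200):
    """Fit w by gradient ascent on sum of log sigmoid(r(winner) - r(loser)).

    features: dict mapping a candidate summary id to its feature vector.
    prefs: list of (preferred_id, other_id) pairwise labels.
    """
    dim = len(next(iter(features.values())))
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for win, lose in prefs:
            diff = [a - b for a, b in zip(features[win], features[lose])]
            p = sigmoid(reward(w, diff))  # model's P(win preferred over lose)
            for k in range(dim):
                grad[k] += (1.0 - p) * diff[k]  # gradient of log sigmoid(w . diff)
        w = [wi + lr * g / len(prefs) for wi, g in zip(w, grad)]
    return w

# Hypothetical features per summary: [keyword_coverage, redundancy].
summaries = {"A": [0.9, 0.1], "B": [0.5, 0.3], "C": [0.2, 0.6]}
labels = [("A", "B"), ("A", "C"), ("B", "C")]
w = fit_bradley_terry(summaries, labels)
ranking = sorted(summaries, key=lambda s: reward(w, summaries[s]), reverse=True)
print(ranking)  # ['A', 'B', 'C']
```

Because these toy preferences are perfectly consistent, the weights keep growing with more epochs; a regulariser would usually be added in practice.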
Submitted 2 November, 2023; v1 submitted 14 November, 2022;
originally announced November 2022.
-
Efficient Methods for Natural Language Processing: A Survey
Authors:
Marcos Treviso,
Ji-Ung Lee,
Tianchu Ji,
Betty van Aken,
Qingqing Cao,
Manuel R. Ciosici,
Michael Hassid,
Kenneth Heafield,
Sara Hooker,
Colin Raffel,
Pedro H. Martins,
André F. T. Martins,
Jessica Zosa Forde,
Peter Milder,
Edwin Simpson,
Noam Slonim,
Jesse Dodge,
Emma Strubell,
Niranjan Balasubramanian,
Leon Derczynski,
Iryna Gurevych,
Roy Schwartz
Abstract:
Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows. Such resources include data, time, storage, or energy, all of which are naturally limited and unevenly distributed. This motivates research into efficient methods that require fewer resources to achieve similar results. This survey synthesizes and relates current methods and findings in efficient NLP. We aim to provide both guidance for conducting NLP under limited resources, and point towards promising research directions for developing more efficient methods.
Submitted 24 March, 2023; v1 submitted 31 August, 2022;
originally announced September 2022.
-
Assisting Decision Making in Scholarly Peer Review: A Preference Learning Perspective
Authors:
Nils Dycke,
Edwin Simpson,
Ilia Kuznetsov,
Iryna Gurevych
Abstract:
Peer review is the primary means of quality control in academia; as an outcome of a peer review process, program and area chairs make acceptance decisions for each paper based on the review reports and scores it received. The quality of scientific work is multi-faceted; coupled with the subjectivity of reviewing, this makes final decision making difficult and time-consuming. To support this final step of peer review, we formalize it as a paper ranking problem. We introduce a novel, multi-faceted generic evaluation framework for ranking submissions based on peer reviews that takes into account effectiveness, efficiency and fairness. We propose a preference learning perspective on the task that considers both review texts and scores to alleviate the inevitable bias and noise in reviews. Our experiments on peer review data from the ACL 2018 conference demonstrate the superiority of our preference-learning-based approach over baselines and prior work, while highlighting the importance of using both review texts and scores to rank submissions.
Submitted 27 May, 2022; v1 submitted 2 September, 2021;
originally announced September 2021.
-
All-Chalcogenide Programmable All-Optical Deep Neural Networks
Authors:
Ting Yu,
Xiaoxuan Ma,
Ernest Pastor,
Jonathan K. George,
Simon Wall,
Mario Miscuglio,
Robert E. Simpson,
Volker J. Sorger
Abstract:
Deep learning algorithms are revolutionising many aspects of modern life. Typically, they are implemented in CMOS-based hardware with severely limited memory access times and inefficient data routing. All-optical neural networks without any electro-optic conversions could alleviate these shortcomings. However, an all-optical nonlinear activation function, which is a vital building block for optical neural networks, needs to be developed efficiently on-chip. Here, we introduce and demonstrate both optical synapse weighting and all-optical nonlinear thresholding using two different effects in a chalcogenide material photonic platform. We show how the structural phase transitions in a wide-bandgap phase-change material enable storing the neural network weights via non-volatile photonic memory, whilst resonant bond destabilisation is used as a nonlinear activation threshold without changing the material. These two different transitions within chalcogenides enable programmable neural networks with near-zero static power consumption once trained, in addition to picosecond delays when performing inference tasks, since they are not limited by the wire charging that limits electrical circuits; for instance, we show that nanosecond-order weight programming and near-instantaneous weight updates enable accurate inference tasks within 20 picoseconds in a 3-layer all-optical neural network. Optical neural networks that bypass electro-optic conversion altogether hold promise for network-edge machine learning applications where real-time decision-making is critical, such as autonomous vehicles or navigation systems, including signal pre-processing for LIDAR systems.
Submitted 27 February, 2021; v1 submitted 20 February, 2021;
originally announced February 2021.
-
Ranking Creative Language Characteristics in Small Data Scenarios
Authors:
Julia Siekiera,
Marius Köppel,
Edwin Simpson,
Kevin Stowe,
Iryna Gurevych,
Stefan Kramer
Abstract:
The ability to rank creative natural language provides an important general tool for downstream language understanding and generation. However, current deep ranking models require substantial amounts of labeled data that are difficult and expensive to obtain for different domains, languages and creative characteristics. A recent neural approach, the DirectRanker, promises to reduce the amount of training data needed, but its application to text has not been fully explored. We therefore adapt the DirectRanker to provide a new deep model for ranking creative language with small data. We compare DirectRanker with a Bayesian approach, Gaussian process preference learning (GPPL), which has previously been shown to work well with sparse data. Our experiments with sparse training data show that while the performance of standard neural ranking approaches collapses with small training datasets, DirectRanker remains effective. We find that combining DirectRanker with GPPL increases performance across different settings by leveraging the complementary benefits of both models. Our combined approach outperforms the previous state of the art on humor and metaphor novelty tasks, increasing Spearman's ρ by 14% and 16% on average.
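Spearman's ρ, the metric reported above, is the Pearson correlation between the rank orders of predicted and gold scores. A minimal sketch, with invented scores for illustration; tied values receive the average of the rank positions they occupy, which is the usual convention:

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation between the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

gold = [0.9, 0.1, 0.5, 0.7]       # hypothetical gold scores (e.g. metaphor novelty)
predicted = [0.8, 0.2, 0.6, 0.3]  # hypothetical model scores
print(spearman_rho(gold, predicted))  # 0.8, up to floating-point rounding
```

Only the ordering of the scores matters: any monotone rescaling of `predicted` leaves ρ unchanged.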
Submitted 23 October, 2020;
originally announced October 2020.
-
Predicting the Humorousness of Tweets Using Gaussian Process Preference Learning
Authors:
Tristan Miller,
Erik-Lân Do Dinh,
Edwin Simpson,
Iryna Gurevych
Abstract:
Most humour processing systems to date make at best discrete, coarse-grained distinctions between the comical and the conventional, yet such notions are better conceptualized as a broad spectrum. In this paper, we present a probabilistic approach, a variant of Gaussian process preference learning (GPPL), that learns to rank and rate the humorousness of short texts by exploiting human preference judgments and automatically sourced linguistic annotations. We apply our system, which is similar to one that had previously shown good performance on English-language one-liners annotated with pairwise humorousness annotations, to the Spanish-language data set of the HAHA@IberLEF2019 evaluation campaign. We report system performance for the campaign's two subtasks, humour detection and funniness score prediction, and discuss some issues arising from the conversion between the numeric scores used in the HAHA@IberLEF2019 data and the pairwise judgment annotations required for our method.
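The abstract mentions converting numeric funniness scores into the pairwise judgments that GPPL expects. One straightforward conversion, shown here as an assumption rather than the authors' procedure, emits a preference for every pair of items whose scores differ by more than a threshold (`min_gap` is a hypothetical parameter):

```python
from itertools import combinations

def scores_to_pairs(scores, min_gap=0.0):
    """Turn {item_id: numeric score} into (preferred_id, other_id) pairwise labels.

    Pairs whose scores differ by at most min_gap are treated as ties and skipped.
    """
    pairs = []
    for a, b in combinations(sorted(scores), 2):
        gap = scores[a] - scores[b]
        if abs(gap) <= min_gap:
            continue
        pairs.append((a, b) if gap > 0 else (b, a))
    return pairs

# Hypothetical funniness scores for four tweets.
tweet_scores = {"t1": 3.2, "t2": 1.5, "t3": 3.2, "t4": 0.0}
pairs = scores_to_pairs(tweet_scores)
print(pairs)
# [('t1', 't2'), ('t1', 't4'), ('t3', 't2'), ('t2', 't4'), ('t3', 't4')]
```

With n scored items this produces up to n(n - 1)/2 pairs, and skipping ties discards the information that two items were rated equally; any such conversion has to make trade-offs of this kind.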
Submitted 26 March, 2021; v1 submitted 3 August, 2020;
originally announced August 2020.
-
Low Resource Multi-Task Sequence Tagging -- Revisiting Dynamic Conditional Random Fields
Authors:
Jonas Pfeiffer,
Edwin Simpson,
Iryna Gurevych
Abstract:
We compare different models for low resource multi-task sequence tagging that leverage dependencies between label sequences for different tasks. Our analysis is aimed at datasets where each example has labels for multiple tasks. Current approaches use either a separate model for each task or standard multi-task learning to learn shared feature representations. However, these approaches ignore correlations between label sequences, which can provide important information in settings with small training datasets. To analyze which scenarios can profit from modeling dependencies between labels in different tasks, we revisit dynamic conditional random fields (CRFs) and combine them with deep neural networks. We compare single-task, multi-task and dynamic CRF setups for three diverse datasets at both sentence and document levels in English and German low resource scenarios. We show that including silver labels from pretrained part-of-speech taggers as auxiliary tasks can improve performance on downstream tasks. We find that especially in low-resource scenarios, the explicit modeling of inter-dependencies between task predictions outperforms single-task as well as standard multi-task models.
Submitted 1 May, 2020;
originally announced May 2020.
-
Improving Factual Consistency Between a Response and Persona Facts
Authors:
Mohsen Mesgar,
Edwin Simpson,
Iryna Gurevych
Abstract:
Neural models for response generation produce responses that are semantically plausible but not necessarily factually consistent with facts describing the speaker's persona. These models are trained with fully supervised learning where the objective function barely captures factual consistency. We propose to fine-tune these models by reinforcement learning and an efficient reward function that explicitly captures the consistency between a response and persona facts as well as semantic plausibility. Our automatic and human evaluations on the PersonaChat corpus confirm that our approach increases the rate of responses that are factually consistent with persona facts over its supervised counterpart while retaining the language quality of responses.
Submitted 14 February, 2021; v1 submitted 30 April, 2020;
originally announced May 2020.
-
Scalable Bayesian Preference Learning for Crowds
Authors:
Edwin Simpson,
Iryna Gurevych
Abstract:
We propose a scalable Bayesian preference learning method for jointly predicting the preferences of individuals as well as the consensus of a crowd from pairwise labels. People's opinions often differ greatly, making it difficult to predict their preferences from small amounts of personal data. Individual biases also make it harder to infer the consensus of a crowd when there are few labels per item. We address these challenges by combining matrix factorisation with Gaussian processes, using a Bayesian approach to account for uncertainty arising from noisy and sparse data. Our method exploits input features, such as text embeddings and user metadata, to predict preferences for new items and users that are not in the training set. As previous solutions based on Gaussian processes do not scale to large numbers of users, items or pairwise labels, we propose a stochastic variational inference approach that limits computational and memory costs. Our experiments on a recommendation task show that our method is competitive with previous approaches despite our scalable inference approximation. We demonstrate the method's scalability on a natural language processing task with thousands of users and items, and show improvements over the state of the art on this task. We make our software publicly available for future work.
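GP preference learning is commonly built on a probit likelihood: each item has a latent utility f, and the probability that a is preferred to b is Φ((f(a) - f(b)) / (√2 σ)), where σ is a noise scale. In the sketch below, each user's utility comes from a hypothetical low-rank decomposition (a shared consensus term plus a dot product of user and item factors), showing how individuals can disagree while the crowd consensus is neutral. The GP priors and the stochastic variational inference described in the abstract are omitted, and all names and numbers are illustrative.

```python
import math

def std_normal_cdf(z):
    """Phi(z), the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pref_prob(f_a, f_b, noise_sd=1.0):
    """Probit likelihood that item a is preferred to item b given latent utilities."""
    return std_normal_cdf((f_a - f_b) / (math.sqrt(2.0) * noise_sd))

def user_utility(consensus, item_factors, user_factors):
    """Hypothetical low-rank model: consensus utility plus a user-specific term."""
    return consensus + sum(v * w for v, w in zip(item_factors, user_factors))

# Two items with equal consensus utility but different latent factors.
items = {"a": (0.5, [1.0, 0.0]), "b": (0.5, [0.0, 1.0])}
users = {"u1": [2.0, 0.0], "u2": [0.0, 2.0]}

def prob_user_prefers(user, x, y):
    fx = user_utility(items[x][0], items[x][1], users[user])
    fy = user_utility(items[y][0], items[y][1], users[user])
    return pref_prob(fx, fy)

p_consensus = pref_prob(items["a"][0], items["b"][0])  # 0.5: the crowd is indifferent
p_u1 = prob_user_prefers("u1", "a", "b")  # above 0.5: u1 leans towards a
p_u2 = prob_user_prefers("u2", "a", "b")  # below 0.5: u2 leans towards b
```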
Submitted 11 December, 2019; v1 submitted 4 December, 2019;
originally announced December 2019.
-
Interactive Text Ranking with Bayesian Optimisation: A Case Study on Community QA and Summarisation
Authors:
Edwin Simpson,
Yang Gao,
Iryna Gurevych
Abstract:
For many NLP applications, such as question answering and summarisation, the goal is to select the best solution from a large space of candidates to meet a particular user's needs. To address the lack of user-specific training data, we propose an interactive text ranking approach that actively selects pairs of candidates, from which the user selects the better one. Unlike previous strategies, which attempt to learn a ranking across the whole candidate space, our method employs Bayesian optimisation to focus the user's labelling effort on high quality candidates and integrates prior knowledge in a Bayesian manner to cope better with small data scenarios. We apply our method to community question answering (cQA) and extractive summarisation, finding that it significantly outperforms existing interactive approaches. We also show that the ranking function learned by our method is an effective reward function for reinforcement learning, which improves the state of the art for interactive summarisation.
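As a rough illustration of how Bayesian optimisation can concentrate queries on high-quality candidates, the sketch below assumes a posterior mean and variance are already available for each candidate's utility and uses an upper-confidence-bound (UCB) acquisition to choose the next pair. UCB and the `beta` trade-off parameter are assumptions for illustration; the paper's acquisition function and preference model may differ.

```python
import math

def select_pair(means, variances, beta=2.0):
    """Pick the next pair to show the user, UCB-style.

    means/variances: posterior mean and variance of each candidate's utility.
    Returns (incumbent, challenger): the best candidate by mean, and the other
    candidate with the highest upper confidence bound mean + beta * std.
    """
    incumbent = max(range(len(means)), key=lambda i: means[i])

    def ucb(i):
        return means[i] + beta * math.sqrt(variances[i])

    challenger = max((i for i in range(len(means)) if i != incumbent), key=ucb)
    return incumbent, challenger

means = [0.2, 0.9, 0.5, 0.1]
variances = [0.01, 0.01, 0.09, 1.0]
print(select_pair(means, variances, beta=0.5))  # (1, 2): exploit a promising candidate
print(select_pair(means, variances, beta=2.0))  # (1, 3): explore an uncertain one
```

A larger `beta` favours candidates the model is unsure about (exploration); a smaller one favours candidates already believed to be good.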
Submitted 11 September, 2020; v1 submitted 22 November, 2019;
originally announced November 2019.
-
Bayesian Heatmaps: Probabilistic Classification with Multiple Unreliable Information Sources
Authors:
Edwin Simpson,
Steven Reece,
Stephen J. Roberts
Abstract:
Unstructured data from diverse sources, such as social media and aerial imagery, can provide valuable up-to-date information for intelligent situation assessment. Mining these different information sources could bring major benefits to applications such as situation awareness in disaster zones and mapping the spread of diseases. Such applications depend on classifying the situation across a region of interest, which can be depicted as a spatial "heatmap". Annotating unstructured data using crowdsourcing or automated classifiers produces individual classifications at sparse locations that typically contain many errors. We propose a novel Bayesian approach that models the relevance, error rates and bias of each information source, enabling us to learn a spatial Gaussian Process classifier by aggregating data from multiple sources with varying reliability and relevance. Our method does not require gold-labelled data and can make predictions at any location in an area of interest given only sparse observations. We show empirically that our approach can handle noisy and biased data sources, and that simultaneously inferring reliability and transferring information between neighbouring reports leads to more accurate predictions. We demonstrate our method on two real-world problems from disaster response, showing how our approach reduces the amount of crowdsourced data required and can be used to generate valuable heatmap visualisations from SMS messages and satellite images.
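The core intuition of weighting each source by its error rates can be sketched with a simple independent (naive Bayes) combination of binary reports at a single location. This leaves out the spatial Gaussian process that lets the paper's method share information between neighbouring reports, and it assumes each source's sensitivity and specificity are known, whereas the abstract describes inferring reliability without gold labels. The sources and numbers are hypothetical.

```python
import math

def posterior_positive(reports, reliability, prior=0.5):
    """Posterior P(class = 1) at one location from independent binary reports.

    reports: list of (source, label) pairs with label in {0, 1}.
    reliability: source -> (sensitivity, specificity), assumed known here.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for source, label in reports:
        sens, spec = reliability[source]
        if label == 1:
            log_odds += math.log(sens / (1.0 - spec))  # P(report 1 | 1) / P(report 1 | 0)
        else:
            log_odds += math.log((1.0 - sens) / spec)  # P(report 0 | 1) / P(report 0 | 0)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical reliabilities: noisy SMS reports versus a more reliable image classifier.
reliability = {"sms": (0.6, 0.7), "satellite": (0.9, 0.95)}
p_conflict = posterior_positive([("sms", 1), ("satellite", 0)], reliability)  # about 0.17
p_two_sms = posterior_positive([("sms", 1), ("sms", 1)], reliability)         # about 0.8
```

When the two sources disagree, the more reliable source dominates the posterior, which is the behaviour a reliability-aware aggregator is meant to capture.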
Submitted 5 April, 2019;
originally announced April 2019.
-
Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems
Authors:
Steffen Eger,
Gözde Gül Şahin,
Andreas Rücklé,
Ji-Ung Lee,
Claudia Schulz,
Mohsen Mesgar,
Krishnkant Swarnkar,
Edwin Simpson,
Iryna Gurevych
Abstract:
Visual modifications to text are often used to obfuscate offensive comments in social media (e.g., "!d10t") or as a writing style ("1337" in "leet speak"), among other scenarios. We consider this a new type of adversarial attack in NLP, a setting to which humans are very robust, as our experiments with both simple and more difficult visual input perturbations demonstrate. We then investigate the impact of visual adversarial attacks on current NLP systems on character-, word-, and sentence-level tasks, showing that both neural and non-neural models are, in contrast to humans, extremely sensitive to such attacks, suffering performance decreases of up to 82%. Finally, we explore three shielding methods (visual character embeddings, adversarial training, and rule-based recovery), which substantially improve the robustness of the models. However, the shielding methods still fall short of the performance achieved in non-attack scenarios, which demonstrates the difficulty of dealing with visual attacks.
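The attack and the rule-based recovery shield mentioned above can be illustrated with a character lookup table. The abstract does not specify how substitutes are chosen; the table here is a small hand-written assumption, and the other shields (visual character embeddings, adversarial training) are not shown.

```python
import random

# Hypothetical look-alike table: each letter maps to visually similar characters.
LOOKALIKES = {"i": "!1", "o": "0", "e": "3", "a": "@", "s": "$"}
# Inverse table for rule-based recovery; each substitute maps back to one letter.
RECOVERY = {sub: letter for letter, subs in LOOKALIKES.items() for sub in subs}

def perturb(text, p=0.5, seed=None):
    """Replace each perturbable character with a look-alike with probability p."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        subs = LOOKALIKES.get(ch.lower())
        if subs and rng.random() < p:
            out.append(rng.choice(subs))
        else:
            out.append(ch)
    return "".join(out)

def recover(text):
    """Rule-based shield: map known look-alikes back to the letters they imitate."""
    return "".join(RECOVERY.get(ch, ch) for ch in text)

print(perturb("idiot", p=1.0, seed=0))  # a perturbed variant such as "!d10t"
print(recover("!d10t"))                 # idiot
```

Because `recover` is a blind lookup, it also rewrites legitimate digits and symbols; for example, `recover("10")` returns `"io"`.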
Submitted 10 June, 2020; v1 submitted 27 March, 2019;
originally announced March 2019.
-
A Bayesian Approach for Sequence Tagging with Crowds
Authors:
Edwin Simpson,
Iryna Gurevych
Abstract:
Current methods for sequence tagging, a core task in NLP, are data hungry, which motivates the use of crowdsourcing as a cheap way to obtain labelled data. However, annotators are often unreliable and current aggregation methods cannot capture common types of span annotation errors. To address this, we propose a Bayesian method for aggregating sequence tags that reduces errors by modelling sequential dependencies between the annotations as well as the ground-truth labels. By taking a Bayesian approach, we account for uncertainty in the model due to both annotator errors and the lack of data for modelling annotators who complete few tasks. We evaluate our model on crowdsourced data for named entity recognition, information extraction and argument mining, showing that our sequential model outperforms the previous state of the art. We also find that our approach can reduce crowdsourcing costs through more effective active learning, as it better captures uncertainty in the sequence labels when there are few annotations.
Submitted 6 September, 2019; v1 submitted 2 November, 2018;
originally announced November 2018.
-
Finding Convincing Arguments Using Scalable Bayesian Preference Learning
Authors:
Edwin Simpson,
Iryna Gurevych
Abstract:
We introduce a scalable Bayesian preference learning method for identifying convincing arguments in the absence of gold-standard ratings or rankings. In contrast to previous work, we avoid the need for separate methods to perform quality control on training data, predict rankings and perform pairwise classification. Bayesian approaches are an effective solution when faced with sparse or noisy training data, but have not previously been used to identify convincing arguments. One issue is scalability, which we address by developing a stochastic variational inference method for Gaussian process (GP) preference learning. We show how our method can be applied to predict argument convincingness from crowdsourced data, outperforming the previous state-of-the-art, particularly when trained with small amounts of unreliable data. We demonstrate how the Bayesian approach enables more effective active learning, thereby reducing the amount of data required to identify convincing arguments for new users and domains. While word embeddings are principally used with neural networks, our results show that word embeddings in combination with linguistic features also benefit GPs when predicting argument convincingness.
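The abstract credits the Bayesian treatment with more effective active learning. One simple way to use posterior uncertainty for this is uncertainty sampling: request a label for the pair whose outcome the model is least sure about. The sketch assumes a Gaussian posterior mean and variance per argument are already available from a GP preference model, ignores posterior covariance between items, and uses invented numbers; the abstract does not say which acquisition strategy the paper uses.

```python
import math
from itertools import combinations

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def predictive_pref_prob(mean_a, var_a, mean_b, var_b, noise_sd=1.0):
    """P(a preferred to b) under a probit likelihood, averaged over independent
    Gaussian posteriors for the two utilities (posterior covariance ignored)."""
    scale = math.sqrt(2.0 * noise_sd ** 2 + var_a + var_b)
    return std_normal_cdf((mean_a - mean_b) / scale)

def binary_entropy(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def most_uncertain_pair(means, variances):
    """Uncertainty sampling: query the pair whose predicted outcome is least certain."""
    def score(pair):
        a, b = pair
        p = predictive_pref_prob(means[a], variances[a], means[b], variances[b])
        return binary_entropy(p)
    return max(combinations(range(len(means)), 2), key=score)

# Hypothetical posterior over the convincingness of three arguments.
means = [1.0, 0.9, -1.0]
variances = [0.1, 0.1, 0.1]
print(most_uncertain_pair(means, variances))  # (0, 1)
```

Higher posterior variance widens the predictive distribution and pulls the preference probability towards 0.5, so poorly understood items are more likely to be queried.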
Submitted 6 June, 2018;
originally announced June 2018.