-
A Systematic Review of Natural Language Processing Applied to Radiology Reports
Authors:
Arlene Casey,
Emma Davidson,
Michael Poon,
Hang Dong,
Daniel Duma,
Andreas Grivas,
Claire Grover,
Víctor Suárez-Paniagua,
Richard Tobin,
William Whiteley,
Honghan Wu,
Beatrice Alex
Abstract:
NLP has a significant role in advancing healthcare and has been found to be key in extracting structured information from radiology reports. Understanding recent developments in NLP application to radiology is of significance but recent reviews on this are limited. This study systematically assesses recent literature in NLP applied to radiology reports. Our automated literature search yields 4,799 results using automated filtering, metadata enriching steps and citation search combined with manual review. Our analysis is based on 21 variables including radiology characteristics, NLP methodology, performance, study, and clinical application characteristics. We present a comprehensive analysis of the 164 publications retrieved with each categorised into one of 6 clinical application categories. Deep learning use increases but conventional machine learning approaches are still prevalent. Deep learning remains challenged when data is scarce and there is little evidence of adoption into clinical practice. Despite 17% of studies reporting greater than 0.85 F1 scores, it is hard to comparatively evaluate these approaches given that most of them use different datasets. Only 14 studies made their data and 15 their code available with 10 externally validating results. Automated understanding of clinical narratives of the radiology reports has the potential to enhance the healthcare process but reproducibility and explainability of models are important if the domain is to move applications into clinical use. More could be done to share code enabling validation of methods on different institutional data and to reduce heterogeneity in reporting of study properties allowing inter-study comparisons. Our results have significance for researchers providing a systematic synthesis of existing work to build on, identify gaps, opportunities for collaboration and avoid duplication.
Submitted 18 February, 2021;
originally announced February 2021.
-
Electromagnetic Data Libraries: recent evolutions and new perspectives
Authors:
Doina Cristina Duma,
Sandra Parlati,
Maria Grazia Pia,
Elisabetta Ronchieri,
Paolo Saracco
Abstract:
This paper summarizes the current status of the electromagnetic data libraries, reviews recent experimental validation results, highlights open issues and introduces new perspectives for the future of these data libraries taking shape in the context of INFN research. Special emphasis is given to the characteristics of reliability, transparency and openness, along with opportunities for the improvement and the extension of the physics content.
Submitted 5 February, 2020; v1 submitted 29 January, 2020;
originally announced January 2020.
-
Talking to myself: self-dialogues as data for conversational agents
Authors:
Joachim Fainberg,
Ben Krause,
Mihai Dobre,
Marco Damonte,
Emmanuel Kahembwe,
Daniel Duma,
Bonnie Webber,
Federico Fancellu
Abstract:
Conversational agents are gaining popularity with the increasing ubiquity of smart devices. However, training agents in a data-driven manner is challenging due to a lack of suitable corpora. This paper presents a novel method for gathering topical, unstructured conversational data in an efficient way: self-dialogues through crowd-sourcing. Alongside this paper, we include a corpus of 3.6 million words across 23 topics. We argue for the utility of the corpus by comparing self-dialogues with standard two-party conversations as well as data from other corpora.
Submitted 19 September, 2018; v1 submitted 18 September, 2018;
originally announced September 2018.
-
Edina: Building an Open Domain Socialbot with Self-dialogues
Authors:
Ben Krause,
Marco Damonte,
Mihai Dobre,
Daniel Duma,
Joachim Fainberg,
Federico Fancellu,
Emmanuel Kahembwe,
Jianpeng Cheng,
Bonnie Webber
Abstract:
We present Edina, the University of Edinburgh's social bot for the Amazon Alexa Prize competition. Edina is a conversational agent whose responses utilize data harvested from Amazon Mechanical Turk (AMT) through an innovative new technique we call self-dialogues. These are conversations in which a single AMT Worker plays both participants in a dialogue. Such dialogues are surprisingly natural, efficient to collect and reflective of relevant and/or trending topics. These self-dialogues provide training data for a generative neural network as well as a basis for soft rules used by a matching score component. Each match of a soft rule against a user utterance is associated with a confidence score which we show is strongly indicative of reply quality, allowing this component to self-censor and be effectively integrated with other components. Edina's full architecture features a rule-based system backing off to a matching score, backing off to a generative neural network. Our hybrid data-driven methodology thus addresses both coverage limitations of a strictly rule-based approach and the lack of guarantees of a strictly machine-learning approach.
Submitted 28 September, 2017;
originally announced September 2017.
-
Comparing disease control policies for interacting wild populations
Authors:
Iulia Martina Bulai,
Roberto Cavoretto,
Bruna Chialva,
Davide Duma,
Ezio Venturino
Abstract:
We consider interacting population systems of predator-prey type, presenting four models of control strategies for epidemics among the prey. In particular, to contain the transmissible disease, safety niches are considered, assuming they lessen the disease spread but do not protect prey from predators. This represents a novelty with respect to standard ecosystems where the refuge prevents predators' attacks. The niche is assumed either to protect the healthy individuals, or to hinder the infected ones from coming into contact with the susceptibles, or finally to reduce altogether contacts that might lead to new cases of the infection. In addition, a standard culling procedure is analysed. The effectiveness of the different strategies is compared. Environments providing a place where disease carriers cannot come into contact with the healthy individuals, or where their contact rates are lowered, seem to be preferable for disease containment.
Submitted 3 March, 2014;
originally announced March 2014.
-
Accurate Decoding of Pooled Sequenced Data Using Compressed Sensing
Authors:
Denisa Duma,
Mary Wootters,
Anna C. Gilbert,
Hung Q. Ngo,
Atri Rudra,
Matthew Alpert,
Timothy J. Close,
Gianfranco Ciardo,
Stefano Lonardi
Abstract:
In order to overcome the limitations imposed by DNA barcoding when multiplexing a large number of samples in the current generation of high-throughput sequencing instruments, we have recently proposed a new protocol that leverages advances in combinatorial pooling design (group testing) doi:10.1371/journal.pcbi.1003010. We have also demonstrated how this new protocol would enable de novo selective sequencing and assembly of large, highly-repetitive genomes. Here we address the problem of decoding pooled sequenced data obtained from such a protocol. Our algorithm employs a synergistic combination of ideas from compressed sensing and the decoding of error-correcting codes. Experimental results on synthetic data for the rice genome and real data for the barley genome show that our novel decoding algorithm enables significantly higher quality assemblies than the previous approach.
Submitted 30 July, 2013;
originally announced July 2013.
-
Barcoding-free BAC Pooling Enables Combinatorial Selective Sequencing of the Barley Gene Space
Authors:
Stefano Lonardi,
Denisa Duma,
Matthew Alpert,
Francesca Cordero,
Marco Beccuti,
Prasanna R. Bhat,
Yonghui Wu,
Gianfranco Ciardo,
Burair Alsaihati,
Yaqin Ma,
Steve Wanamaker,
Josh Resnik,
Timothy J. Close
Abstract:
We propose a new sequencing protocol that combines recent advances in combinatorial pooling design and second-generation sequencing technology to efficiently approach de novo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when dealing with hundreds or thousands of DNA samples, such as genome-tiling gene-rich BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundreds of millions of short reads and assign them to the correct BAC clones so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is extremely accurate (99.57% of the deconvoluted reads are assigned to the correct BAC), and the resulting BAC assemblies have very high quality (BACs are covered by contigs over about 77% of their length, on average). Experimental results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate (almost 70% of left/right pairs in paired-end reads are assigned to the same BAC, despite being processed independently) and the BAC assemblies have good quality (the average sum of all assembled contigs is about 88% of the estimated BAC length).
Submitted 19 December, 2011;
originally announced December 2011.