Search | arXiv e-print repository

doi 10.1186/s12911-021-01533-7

A Systematic Review of Natural Language Processing Applied to Radiology Reports

Authors: Arlene Casey, Emma Davidson, Michael Poon, Hang Dong, Daniel Duma, Andreas Grivas, Claire Grover, Víctor Suárez-Paniagua, Richard Tobin, William Whiteley, Honghan Wu, Beatrice Alex

Abstract: NLP has a significant role in advancing healthcare and has been found to be key in extracting structured information from radiology reports. Understanding recent developments in NLP application to radiology is of significance but recent reviews on this are limited. This study systematically assesses recent literature in NLP applied to radiology reports. Our automated literature search yields 4,799 results using automated filtering, metadata enriching steps and citation search combined with manual review. Our analysis is based on 21 variables including radiology characteristics, NLP methodology, performance, study, and clinical application characteristics. We present a comprehensive analysis of the 164 publications retrieved with each categorised into one of 6 clinical application categories. Deep learning use increases but conventional machine learning approaches are still prevalent. Deep learning remains challenged when data is scarce and there is little evidence of adoption into clinical practice. Despite 17% of studies reporting greater than 0.85 F1 scores, it is hard to comparatively evaluate these approaches given that most of them use different datasets. Only 14 studies made their data and 15 their code available with 10 externally validating results. Automated understanding of clinical narratives of the radiology reports has the potential to enhance the healthcare process but reproducibility and explainability of models are important if the domain is to move applications into clinical use. More could be done to share code enabling validation of methods on different institutional data and to reduce heterogeneity in reporting of study properties allowing inter-study comparisons. Our results have significance for researchers providing a systematic synthesis of existing work to build on, identify gaps, opportunities for collaboration and avoid duplication.

Submitted 18 February, 2021; originally announced February 2021.

Journal ref: BMC Medical Informatics and Decision Making 2021

arXiv:2001.10978 [pdf, other]

doi 10.1088/1748-0221/15/03/C03032

Electromagnetic Data Libraries: recent evolutions and new perspectives

Authors: Doina Cristina Duma, Sandra Parlati, Maria Grazia Pia, Elisabetta Ronchieri, Paolo Saracco

Abstract: This paper summarizes the current status of the electromagnetic data libraries, reviews recent experimental validation results, highlights open issues and introduces new perspectives for the future of these data libraries taking shape in the context of INFN research. Special emphasis is given to the characteristics of reliability, transparency and openness, along with opportunities for the improvement and the extension of the physics content.

Submitted 5 February, 2020; v1 submitted 29 January, 2020; originally announced January 2020.

Comments: Submitted to Proc. 15th Topical Seminar on Innovative Particle and Radiation Detectors (IPRD19), 14-17 October 2019, Siena, Italy

arXiv:1809.06641 [pdf, other]

Talking to myself: self-dialogues as data for conversational agents

Authors: Joachim Fainberg, Ben Krause, Mihai Dobre, Marco Damonte, Emmanuel Kahembwe, Daniel Duma, Bonnie Webber, Federico Fancellu

Abstract: Conversational agents are gaining popularity with the increasing ubiquity of smart devices. However, training agents in a data driven manner is challenging due to a lack of suitable corpora. This paper presents a novel method for gathering topical, unstructured conversational data in an efficient way: self-dialogues through crowd-sourcing. Alongside this paper, we include a corpus of 3.6 million words across 23 topics. We argue the utility of the corpus by comparing self-dialogues with standard two-party conversations as well as data from other corpora.

Submitted 19 September, 2018; v1 submitted 18 September, 2018; originally announced September 2018.

Comments: 5 pages, 5 pages appendix, 2 figures

arXiv:1709.09816 [pdf, other]

Edina: Building an Open Domain Socialbot with Self-dialogues

Authors: Ben Krause, Marco Damonte, Mihai Dobre, Daniel Duma, Joachim Fainberg, Federico Fancellu, Emmanuel Kahembwe, Jianpeng Cheng, Bonnie Webber

Abstract: We present Edina, the University of Edinburgh's social bot for the Amazon Alexa Prize competition. Edina is a conversational agent whose responses utilize data harvested from Amazon Mechanical Turk (AMT) through an innovative new technique we call self-dialogues. These are conversations in which a single AMT Worker plays both participants in a dialogue. Such dialogues are surprisingly natural, efficient to collect and reflective of relevant and/or trending topics. These self-dialogues provide training data for a generative neural network as well as a basis for soft rules used by a matching score component. Each match of a soft rule against a user utterance is associated with a confidence score which we show is strongly indicative of reply quality, allowing this component to self-censor and be effectively integrated with other components. Edina's full architecture features a rule-based system backing off to a matching score, backing off to a generative neural network. Our hybrid data-driven methodology thus addresses both coverage limitations of a strictly rule-based approach and the lack of guarantees of a strictly machine-learning approach.

Submitted 28 September, 2017; originally announced September 2017.

Comments: 10 pages; submitted to the 1st Proceedings of the Alexa Prize

arXiv:1403.0472 [pdf, ps, other]

Comparing disease control policies for interacting wild populations

Authors: Iulia Martina Bulai, Roberto Cavoretto, Bruna Chialva, Davide Duma, Ezio Venturino

Abstract: We consider interacting population systems of predator-prey type, presenting four models of control strategies for epidemics among the prey. In particular, to contain the transmissible disease, safety niches are considered, assuming they lessen the disease spread, but do not protect prey from predators. This represents a novelty with respect to standard ecosystems where the refuge prevents predators' attacks. The niche is assumed either to protect the healthy individuals, or to hinder the infected ones from coming into contact with the susceptibles, or finally to reduce altogether contacts that might lead to new cases of the infection. In addition a standard culling procedure is also analysed. The effectiveness of the different strategies is compared. Probably the environments providing a place where disease carriers cannot come in contact with the healthy individuals, or where their contact rates are lowered, seem to be preferable for disease containment.

Submitted 3 March, 2014; originally announced March 2014.

arXiv:1307.7810 [pdf, ps, other]

Accurate Decoding of Pooled Sequenced Data Using Compressed Sensing

Authors: Denisa Duma, Mary Wootters, Anna C. Gilbert, Hung Q. Ngo, Atri Rudra, Matthew Alpert, Timothy J. Close, Gianfranco Ciardo, Stefano Lonardi

Abstract: In order to overcome the limitations imposed by DNA barcoding when multiplexing a large number of samples in the current generation of high-throughput sequencing instruments, we have recently proposed a new protocol that leverages advances in combinatorial pooling design (group testing) doi:10.1371/journal.pcbi.1003010. We have also demonstrated how this new protocol would enable de novo selective sequencing and assembly of large, highly-repetitive genomes. Here we address the problem of decoding pooled sequenced data obtained from such a protocol. Our algorithm employs a synergistic combination of ideas from compressed sensing and the decoding of error-correcting codes. Experimental results on synthetic data for the rice genome and real data for the barley genome show that our novel decoding algorithm enables significantly higher quality assemblies than the previous approach.

Submitted 30 July, 2013; originally announced July 2013.

Comments: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013)

arXiv:1112.4438 [pdf, ps, other]

Barcoding-free BAC Pooling Enables Combinatorial Selective Sequencing of the Barley Gene Space

Authors: Stefano Lonardi, Denisa Duma, Matthew Alpert, Francesca Cordero, Marco Beccuti, Prasanna R. Bhat, Yonghui Wu, Gianfranco Ciardo, Burair Alsaihati, Yaqin Ma, Steve Wanamaker, Josh Resnik, Timothy J. Close

Abstract: We propose a new sequencing protocol that combines recent advances in combinatorial pooling design and second-generation sequencing technology to efficiently approach de novo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when dealing with hundreds or thousands of DNA samples, such as genome-tiling gene-rich BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundreds of millions of short reads and assign them to the correct BAC clones so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is extremely accurate (99.57% of the deconvoluted reads are assigned to the correct BAC), and the resulting BAC assemblies have very high quality (BACs are covered by contigs over about 77% of their length, on average). Experimental results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate (almost 70% of left/right pairs in paired-end reads are assigned to the same BAC, despite being processed independently) and the BAC assemblies have good quality (the average sum of all assembled contigs is about 88% of the estimated BAC length).

Submitted 19 December, 2011; originally announced December 2011.

Showing 1–7 of 7 results for author: Duma, D