Search | arXiv e-print repository

arXiv:2401.15222 [pdf, other]

Transfer Learning for the Prediction of Entity Modifiers in Clinical Text: Application to Opioid Use Disorder Case Detection

Authors: Abdullateef I. Almudaifer, Whitney Covington, JaMor Hairston, Zachary Deitch, Ankit Anand, Caleb M. Carroll, Estera Crisan, William Bradford, Lauren Walter, Eaton Ellen, Sue S. Feldman, John D. Osborne

Abstract: Background: The semantics of entities extracted from a clinical text can be dramatically altered by modifiers, including entity negation, uncertainty, conditionality, severity, and subject. Existing models for determining modifiers of clinical entities involve regular expression or features weights that are trained independently for each modifier. Methods: We develop and evaluate a multi-task transformer architecture design where modifiers are learned and predicted jointly using the publicly available SemEval 2015 Task 14 corpus and a new Opioid Use Disorder (OUD) data set that contains modifiers shared with SemEval as well as novel modifiers specific for OUD. We evaluate the effectiveness of our multi-task learning approach versus previously published systems and assess the feasibility of transfer learning for clinical entity modifiers when only a portion of clinical modifiers are shared. Results: Our approach achieved state-of-the-art results on the ShARe corpus from SemEval 2015 Task 14, showing an increase of 1.1% on weighted accuracy, 1.7% on unweighted accuracy, and 10% on micro F1 scores. Conclusions: We show that learned weights from our shared model can be effectively transferred to a new partially matched data set, validating the use of transfer learning for clinical text modifiers.

Submitted 5 February, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

Comments: 18 pages, 2 figures, 6 tables. To be submitted to the Journal of Biomedical Semantics

arXiv:2210.16125 [pdf]

BRATsynthetic: Text De-identification using a Markov Chain Replacement Strategy for Surrogate Personal Identifying Information

Authors: John D. Osborne, Tobias O'Leary, Akhil Nadimpalli, Salma M. Aly, Richard E. Kennedy

Abstract: Objective: Implement and assess personal health identifying information (PHI) substitution strategies and quantify their privacy preserving benefits. Materials and Methods: We implement and assess 3 different `Hiding in Plain Sight` (HIPS) strategies for PHI replacement including a standard Consistent replacement strategy, a Random replacement strategy and a novel Markov model-based strategy. We evaluate the privacy preserving benefits of these strategies on a synthetic PHI distribution and real clinical corpora from 2 different institutions using a range of false negative error rates (FNER). Results: Using FNER ranging from 0.1% to 5% PHI leakage at the document level could be reduced from 27.1% to 0.1% (0.1% FNER) and from 94.2% to 57.7% (5% FNER) utilizing the Markov chain strategy versus the Consistent strategy on a corpus containing a diverse set of notes from the University of Alabama at Birmingham (UAB). The Markov chain substitution strategy also consistently outperformed the Consistent and Random substitution strategies in a MIMIC corpus of discharge summaries and on a range of synthetic clinical PHI distributions. Discussion: We demonstrate that a Markov chain surrogate generation strategy substantially reduces the chance of inadvertent PHI release across a range of assumed PHI FNER and release our implementation `BRATsynthetic` on Github. Conclusion: The Markov chain replacement strategy allows for the release of larger de-identified corpora at the same risk level relative to corpora released using a consistent HIPS strategy.

Submitted 28 October, 2022; originally announced October 2022.

ACM Class: I.6.6; D.0

arXiv:2110.10780 [pdf]

An Open Natural Language Processing Development Framework for EHR-based Clinical Research: A case demonstration using the National COVID Cohort Collaborative (N3C)

Authors: Sijia Liu, Andrew Wen, Liwei Wang, Huan He, Sunyang Fu, Robert Miller, Andrew Williams, Daniel Harris, Ramakanth Kavuluru, Mei Liu, Noor Abu-el-rub, Dalton Schutte, Rui Zhang, Masoud Rouhizadeh, John D. Osborne, Yongqun He, Umit Topaloglu, Stephanie S Hong, Joel H Saltz, Thomas Schaffter, Emily Pfaff, Christopher G. Chute, Tim Duong, Melissa A. Haendel, Rafael Fuentes, et al. (7 additional authors not shown)

Abstract: While we pay attention to the latest advances in clinical natural language processing (NLP), we can notice some resistance in the clinical and translational research community to adopt NLP models due to limited transparency, interpretability, and usability. In this study, we proposed an open natural language processing development framework. We evaluated it through the implementation of NLP algorithms for the National COVID Cohort Collaborative (N3C). Based on the interests in information extraction from COVID-19 related clinical notes, our work includes 1) an open data annotation process using COVID-19 signs and symptoms as the use case, 2) a community-driven ruleset composing platform, and 3) a synthetic text data generation workflow to generate texts for information extraction tasks without involving human subjects. The corpora were derived from texts from three different institutions (Mayo Clinic, University of Kentucky, University of Minnesota). The gold standard annotations were tested with a single institution's (Mayo) ruleset. This resulted in performances of 0.876, 0.706, and 0.694 in F-scores for Mayo, Minnesota, and Kentucky test datasets, respectively. The study as a consortium effort of the N3C NLP subgroup demonstrates the feasibility of creating a federated NLP algorithm development and benchmarking platform to enhance multi-institution clinical NLP study and adoption. Although we use COVID-19 as a use case in this effort, our framework is general enough to be applied to other domains of interest in clinical NLP.

Submitted 21 March, 2022; v1 submitted 20 October, 2021; originally announced October 2021.

Comments: update on contents

arXiv:1402.1668 [pdf, ps, other]

Evaluation of YTEX and MetaMap for clinical concept recognition

Authors: John David Osborne, Binod Gyawali, Thamar Solorio

Abstract: We used MetaMap and YTEX as a basis for the construction of two separate systems to participate in the 2013 ShARe/CLEF eHealth Task 1[9], the recognition of clinical concepts. No modifications were directly made to these systems, but output concepts were filtered using stop concepts, stop concept text and UMLS semantic type. Concept boundaries were also adjusted using a small collection of rules to increase precision on the strict task. Overall MetaMap had better performance than YTEX on the strict task, primarily due to a 20% performance improvement in precision. In the relaxed task YTEX had better performance in both precision and recall giving it an overall F-Score 4.6% higher than MetaMap on the test data. Our results also indicated a 1.3% higher accuracy for YTEX in UMLS CUI mapping.

Submitted 7 February, 2014; originally announced February 2014.

Comments: 6 pages, working notes to the ShareClef eHealth 2013 Shared Task

MSC Class: 68

Showing 1–4 of 4 results for author: Osborne, J D