-
An Empirical Study of UMLS Concept Extraction from Clinical Notes using Boolean Combination Ensembles
Authors:
Greg M. Silverman,
Raymond L. Finzel,
Michael V. Heinz,
Jake Vasilakes,
Jacob C. Solinsky,
Reed McEwan,
Benjamin C. Knoll,
Christopher J. Tignanelli,
Hongfang Liu,
Hua Xu,
Xiaoqian Jiang,
Genevieve B. Melton,
Serguei VS Pakhomov
Abstract:
Our objective in this study is to investigate the behavior of Boolean operators on combining annotation output from multiple Natural Language Processing (NLP) systems across multiple corpora and to assess how filtering by aggregation of Unified Medical Language System (UMLS) Metathesaurus concepts affects system performance for Named Entity Recognition (NER) of UMLS concepts. We used three corpora annotated for UMLS concepts: 2010 i2b2 VA challenge set (31,161 annotations), Multi-source Integrated Platform for Answering Clinical Questions (MiPACQ) corpus (17,457 annotations including UMLS concept unique identifiers), and Fairview Health Services corpus (44,530 annotations). Our results showed that for UMLS concept matching, Boolean ensembling of the MiPACQ corpus trended towards higher performance over individual systems. Use of an approximate grid-search can help optimize the precision-recall tradeoff and can provide a set of heuristics for choosing an optimal set of ensembles.
Submitted 4 August, 2021;
originally announced August 2021.
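As a rough illustration of the Boolean ensembling idea in the abstract above, the sketch below combines the output of two annotators with AND (set intersection) and OR (set union) and scores each ensemble against a gold standard. It assumes exact-match scoring on (start, end, CUI) triples; the spans, the CUI-like strings, and the names `system_a`, `system_b`, and `prf` are illustrative placeholders, not data or code from the study.

```python
from typing import Set, Tuple

# An annotation is modeled as (start offset, end offset, concept identifier).
# This exact-match representation is an assumption made for illustration.
Annotation = Tuple[int, int, str]


def prf(predicted: Set[Annotation], gold: Set[Annotation]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact matching of annotations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold standard and system outputs (placeholder values).
gold = {(0, 8, "C0000010"), (20, 27, "C0000020"), (40, 47, "C0000030")}
system_a = {(0, 8, "C0000010"), (20, 27, "C0000020"), (60, 66, "C0000040")}
system_b = {(0, 8, "C0000010"), (40, 47, "C0000030"), (70, 75, "C0000050")}

# AND keeps only annotations both systems agree on (tends to raise precision).
ensemble_and = system_a & system_b
# OR keeps annotations from either system (tends to raise recall).
ensemble_or = system_a | system_b

for name, ensemble in [("A AND B", ensemble_and), ("A OR B", ensemble_or)]:
    p, r, f = prf(ensemble, gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Scoring every such Boolean expression over the available systems and keeping the best ones is one simple way to explore the precision-recall tradeoff that the abstract mentions; the study's own search procedure may differ.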
-
Joint Elastic Side-Scattering Lidar and Raman Lidar Measurements of Aerosol Optical Properties in South East Colorado
Authors:
L. Wiencke,
V. Rizi,
M. Will,
C. Allen,
A. Botts,
M. Calhoun,
B. Carande,
J. Claus,
M. Coco,
L. Emmert,
S. Esquibel,
A. F. Grillo,
L. Hamilton,
T. J. Heid,
M. Iarlori,
H. -O. Klages,
M. Kleifges,
B. Knoll,
J. Koop,
H. -J. Mathes,
A. Menshikov,
S. Morgan,
L. Patterson,
S. Petrera,
S. Robinson
, et al. (5 additional authors not shown)
Abstract:
We describe an experiment, located in south-east Colorado, USA, that measured aerosol optical depth profiles using two Lidar techniques. Two independent detectors measured scattered light from a vertical UV laser beam. One detector, located at the laser site, measured light via the inelastic Raman backscattering process. This is a common method used in atmospheric science for measuring aerosol optical depth profiles. The other detector, located approximately 40 km away, viewed the laser beam from the side. This detector featured a 3.5 m² mirror and measured elastically scattered light in a bistatic Lidar configuration following the method used at the Pierre Auger cosmic ray observatory. The goal of this experiment was to assess and improve methods to measure atmospheric clarity, specifically aerosol optical depth profiles, for cosmic ray UV fluorescence detectors that use the atmosphere as a giant calorimeter. The experiment collected data from September 2010 to July 2011 under varying conditions of aerosol loading. We describe the instruments and techniques and compare the aerosol optical depth profiles measured by the Raman and bistatic Lidar detectors.
Submitted 14 April, 2017;
originally announced April 2017.
-
A Machine Learning Perspective on Predictive Coding with PAQ
Authors:
Byron Knoll,
Nando de Freitas
Abstract:
PAQ8 is an open source lossless data compression algorithm that currently achieves the best compression rates on many benchmarks. This report presents a detailed description of PAQ8 from a statistical machine learning perspective. It shows that it is possible to understand some of the modules of PAQ8 and use this understanding to improve the method. However, intuitive statistical explanations of the behavior of other modules remain elusive. We hope the description in this report will be a starting point for discussions that will increase our understanding, lead to improvements to PAQ8, and facilitate a transfer of knowledge from PAQ8 to other machine learning methods, such as recurrent neural networks and stochastic memoizers. Finally, the report presents a broad range of new applications of PAQ to machine learning tasks including language modeling and adaptive text prediction, adaptive game playing, classification, and compression using features from the field of deep learning.
Submitted 16 August, 2011;
originally announced August 2011.
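For readers unfamiliar with the predictive-coding view taken in the abstract above, the sketch below shows the general idea of mixing several bit predictions in the logistic domain and updating the mixing weights online, which is the style of model combination used in the PAQ family. It is a simplified floating-point illustration, not PAQ8's actual implementation; the class name `LogisticMixer`, the learning rate, and the toy inputs are assumptions chosen for the example.

```python
import math
from typing import List


def stretch(p: float) -> float:
    """Logit: map a probability to the logistic domain."""
    return math.log(p / (1.0 - p))


def squash(x: float) -> float:
    """Logistic sigmoid: inverse of stretch."""
    return 1.0 / (1.0 + math.exp(-x))


class LogisticMixer:
    """Online mixer of per-model probabilities that the next bit is 1.

    Simplified illustration only; the learning rate is an arbitrary choice.
    """

    def __init__(self, n_inputs: int, learning_rate: float = 0.02) -> None:
        self.weights = [0.0] * n_inputs
        self.learning_rate = learning_rate
        self.inputs = [0.0] * n_inputs
        self.prediction = 0.5

    def predict(self, probabilities: List[float]) -> float:
        # Clamp inputs away from 0 and 1 so that stretch stays finite.
        clamped = [min(max(p, 1e-6), 1.0 - 1e-6) for p in probabilities]
        self.inputs = [stretch(p) for p in clamped]
        total = sum(w * x for w, x in zip(self.weights, self.inputs))
        self.prediction = squash(total)
        return self.prediction

    def update(self, bit: int) -> None:
        # Gradient step that reduces the coding cost -log2 P(bit).
        error = bit - self.prediction
        self.weights = [
            w + self.learning_rate * error * x
            for w, x in zip(self.weights, self.inputs)
        ]


# Toy usage: one informative model (0.9) and one uninformative model (0.5),
# on a stream consisting only of 1 bits.
mixer = LogisticMixer(n_inputs=2)
first_prediction = mixer.predict([0.9, 0.5])
for _ in range(200):
    mixer.predict([0.9, 0.5])
    mixer.update(1)
final_prediction = mixer.predict([0.9, 0.5])
```

In this toy run the mixer starts at a prediction of 0.5 and shifts weight toward the informative model as evidence accumulates, while the uninformative model's weight stays at zero because its stretched input is zero.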