-
llmNER: (Zero|Few)-Shot Named Entity Recognition, Exploiting the Power of Large Language Models
Authors:
Fabián Villena,
Luis Miranda,
Claudio Aracena
Abstract:
Large language models (LLMs) allow us to generate high-quality human-like text. One interesting task in natural language processing (NLP) is named entity recognition (NER), which seeks to detect mentions of relevant information in documents. This paper presents llmNER, a Python library for implementing zero-shot and few-shot NER with LLMs; by providing an easy-to-use interface, llmNER can compose prompts, query the model, and parse the completion returned by the LLM. Also, the library enables the user to perform prompt engineering efficiently by providing a simple interface to test multiple variables. We validated our software on two NER tasks to show the library's flexibility. llmNER aims to push the boundaries of in-context learning research by removing the barrier of the prompting and parsing steps.
Submitted 6 June, 2024;
originally announced June 2024.
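Illustrative sketch (not the llmNER API): the abstract above describes three steps, namely composing a prompt, querying the model, and parsing the completion. The Python below shows one way those steps could fit together. The names `compose_prompt`, `parse_completion`, and `fake_model`, as well as the inline-tag answer format, are assumptions made for this example. They are not taken from the library, and `fake_model` is a stand-in for a real LLM call.

```python
import re

def compose_prompt(text, entity_types, examples=()):
    """Build a zero-shot prompt (no examples) or a few-shot prompt (with examples)."""
    lines = [
        "Mark every entity in the text with inline tags such as <PER>...</PER>.",
        "Allowed entity types: " + ", ".join(entity_types) + ".",
    ]
    # Each example is a (source text, annotated text) pair shown to the model.
    for source, annotated in examples:
        lines.append(f"Text: {source}")
        lines.append(f"Annotated: {annotated}")
    lines.append(f"Text: {text}")
    lines.append("Annotated:")
    return "\n".join(lines)

# Matches <TYPE>mention</TYPE>; the backreference forces matching open/close tags.
TAG = re.compile(r"<(\w+)>(.*?)</\1>")

def parse_completion(completion):
    """Extract (entity_type, mention) pairs from an inline-tagged completion."""
    return [(m.group(1), m.group(2)) for m in TAG.finditer(completion)]

def fake_model(prompt):
    """Hypothetical stand-in for an LLM call; returns a fixed completion."""
    return "<PER>Ada Lovelace</PER> worked in <LOC>London</LOC>."

prompt = compose_prompt("Ada Lovelace worked in London.", ["PER", "LOC"])
entities = parse_completion(fake_model(prompt))
print(entities)
```

Passing a non-empty `examples` sequence turns the same prompt into a few-shot one, which is the kind of variable the abstract says prompt engineering needs to test.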
-
Explosively driven Richtmyer--Meshkov instability jet suppression and enhancement via coupling machine learning and additive manufacturing
Authors:
Dane M. Sterbentz,
Dylan J. Kline,
Daniel A. White,
Charles F. Jekel,
Michael P. Hennessey,
David K. Amondson,
Abigail J. Wilson,
Max J. Sevcik,
Matthew F. L. Villena,
Steve S. Lin,
Michael D. Grapes,
Kyle T. Sullivan,
Jonathan L. Belof
Abstract:
The ability to control the behavior of fluid instabilities at material interfaces, such as the shock-driven Richtmyer--Meshkov instability, is a grand technological challenge with a broad range of applications, from inertial confinement fusion experiments to explosively driven shaped charges. In this work, we use a linear-geometry shaped charge as a means of studying methods for controlling material jetting that results from the Richtmyer--Meshkov instability. A shaped charge produces a high-velocity jet by focusing the energy from the detonation of high explosives. The interaction of the resulting detonation wave with a hollowed cavity lined with a thin metal layer produces the unstable jetting effect. By modifying characteristics of the detonation wave prior to striking the lined cavity, the kinetic energy of the jet can be enhanced or reduced. Modifying the geometry of the liner material can also be used to alter jetting properties. We apply optimization methods to investigate several design parameterizations for either enhancing or suppressing the shaped-charge jet. This is accomplished using 2D and 3D hydrodynamic simulations to investigate the design space that we consider. We also apply new additive manufacturing methods for producing the shaped-charge assemblies, which allow for experimental testing of complicated design geometries obtained through computational optimization. We present a direct comparison of our optimized designs with experimental results carried out at the High Explosives Application Facility at Lawrence Livermore National Laboratory.
Submitted 1 May, 2024;
originally announced May 2024.
-
Automatic Coding at Scale: Design and Deployment of a Nationwide System for Normalizing Referrals in the Chilean Public Healthcare System
Authors:
Fabián Villena,
Matías Rojas,
Felipe Arias,
Jorge Pacheco,
Paulina Vera,
Jocelyn Dunstan
Abstract:
The disease coding task involves assigning a unique identifier from a controlled vocabulary to each disease mentioned in a clinical document. This task is relevant since it allows information extraction from unstructured data to perform, for example, epidemiological studies about the incidence and prevalence of diseases in a determined context. However, the manual coding process is subject to errors as it requires medical personnel to be competent in coding rules and terminology. In addition, this process consumes a lot of time and energy, which could be allocated to more clinically relevant tasks. These difficulties can be addressed by developing computational systems that automatically assign codes to diseases. In this way, we propose a two-step system for automatically coding diseases in referrals from the Chilean public healthcare system. Specifically, our model uses a state-of-the-art NER model for recognizing disease mentions and a search engine system based on Elasticsearch for assigning the most relevant codes associated with these disease mentions. The system's performance was evaluated on referrals manually coded by clinical experts. Our system obtained a MAP score of 0.63 for the subcategory level and 0.83 for the category level, close to the best-performing models in the literature. This system could be a support tool for health professionals, optimizing the coding and management process. Finally, to guarantee reproducibility, we publicly release the code of our models and experiments.
Submitted 9 July, 2023;
originally announced July 2023.
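Illustrative sketch: the abstract above reports MAP (mean average precision) scores for ranked code suggestions. The Python below computes MAP under one common definition, where each disease mention has a ranked list of candidate codes and a set of correct codes. The paper's exact evaluation protocol is not given in the abstract, so this definition, the function names, and the toy codes are assumptions.

```python
def average_precision(ranked, relevant):
    """Average of precision@k over the ranks k where a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    # Normalise by the number of relevant items; empty gold sets score 0.
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(queries):
    """Mean of average precision over (ranked list, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)

# Toy data: two mentions, each with one correct code.
queries = [
    (["E11", "I10", "J45"], {"E11"}),  # correct code ranked first  -> AP = 1.0
    (["I10", "E11", "J45"], {"E11"}),  # correct code ranked second -> AP = 0.5
]
score = mean_average_precision(queries)
print(score)  # 0.75
```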
-
LaboRecommender: A crazy-easy to use Python-based recommender system for laboratory tests
Authors:
Fabián Villena
Abstract:
Laboratory tests play a major role in clinical decision making because they are essential for the confirmation of diagnostic suspicions and influence medical decisions. The number of different laboratory tests available to physicians has been expanding very rapidly due to advances in laboratory technology. To find the desired tests within this expanding plethora of elements, the Health Information System must provide a powerful search engine and the practitioner needs to remember the exact name of the laboratory test to correctly select the bag of tests to order. Recommender systems are platforms which suggest appropriate items to a user after learning the user's behaviour. A neighbourhood-based collaborative filtering method was used to model the recommender system, where similar bags, clustered using a nearest-neighbours algorithm, are used to recommend tests for other similar bags of laboratory tests. The recommender system developed in this paper achieved 95.54% in the mean average precision metric. A fully documented Python package named LaboRecommender was developed to implement the algorithm proposed in this paper.
Submitted 3 May, 2021;
originally announced May 2021.
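Illustrative sketch (not the LaboRecommender API): the abstract above describes neighbourhood-based collaborative filtering over bags of laboratory tests. The Python below shows the general idea under stated assumptions: Jaccard similarity between bags, a fixed number `k` of nearest neighbours, and a simple vote count over the tests those neighbours contain. The function names, the similarity measure, and the toy data are all hypothetical.

```python
from collections import Counter

def jaccard(a, b):
    """Jaccard similarity between two sets of test names."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(partial_bag, history, k=2, top_n=3):
    """Suggest tests found in the k past bags most similar to partial_bag."""
    partial = set(partial_bag)
    # Rank past bags by similarity to the current (partial) order.
    neighbours = sorted(
        history, key=lambda bag: jaccard(partial, set(bag)), reverse=True
    )[:k]
    # Count how often each not-yet-ordered test appears among the neighbours.
    votes = Counter()
    for bag in neighbours:
        votes.update(set(bag) - partial)
    return [test for test, _ in votes.most_common(top_n)]

# Toy order history (assumed data, for illustration only).
history = [
    {"glucose", "hba1c", "creatinine"},
    {"glucose", "hba1c", "lipid_panel"},
    {"tsh", "free_t4"},
]
suggestions = recommend({"glucose", "hba1c"}, history, k=2)
print(suggestions)
```

With this toy history, the two diabetes-related bags are the nearest neighbours, so the suggestions come from them and the thyroid tests are not proposed.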
-
Maxwell Parallel Imaging
Authors:
Matteo Alessandro Francavilla,
Stamatios Lefkimmiatis,
Jorge F. Villena,
Athanasios G. Polimeridis
Abstract:
Purpose: To develop a general framework for Parallel Imaging (PI) with the use of Maxwell regularization for the estimation of the sensitivity maps (SMs) and constrained optimization for the parameter-free image reconstruction.
Theory and Methods: Certain characteristics of both the SMs and the images are routinely used to regularize the otherwise ill-posed optimization-based joint reconstruction from highly accelerated PI data. In this paper we rely on a fundamental property of SMs, namely that they are solutions of Maxwell's equations: we construct the subspace of all possible SM distributions supported in a given field-of-view, and we promote solutions of SMs that belong in this subspace. In addition, we propose a constrained optimization scheme for the image reconstruction, as a second step, once an accurate estimation of the SMs is available. The resulting method, dubbed Maxwell Parallel Imaging (MPI), works seamlessly for arbitrary sequences (both 2D and 3D) with any trajectory and minimal calibration signals.
Results: The effectiveness of MPI is illustrated for a wide range of datasets with various undersampling schemes, including radial, variable-density Poisson-disc, and Cartesian, and is compared against the state-of-the-art PI methods. Finally, we include some numerical experiments that demonstrate the memory footprint reduction of the constructed Maxwell basis with the help of tensor decomposition, thus allowing the use of MPI for full 3D image reconstructions.
Conclusions: The MPI framework provides a physics-inspired optimization method for the accurate and efficient image reconstruction from arbitrary accelerated scans.
Submitted 20 August, 2020;
originally announced August 2020.
-
Bayesian Manifold-Constrained-Prior Model for an Experiment to Locate Xce
Authors:
Alan B. Lenarcic,
John D. Calaway,
Fernando Pardo-Manuel de Villena,
William Valdar
Abstract:
We propose an analysis for a novel experiment intended to locate the genetic locus Xce (X-chromosome controlling element), which biases the stochastic process of X-inactivation in the mouse. X-inactivation bias is a phenomenon where cells in the embryo randomly choose one parental chromosome to inactivate, but show an average bias towards one parental strain. Measurement of allele-specific gene-expression through pyrosequencing was conducted on mouse crosses of an uncharacterized parent with known carriers. Our Bayesian analysis is suitable for this adaptive experimental design, accounting for the biases and differences in precision among genes. Model identifiability is facilitated by priors constrained to a manifold. We show that reparameterized slice-sampling can suitably tackle a general class of constrained priors. We demonstrate a physical model, based upon a "weighted-coin" hypothesis, that predicts X-inactivation ratios in untested crosses. This model suggests that Xce alleles differ due to a process known as copy number variation, where stronger Xce alleles are shorter sequences.
Submitted 20 December, 2018;
originally announced December 2018.
-
IsoDOT Detects Differential RNA-isoform Expression/Usage with respect to a Categorical or Continuous Covariate with High Sensitivity and Specificity
Authors:
Wei Sun,
Yufeng Liu,
James J. Crowley,
Ting-Huei Chen,
Hua Zhou,
Haitao Chu,
Shunping Huang,
Pei-Fen Kuan,
Yuan Li,
Darla Miller,
Ginger Shaw,
Yichao Wu,
Vasyl Zhabotynsky,
Leonard McMillan,
Fei Zou,
Patrick F. Sullivan,
Fernando Pardo-Manuel de Villena
Abstract:
We have developed a statistical method named IsoDOT to assess differential isoform expression (DIE) and differential isoform usage (DIU) using RNA-seq data. Here isoform usage refers to relative isoform expression given the total expression of the corresponding gene. IsoDOT performs two tasks that cannot be accomplished by existing methods: to test DIE/DIU with respect to a continuous covariate, and to test DIE/DIU for one case versus one control. The latter task is not an uncommon situation in practice, e.g., comparing the paternal and maternal alleles of one individual or comparing tumor and normal samples of one cancer patient. Simulation studies demonstrate the high sensitivity and specificity of IsoDOT. We apply IsoDOT to study the effects of haloperidol treatment on the mouse transcriptome and identify a group of genes whose isoform usages respond to haloperidol treatment.
Submitted 29 October, 2014; v1 submitted 1 February, 2014;
originally announced February 2014.