
Showing 1–14 of 14 results for author: Alex, B

Searching in archive cs.
  1. arXiv:2406.14312  [pdf, other]

    cs.CL cs.AI

    Infusing clinical knowledge into tokenisers for language models

    Authors: Abul Hasan, Jinge Wu, Quang Ngoc Nguyen, Salomé Andres, Imane Guellil, Huayu Zhang, Arlene Casey, Beatrice Alex, Bruce Guthrie, Honghan Wu

    Abstract: This study introduces a novel knowledge enhanced tokenisation mechanism, K-Tokeniser, for clinical text processing. Technically, at initialisation stage, K-Tokeniser populates global representations of tokens based on semantic types of domain concepts (such as drugs or diseases) from either a domain ontology like Unified Medical Language System or the training data of the task related corpus. At t…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 18 pages, 6 figures

  2. arXiv:2405.18028  [pdf, other]

    cs.CL cs.AI

    Edinburgh Clinical NLP at MEDIQA-CORR 2024: Guiding Large Language Models with Hints

    Authors: Aryo Pradipta Gema, Chaeeun Lee, Pasquale Minervini, Luke Daines, T. Ian Simpson, Beatrice Alex

    Abstract: The MEDIQA-CORR 2024 shared task aims to assess the ability of Large Language Models (LLMs) to identify and correct medical errors in clinical notes. In this study, we evaluate the capability of general LLMs, specifically GPT-3.5 and GPT-4, to identify and correct medical errors with multiple prompting strategies. Recognising the limitation of LLMs in generating accurate corrections only via promp…

    Submitted 28 May, 2024; originally announced May 2024.

  3. arXiv:2404.00484  [pdf, other]

    cs.CL

    Edinburgh Clinical NLP at SemEval-2024 Task 2: Fine-tune your model unless you have access to GPT-4

    Authors: Aryo Pradipta Gema, Giwon Hong, Pasquale Minervini, Luke Daines, Beatrice Alex

    Abstract: The NLI4CT task assesses Natural Language Inference systems in predicting whether hypotheses entail or contradict evidence from Clinical Trial Reports. In this study, we evaluate various Large Language Models (LLMs) with multiple strategies, including Chain-of-Thought, In-Context Learning, and Parameter-Efficient Fine-Tuning (PEFT). We propose a PEFT method to improve the consistency of LLMs by me…

    Submitted 30 March, 2024; originally announced April 2024.

  4. arXiv:2401.13512  [pdf, other]

    cs.CL

    Can GPT-3.5 Generate and Code Discharge Summaries?

    Authors: Matúš Falis, Aryo Pradipta Gema, Hang Dong, Luke Daines, Siddharth Basetti, Michael Holder, Rose S Penfold, Alexandra Birch, Beatrice Alex

    Abstract: Objective: To investigate GPT-3.5 in generating and coding medical documents with ICD-10 codes for data augmentation on low-resources labels. Materials and Methods: Employing GPT-3.5 we generated and coded 9,606 discharge summaries based on lists of ICD-10 code descriptions of patients with infrequent (generation) codes within the MIMIC-IV dataset. Combined with the baseline training set, this f…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: 15 pages; 250 words in abstract; 3,929 words in main body; 2 figures (0 black and white, 2 colour); 4 tables; 34 references

  5. arXiv:2312.03747  [pdf, other]

    cs.CL cs.AI cs.LG

    Classifying patient voice in social media data using neural networks: A comparison of AI models on different data sources and therapeutic domains

    Authors: Giorgos Lysandrou, Roma English Owen, Vanja Popovic, Grant Le Brun, Beatrice Alex, Elizabeth A. L. Fairley

    Abstract: It is essential that healthcare professionals and members of the healthcare community can access and easily understand patient experiences in the real world, so that care standards can be improved and driven towards personalised drug treatment. Social media platforms and message boards are deemed suitable sources of patient experience information, as patients have been observed to discuss and exch…

    Submitted 30 November, 2023; originally announced December 2023.

    Comments: 14 pages, 1 figure, 7 tables

  6. arXiv:2307.03042  [pdf, other]

    cs.CL cs.LG

    Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain

    Authors: Aryo Pradipta Gema, Pasquale Minervini, Luke Daines, Tom Hope, Beatrice Alex

    Abstract: Adapting pretrained language models to novel domains, such as clinical applications, traditionally involves retraining their entire set of parameters. Parameter-Efficient Fine-Tuning (PEFT) techniques for fine-tuning language models significantly reduce computational requirements by selectively fine-tuning small subsets of parameters. In this study, we propose a two-step PEFT framework and evaluat…

    Submitted 9 June, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

  7. Ontology-Driven and Weakly Supervised Rare Disease Identification from Clinical Notes

    Authors: Hang Dong, Víctor Suárez-Paniagua, Huayu Zhang, Minhong Wang, Arlene Casey, Emma Davidson, Jiaoyan Chen, Beatrice Alex, William Whiteley, Honghan Wu

    Abstract: Computational text phenotyping is the practice of identifying patients with certain disorders and traits from clinical notes. Rare diseases are challenging to be identified due to few cases available for machine learning and the need for data annotation from domain experts. We propose a method using ontologies and weak supervision, with recent pre-trained contextual representations from Bi-directi…

    Submitted 3 May, 2023; v1 submitted 11 May, 2022; originally announced May 2022.

    Comments: Accepted for BMC Medical Informatics and Decision Making, structured abstract in full text, 16 pages, 4 figures (and extra 7 pages, 1 figure in the supplementary material)

    MSC Class: 68T50 (Primary); 68T30 (Secondary) ACM Class: I.2.7; J.3

  8. arXiv:2203.11092  [pdf]

    cs.CL cs.AI

    Automated Clinical Coding: What, Why, and Where We Are?

    Authors: Hang Dong, Matúš Falis, William Whiteley, Beatrice Alex, Joshua Matterson, Shaoxiong Ji, Jiaoyan Chen, Honghan Wu

    Abstract: Clinical coding is the task of transforming medical information in a patient's health records into structured codes so that they can be used for statistical analysis. This is a cognitive and time-consuming task that follows a standard process in order to achieve a high level of consistency. Clinical coding could potentially be supported by an automated system to improve the efficiency and accuracy…

    Submitted 9 October, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: accepted for npj Digital Medicine

    MSC Class: 68T07 (Primary); 68T50 (Secondary) ACM Class: I.2.7; J.3

  9. arXiv:2109.04853  [pdf, other]

    cs.CL

    CoPHE: A Count-Preserving Hierarchical Evaluation Metric in Large-Scale Multi-Label Text Classification

    Authors: Matúš Falis, Hang Dong, Alexandra Birch, Beatrice Alex

    Abstract: Large-Scale Multi-Label Text Classification (LMTC) includes tasks with hierarchical label spaces, such as automatic assignment of ICD-9 codes to discharge summaries. Performance of models in prior art is evaluated with standard precision, recall, and F1 measures without regard for the rich hierarchical structure. In this work we argue for hierarchical evaluation of the predictions of neural LMTC m…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: 5 pages, 2 figures, EMNLP 2021

  10. arXiv:2105.07847  [pdf, other]

    cs.CY cs.CL

    The Online Pivot: Lessons Learned from Teaching a Text and Data Mining Course in Lockdown, Enhancing online Teaching with Pair Programming and Digital Badges

    Authors: Beatrice Alex, Clare Llewellyn, Pawel Michal Orzechowski, Maria Boutchkova

    Abstract: In this paper we provide an account of how we ported a text and data mining course online in summer 2020 as a result of the COVID-19 pandemic and how we improved it in a second pilot run. We describe the course, how we adapted it over the two pilot runs and what teaching techniques we used to improve students' learning and community building online. We also provide information on the relentless fe…

    Submitted 3 May, 2021; originally announced May 2021.

    Comments: 11 pages, 4 figures, to appear in the Proceedings of the Fifth Workshop on Teaching NLP @ NAACL 2021

  11. A Systematic Review of Natural Language Processing Applied to Radiology Reports

    Authors: Arlene Casey, Emma Davidson, Michael Poon, Hang Dong, Daniel Duma, Andreas Grivas, Claire Grover, Víctor Suárez-Paniagua, Richard Tobin, William Whiteley, Honghan Wu, Beatrice Alex

    Abstract: NLP has a significant role in advancing healthcare and has been found to be key in extracting structured information from radiology reports. Understanding recent developments in NLP application to radiology is of significance but recent reviews on this are limited. This study systematically assesses recent literature in NLP applied to radiology reports. Our automated literature search yields 4,799…

    Submitted 18 February, 2021; originally announced February 2021.

    Journal ref: BMC Medical Informatics and Decision Making 2021

  12. arXiv:2011.05911  [pdf, ps, other]

    cs.CL

    Situated Data, Situated Systems: A Methodology to Engage with Power Relations in Natural Language Processing Research

    Authors: Lucy Havens, Melissa Terras, Benjamin Bach, Beatrice Alex

    Abstract: We propose a bias-aware methodology to engage with power relations in natural language processing (NLP) research. NLP research rarely engages with bias in social contexts, limiting its ability to mitigate bias. While researchers have recommended actions, technical methods, and documentation practices, no methodology exists to integrate critical reflections on bias with technical NLP methods. In th…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: Accepted to the 2nd Workshop on Gender Bias in Natural Language Processing at COLING 2020

  13. Plague Dot Text: Text mining and annotation of outbreak reports of the Third Plague Pandemic (1894-1952)

    Authors: Arlene Casey, Mike Bennett, Richard Tobin, Claire Grover, Iona Walker, Lukas Engelmann, Beatrice Alex

    Abstract: The design of models that govern diseases in population is commonly built on information and data gathered from past outbreaks. However, epidemic outbreaks are never captured in statistical data alone but are communicated by narratives, supported by empirical observations. Outbreak reports discuss correlations between populations, locations and the disease to infer insights into causes, vectors an…

    Submitted 11 January, 2021; v1 submitted 4 February, 2020; originally announced February 2020.

    Comments: Journal of Data Mining & Digital Humanities 2021

    Journal ref: Journal of Data Mining & Digital Humanities, HistoInformatics, HistoInformatics (January 20, 2021) jdmdh:6071

  14. arXiv:1903.03985  [pdf, other]

    cs.CL cs.AI

    Named Entity Recognition for Electronic Health Records: A Comparison of Rule-based and Machine Learning Approaches

    Authors: Philip John Gorinski, Honghan Wu, Claire Grover, Richard Tobin, Conn Talbot, Heather Whalley, Cathie Sudlow, William Whiteley, Beatrice Alex

    Abstract: This work investigates multiple approaches to Named Entity Recognition (NER) for text in Electronic Health Record (EHR) data. In particular, we look into the application of (i) rule-based, (ii) deep learning and (iii) transfer learning systems for the task of NER on brain imaging reports with a focus on records from patients with stroke. We explore the strengths and weaknesses of each approach, de…

    Submitted 5 June, 2019; v1 submitted 10 March, 2019; originally announced March 2019.

    Comments: 8 pages, presented at HealTAC 2019, Cardiff, 24-25/04/2019