-
Validating transformers for redaction of text from electronic health records in real-world healthcare
Authors:
Zeljko Kraljevic,
Anthony Shek,
Joshua Au Yeung,
Ewart Jonathan Sheldon,
Mohammad Al-Agil,
Haris Shuaib,
Xi Bai,
Kawsar Noor,
Anoop D. Shah,
Richard Dobson,
James Teo
Abstract:
Protecting patient privacy in healthcare records is a top priority, and redaction is a commonly used method for obscuring directly identifiable information in text. Rule-based methods have been widely used, but their precision is often low causing over-redaction of text and frequently not being adaptable enough for non-standardised or unconventional structures of personal health information. Deep learning techniques have emerged as a promising solution, but implementing them in real-world environments poses challenges due to the differences in patient record structure and language across different departments, hospitals, and countries.
In this study, we present AnonCAT, a transformer-based model and a blueprint on how deidentification models can be deployed in real-world healthcare. AnonCAT was trained through a process involving manually annotated redactions of real-world documents from three UK hospitals with different electronic health record systems and 3116 documents. The model achieved high performance in all three hospitals with a Recall of 0.99, 0.99 and 0.96.
Our findings demonstrate the potential of deep learning techniques for improving the efficiency and accuracy of redaction in global healthcare data and highlight the importance of building workflows that not only use these models but can also continually fine-tune and audit the performance of these algorithms to ensure continuing effectiveness in real-world settings. This approach provides a blueprint for the real-world use of de-identifying algorithms through fine-tuning and localisation. The code, together with tutorials, is available on GitHub (https://github.com/CogStack/MedCAT).
Submitted 5 October, 2023;
originally announced October 2023.
-
Robust and Context-Aware Real-Time Collaborative Robot Handling via Dynamic Gesture Commands
Authors:
Rui Chen,
Alvin Shek,
Changliu Liu
Abstract:
This paper studies real-time collaborative robot (cobot) handling, where the cobot maneuvers an object under human dynamic gesture commands. Enabling dynamic gesture commands is useful when the human needs to avoid direct contact with the robot or the object handled by the robot. However, the key challenge lies in the heterogeneity in human behaviors and the stochasticity in the perception of dynamic gestures, which requires the robot handling policy to be adaptable and robust. To address these challenges, we introduce Conditional Collaborative Handling Process (CCHP) to encode a context-aware cobot handling policy and a procedure to learn such a policy from human-human collaboration. We thoroughly evaluate the adaptability and robustness of CCHP and apply our approach to a real-time cobot assembly task with a Kinova Gen3 robot arm. Results show that our method leads to significantly less human effort and smoother human-robot collaboration than a state-of-the-art rule-based approach, even with first-time users.
Submitted 12 April, 2023;
originally announced April 2023.
-
Foresight -- Generative Pretrained Transformer (GPT) for Modelling of Patient Timelines using EHRs
Authors:
Zeljko Kraljevic,
Dan Bean,
Anthony Shek,
Rebecca Bendayan,
Harry Hemingway,
Joshua Au Yeung,
Alexander Deng,
Alfie Baston,
Jack Ross,
Esther Idowu,
James T Teo,
Richard J Dobson
Abstract:
Background: Electronic Health Records hold detailed longitudinal information about each patient's health status and general clinical history, a large portion of which is stored within the unstructured text. Existing approaches focus mostly on structured data and a subset of single-domain outcomes. We explore how temporal modelling of patients from free text and structured data, using deep generative transformers can be used to forecast a wide range of future disorders, substances, procedures or findings. Methods: We present Foresight, a novel transformer-based pipeline that uses named entity recognition and linking tools to convert document text into structured, coded concepts, followed by providing probabilistic forecasts for future medical events such as disorders, substances, procedures and findings. We processed the entire free-text portion from three different hospital datasets totalling 811336 patients covering both physical and mental health. Findings: On tests in two UK hospitals (King's College Hospital, South London and Maudsley) and the US MIMIC-III dataset precision@10 0.68, 0.76 and 0.88 was achieved for forecasting the next disorder in a patient timeline, while precision@10 of 0.80, 0.81 and 0.91 was achieved for forecasting the next biomedical concept. Foresight was also validated on 34 synthetic patient timelines by five clinicians and achieved relevancy of 97% for the top forecasted candidate disorder. As a generative model, it can forecast follow-on biomedical concepts for as many steps as required. Interpretation: Foresight is a general-purpose model for biomedical concept modelling that can be used for real-world risk forecasting, virtual trials and clinical research to study the progression of disorders, simulate interventions and counterfactuals, and educational purposes.
Submitted 24 January, 2023; v1 submitted 13 December, 2022;
originally announced December 2022.
-
Spontaneous Phase Separation of Ternary Fluid Mixtures
Authors:
Alvin C. M. Shek,
Halim Kusumaatmaja
Abstract:
We computationally study the spontaneous phase separation of ternary fluid mixtures using the lattice Boltzmann method, both when all the surface tensions are equal and when they have different values. Previous theoretical works typically rely on analysing the sign of the eigenvalues resulting from a simple linear stability analysis, but we find this does not explain the fluid morphologies observed. Here, by combining systematic computer simulations over the full range of the composition space and theoretical analysis on the eigenvalues and eigenvectors of the unstable modes, we identify four fundamental phase separation pathways. In particular, we highlight a dominant but so-far overlooked mechanism involving enrichment and instability of the minor component at the fluid-fluid interface.
Submitted 28 July, 2022; v1 submitted 1 April, 2022;
originally announced April 2022.
-
Learning from Physical Human Feedback: An Object-Centric One-Shot Adaptation Method
Authors:
Alvin Shek,
Bo Ying Su,
Rui Chen,
Changliu Liu
Abstract:
For robots to be effectively deployed in novel environments and tasks, they must be able to understand the feedback expressed by humans during intervention. This can either correct undesirable behavior or indicate additional preferences. Existing methods either require repeated episodes of interactions or assume prior known reward features, which is data-inefficient and can hardly transfer to new tasks. We relax these assumptions by describing human tasks in terms of object-centric sub-tasks and interpreting physical interventions in relation to specific objects. Our method, Object Preference Adaptation (OPA), is composed of two key stages: 1) pre-training a base policy to produce a wide variety of behaviors, and 2) online-updating according to human feedback. The key to our fast, yet simple adaptation is that general interaction dynamics between agents and objects are fixed, and only object-specific preferences are updated. Our adaptation occurs online, requires only one human intervention (one-shot), and produces new behaviors never seen during training. Trained on cheap synthetic data instead of expensive human demonstrations, our policy correctly adapts to human perturbations on realistic tasks on a physical 7DOF robot. Videos, code, and supplementary material are provided.
Submitted 2 June, 2023; v1 submitted 9 March, 2022;
originally announced March 2022.
-
Learn from Human Teams: a Probabilistic Solution to Real-Time Collaborative Robot Handling with Dynamic Gesture Commands
Authors:
Rui Chen,
Alvin Shek,
Changliu Liu
Abstract:
We study real-time collaborative robot (cobot) handling, where the cobot maneuvers a workpiece under human commands. This is useful when it is risky for humans to directly handle the workpiece. However, it is hard to make the cobot both easy to command and flexible in possible operations. In this work, we propose a Real-Time Collaborative Robot Handling (RTCoHand) framework that allows the control of cobot via user-customized dynamic gestures. This is hard due to variations among users, human motion uncertainties, and noisy human input. We model the task as a probabilistic generative process, referred to as Conditional Collaborative Handling Process (CCHP), and learn from human-human collaboration. We thoroughly evaluate the adaptability and robustness of CCHP and apply our approach to a real-time cobot handling task with Kinova Gen3 robot arm. We achieve seamless human-robot collaboration with both experienced and new users. Compared to classical controllers, RTCoHand allows significantly more complex maneuvers and lower user cognitive burden. It also eliminates the need for trial-and-error, rendering it advantageous in safety-critical tasks.
Submitted 11 December, 2021;
originally announced December 2021.
-
MedGPT: Medical Concept Prediction from Clinical Narratives
Authors:
Zeljko Kraljevic,
Anthony Shek,
Daniel Bean,
Rebecca Bendayan,
James Teo,
Richard Dobson
Abstract:
The data available in Electronic Health Records (EHRs) provides the opportunity to transform care, and the best way to provide better care for one patient is through learning from the data available on all other patients. Temporal modelling of a patient's medical history, which takes into account the sequence of past events, can be used to predict future events such as a diagnosis of a new disorder or complication of a previous or existing disorder. While most prediction approaches use mostly the structured data in EHRs or a subset of single-domain predictions and outcomes, we present MedGPT, a novel transformer-based pipeline that uses Named Entity Recognition and Linking tools (i.e. MedCAT) to structure and organize the free text portion of EHRs and anticipate a range of future medical events (initially disorders). Since a large portion of EHR data is in text form, such an approach benefits from a granular and detailed view of a patient while introducing modest additional noise. MedGPT effectively deals with the noise and the added granularity, and achieves a precision of 0.344, 0.552 and 0.640 (vs LSTM 0.329, 0.538 and 0.633) when predicting the top 1, 3 and 5 candidate future disorders on real world hospital data from King's College Hospital, London, UK (~600k patients). We also show that our model captures medical knowledge by testing it on an experimental medical multiple choice question answering task, and by examining the attentional focus of the model using gradient-based saliency methods.
Submitted 7 July, 2021;
originally announced July 2021.
-
Capillary Bridges on Liquid Infused Surfaces
Authors:
Alvin C. M. Shek,
Ciro Semprebon,
Jack R. Panter,
Halim Kusumaatmaja
Abstract:
We numerically study two-component capillary bridges formed when a liquid droplet is placed in between two liquid infused surfaces (LIS). In contrast to commonly studied one-component capillary bridges on non-infused solid surfaces, two-component liquid bridges can exhibit a range of different morphologies where the liquid droplet is directly in contact with two, one or none of the LIS substrates. In addition, the capillary bridges may lose stability when compressed due to the envelopment of the droplet by the lubricant. We also characterise the capillary force, maximum separation and effective spring force, and find they are influenced by the shape and size of the lubricant ridge. Importantly, these can be tuned to increase the effective capillary adhesion strength by manipulating the lubricant pressure, Neumann angle, and wetting contact angles. As such, LIS are not only "slippery" parallel to the surface, but they are also "sticky" perpendicular to the surface.
Submitted 15 December, 2020;
originally announced December 2020.
-
A Knowledge Distillation Ensemble Framework for Predicting Short and Long-term Hospitalisation Outcomes from Electronic Health Records Data
Authors:
Zina M Ibrahim,
Daniel Bean,
Thomas Searle,
Honghan Wu,
Anthony Shek,
Zeljko Kraljevic,
James Galloway,
Sam Norton,
James T Teo,
Richard JB Dobson
Abstract:
The ability to perform accurate prognosis of patients is crucial for proactive clinical decision making, informed resource management and personalised care. Existing outcome prediction models suffer from a low recall of infrequent positive outcomes. We present a highly-scalable and robust machine learning framework to automatically predict adversity represented by mortality and ICU admission from time-series vital signs and laboratory results obtained within the first 24 hours of hospital admission. The stacked platform comprises two components: a) an unsupervised LSTM Autoencoder that learns an optimal representation of the time-series, using it to differentiate the less frequent patterns which conclude with an adverse event from the majority patterns that do not, and b) a gradient boosting model, which relies on the constructed representation to refine prediction, incorporating static features of demographics, admission details and clinical summaries. The model is used to assess a patient's risk of adversity over time and provides visual justifications of its prediction based on the patient's static features and dynamic signals. Results of three case studies for predicting mortality and ICU admission show that the model outperforms all existing outcome prediction models, achieving PR-AUC of 0.891 (95% CI: 0.878-0.969) in predicting mortality in ICU and general ward settings and 0.908 (95% CI: 0.870-0.935) in predicting ICU admission.
Submitted 11 June, 2021; v1 submitted 18 November, 2020;
originally announced November 2020.
-
Multi-domain Clinical Natural Language Processing with MedCAT: the Medical Concept Annotation Toolkit
Authors:
Zeljko Kraljevic,
Thomas Searle,
Anthony Shek,
Lukasz Roguski,
Kawsar Noor,
Daniel Bean,
Aurelie Mascio,
Leilei Zhu,
Amos A Folarin,
Angus Roberts,
Rebecca Bendayan,
Mark P Richardson,
Robert Stewart,
Anoop D Shah,
Wai Keong Wong,
Zina Ibrahim,
James T Teo,
Richard JB Dobson
Abstract:
Electronic health records (EHR) contain large volumes of unstructured text, requiring the application of Information Extraction (IE) technologies to enable clinical analysis. We present the open-source Medical Concept Annotation Toolkit (MedCAT) that provides: a) a novel self-supervised machine learning algorithm for extracting concepts using any concept vocabulary including UMLS/SNOMED-CT; b) a feature-rich annotation interface for customising and training IE models; and c) integrations to the broader CogStack ecosystem for vendor-agnostic health system deployment. We show improved performance in extracting UMLS concepts from open datasets (F1:0.448-0.738 vs 0.429-0.650). Further real-world validation demonstrates SNOMED-CT extraction at 3 large London hospitals with self-supervised training over ~8.8B words from ~17M clinical records and further fine-tuning with ~6K clinician annotated examples. We show strong transferability (F1 > 0.94) between hospitals, datasets, and concept types indicating cross-domain EHR-agnostic utility for accelerated clinical and research use cases.
Submitted 25 March, 2021; v1 submitted 2 October, 2020;
originally announced October 2020.
-
Modelling ternary fluids in contact with elastic membranes
Authors:
Marianna Pepona,
Alvin C. M. Shek,
Ciro Semprebon,
Timm Krüger,
Halim Kusumaatmaja
Abstract:
We present a thermodynamically consistent model of a ternary fluid interacting with elastic membranes. Following a free-energy modelling approach and taking into account the thermodynamics laws, we derive the equations governing the ternary fluid flow and dynamics of the membranes. We also provide the numerical framework for simulating such fluid-structure interaction problems. It is based on the lattice Boltzmann method, employed for resolving the evolution equations of the ternary fluid in an Eulerian description, coupled to the immersed boundary method, allowing for the membrane equations of motion to be solved in a Lagrangian system. The configuration of an elastic capsule placed at a fluid-fluid interface is considered for validation purposes. Systematic simulations are performed for a detailed comparison with reference numerical results obtained by Surface Evolver, and the Galilean invariance of the proposed model is also proven. The proposed approach is versatile, and a wide range of geometries can be simulated. To demonstrate this, the problem of a capillary bridge formed between two deformable capsules is investigated here.
Submitted 28 May, 2019;
originally announced June 2019.