-
NeuraHealth: An Automated Screening Pipeline to Detect Undiagnosed Cognitive Impairment in Electronic Health Records with Deep Learning and Natural Language Processing
Authors:
Tanish Tyagi,
Colin G. Magdamo,
Ayush Noori,
Zhaozhi Li,
Xiao Liu,
Mayuresh Deodhar,
Zhuoqiao Hong,
Wendong Ge,
Elissa M. Ye,
Yi-han Sheu,
Haitham Alabsi,
Laura Brenner,
Gregory K. Robbins,
Sahar Zafar,
Nicole Benson,
Lidia Moura,
John Hsu,
Alberto Serrano-Pozo,
Dimitry Prokopenko,
Rudolph E. Tanzi,
Bradley T. Hyman,
Deborah Blacker,
Shibani S. Mukerji,
M. Brandon Westover,
Sudeshna Das
Abstract:
Dementia-related cognitive impairment (CI) is a neurodegenerative disorder affecting over 55 million people worldwide and growing rapidly at the rate of one new case every 3 seconds. 75% of cases go undiagnosed globally, with up to 90% in low- and middle-income countries, leading to an estimated annual worldwide cost of USD 1.3 trillion, forecasted to reach USD 2.8 trillion by 2030. With no cure, a recurring failure of clinical trials, and a lack of early diagnosis, the mortality rate is 100%. Information in electronic health records (EHR) can provide vital clues for early detection of CI, but a manual review by experts is tedious and error-prone. Several computational methods have been proposed; however, they lack an enhanced understanding of the linguistic context in complex language structures of EHR. Therefore, I propose a novel and more accurate framework, NeuraHealth, to identify patients who had no earlier diagnosis. In NeuraHealth, using patient EHR from Mass General Brigham BioBank, I fine-tuned a bi-directional attention-based deep learning natural language processing model to classify sequences. The sequence predictions were used to generate structured features as input for a patient-level regularized logistic regression model. This two-step framework creates high dimensionality, outperforming all existing state-of-the-art computational methods as well as clinical methods. Further, I integrate the models into a real-world product, a web app, to create an automated EHR screening pipeline for scalable and high-speed discovery of undetected CI in EHR, making early diagnosis viable in medical facilities and in regions with scarce health services.
Submitted 20 June, 2022; v1 submitted 12 January, 2022;
originally announced February 2022.
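The two-step structure described in the NeuraHealth abstract (sequence-level predictions aggregated into structured features, then a patient-level regularized logistic regression) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three aggregate features, the L2 penalty, the gradient-descent fit, the function names, and the toy data are hypothetical and are not taken from the paper. The fine-tuned language model is not reproduced; its output is stood in for by hard-coded per-sequence probabilities.

```python
import math
from collections import defaultdict

def patient_features(seq_preds):
    """Aggregate (patient_id, probability) sequence predictions per patient.

    The three features (max, mean, share of sequences >= 0.5) are
    hypothetical choices, not the feature set used in the paper.
    """
    by_patient = defaultdict(list)
    for pid, p in seq_preds:
        by_patient[pid].append(p)
    return {
        pid: [max(ps), sum(ps) / len(ps), sum(1 for p in ps if p >= 0.5) / len(ps)]
        for pid, ps in by_patient.items()
    }

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def fit_logreg_l2(X, y, lam=0.01, lr=0.5, epochs=2000):
    """L2-regularized logistic regression fitted by batch gradient descent.

    The penalty type and optimizer are assumptions; the bias is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (gwj / n + lam * wj) for wj, gwj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

# Toy stand-in for step 1: per-sequence probabilities from a sequence classifier.
seq_preds = [
    ("p1", 0.9), ("p1", 0.8), ("p1", 0.2),
    ("p2", 0.1), ("p2", 0.2),
    ("p3", 0.7), ("p3", 0.95),
    ("p4", 0.05), ("p4", 0.3), ("p4", 0.1),
]
labels = {"p1": 1, "p2": 0, "p3": 1, "p4": 0}  # made-up patient labels

# Step 2: patient-level features and regularized logistic regression.
features = patient_features(seq_preds)
ids = sorted(features)
w, b = fit_logreg_l2([features[i] for i in ids], [labels[i] for i in ids])
```

Calling `predict_proba(w, b, features["p1"])` then returns a patient-level score between 0 and 1. In a real setting the features, penalty strength, and evaluation would need to follow the paper and be validated on held-out patients.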
-
Using Deep Learning to Identify Patients with Cognitive Impairment in Electronic Health Records
Authors:
Tanish Tyagi,
Colin G. Magdamo,
Ayush Noori,
Zhaozhi Li,
Xiao Liu,
Mayuresh Deodhar,
Zhuoqiao Hong,
Wendong Ge,
Elissa M. Ye,
Yi-han Sheu,
Haitham Alabsi,
Laura Brenner,
Gregory K. Robbins,
Sahar Zafar,
Nicole Benson,
Lidia Moura,
John Hsu,
Alberto Serrano-Pozo,
Dimitry Prokopenko,
Rudolph E. Tanzi,
Bradley T. Hyman,
Deborah Blacker,
Shibani S. Mukerji,
M. Brandon Westover,
Sudeshna Das
Abstract:
Dementia is a neurodegenerative disorder that causes cognitive decline and affects more than 50 million people worldwide. Dementia is under-diagnosed by healthcare professionals: only one in four people who suffer from dementia are diagnosed. Even when a diagnosis is made, it may not be entered as a structured International Classification of Diseases (ICD) diagnosis code in a patient's charts. Information relevant to cognitive impairment (CI) is often found within electronic health records (EHR), but manual review of clinician notes by experts is both time-consuming and often prone to errors. Automated mining of these notes presents an opportunity to label patients with cognitive impairment in EHR data. We developed natural language processing (NLP) tools to identify patients with cognitive impairment and demonstrate that linguistic context enhances performance for the cognitive impairment classification task. We fine-tuned our attention-based deep learning model, which can learn from complex language structures, and substantially improved accuracy (0.93) relative to a baseline NLP model (0.84). Further, we show that deep learning NLP can successfully identify dementia patients without dementia-related ICD codes or medications.
Submitted 12 November, 2021;
originally announced November 2021.
-
Natural Language Processing to Detect Cognitive Concerns in Electronic Health Records Using Deep Learning
Authors:
Zhuoqiao Hong,
Colin G. Magdamo,
Yi-han Sheu,
Prathamesh Mohite,
Ayush Noori,
Elissa M. Ye,
Wendong Ge,
Haoqi Sun,
Laura Brenner,
Gregory Robbins,
Shibani Mukerji,
Sahar Zafar,
Nicole Benson,
Lidia Moura,
John Hsu,
Bradley T. Hyman,
Michael B. Westover,
Deborah Blacker,
Sudeshna Das
Abstract:
Dementia is under-recognized in the community, under-diagnosed by healthcare professionals, and under-coded in claims data. Information on cognitive dysfunction, however, is often found in unstructured clinician notes within medical records, but manual review by experts is time-consuming and often prone to errors. Automated mining of these notes presents a potential opportunity to label patients with cognitive concerns who could benefit from an evaluation or be referred to specialist care. To identify patients with cognitive concerns in electronic medical records, we applied natural language processing (NLP) algorithms and compared model performance to a baseline model that used structured diagnosis codes and medication data only. An attention-based deep learning model outperformed the baseline model and other simpler models.
Submitted 12 November, 2020;
originally announced November 2020.
-
Modeling semi-competing risks data as a longitudinal bivariate process
Authors:
Daniel Nevo,
Deborah Blacker,
Eric B. Larson,
Sebastien Haneuse
Abstract:
The Adult Changes in Thought (ACT) study is a long-running prospective study of incident all-cause dementia and Alzheimer's disease (AD). As the cohort ages, death (a terminal event) is a prominent competing risk for AD (a non-terminal event), although the reverse is not the case. As such, analyses of data from ACT can be placed within the semi-competing risks framework. Central to semi-competing risks, and in contrast to standard competing risks, is that one can learn about the dependence structure between the two events. To date, however, most methods for semi-competing risks treat dependence as a nuisance and not a potential source of new clinical knowledge. We propose a novel regression-based framework that views the two time-to-event outcomes through the lens of a longitudinal bivariate process on a partition of the time scale. A key innovation of the framework is that dependence is represented in two distinct forms, $\textit{local}$ and $\textit{global}$ dependence, both of which have intuitive clinical interpretations. Estimation and inference are performed via penalized maximum likelihood, and can accommodate right censoring, left truncation, and time-varying covariates. The framework is used to investigate the role of gender and having $\ge 1$ APOE-$\epsilon 4$ allele on the joint risk of AD and death.
Submitted 8 July, 2020;
originally announced July 2020.
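The idea in the semi-competing risks abstract above, viewing two time-to-event outcomes as a longitudinal bivariate process on a partition of the time scale, can be made concrete with a small data-restructuring sketch. This is only an illustration under stated assumptions, not the authors' construction: the function name, the coding of the two indicators (the non-terminal indicator is cumulative, the terminal indicator marks the interval of death), and the rule that drops a partially observed final interval are all hypothetical, and left truncation and covariates are ignored.

```python
def expand_to_bivariate(ad_time, death_time, censor_time, cuts):
    """Turn one person's event times into interval-level rows (k, y1, y2).

    ad_time / death_time: event times, or None if the event was not observed.
    censor_time: end of observation (equal to death_time if the person died).
    cuts: increasing partition points; interval k is (cuts[k-1], cuts[k]].
    y1 = 1 once the non-terminal event (AD) has occurred by the end of interval k.
    y2 = 1 if the terminal event (death) occurred in interval k.
    """
    rows = []
    for k in range(1, len(cuts)):
        lo, hi = cuts[k - 1], cuts[k]
        died = death_time is not None and lo < death_time <= hi
        if not died and censor_time < hi:
            break  # interval not fully observed and no death in it: stop here
        y1 = int(ad_time is not None and ad_time <= hi)
        rows.append((k, y1, int(died)))
        if died:
            break  # nothing is observed after the terminal event
    return rows

cuts = [0, 2, 4, 6, 8]  # hypothetical partition of the time scale (years)

# AD at 3.5 years, death at 7.2 years.
both_events = expand_to_bivariate(3.5, 7.2, 7.2, cuts)
# No events observed, censored at 5 years.
censored = expand_to_bivariate(None, None, 5.0, cuts)
# AD and death inside the same interval.
same_interval = expand_to_bivariate(6.5, 7.0, 7.0, cuts)
```

Rows in which both indicators switch on within the same interval, and rows in which the terminal event follows an earlier non-terminal event, are the kinds of patterns a model of dependence on such a partition would draw on; how the paper defines local and global dependence precisely should be taken from the paper itself.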