VaultDB: A Real-World Pilot of Secure Multi-Party Computation within a Clinical Research Network
Authors:
Jennie Rogers,
Elizabeth Adetoro,
Johes Bater,
Talia Canter,
Dong Fu,
Andrew Hamilton,
Amro Hassan,
Ashley Martinez,
Erick Michalski,
Vesna Mitrovic,
Fred Rachman,
Raj Shah,
Matt Sterling,
Kyra VanDoren,
Theresa L. Walunas,
Xiao Wang,
Abel Kho
Abstract:
Electronic health records represent a rich and growing source of clinical data for research. Privacy, regulatory, and institutional concerns limit the speed and ease of sharing this data. VaultDB is a framework for securely computing SQL queries over private data from two or more sources. It evaluates queries using secure multiparty computation: cryptographic protocols that evaluate a function such that the only information revealed from running it is the query answer. We describe the development of a HIPAA-compliant version of VaultDB on the Chicago Area Patient Centered Outcomes Research Network (CAPriCORN). This multi-institutional clinical research network spans the electronic health records of nearly 13M patients over hundreds of clinics and hospitals in the Chicago metropolitan area. Our results from deploying at three health systems within this network show its efficiency and scalability for distributed clinical research analyses without moving patient records from their site of origin.
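To make the idea of "revealing only the query answer" concrete, here is a minimal sketch of additive secret sharing used to compute a cross-site total. It is an illustrative stand-in, not VaultDB's actual protocol, which the abstract does not specify. The modulus, the function names, and the per-site counts are all assumptions made for this example.

```python
import secrets

# Hypothetical modulus for the share arithmetic (an assumption for this sketch).
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split value into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(site_values):
    """Return the total of the private site values using additive shares."""
    n = len(site_values)
    # Each site splits its private value; share j is sent to party j.
    all_shares = [share(v, n) for v in site_values]
    # Each party adds up the shares it received, one from every site.
    partials = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Only the partial sums are published; combined, they give the total.
    return sum(partials) % MODULUS


# Made-up patient counts at three sites, for illustration only.
site_counts = [120, 75, 310]
total = secure_sum(site_counts)
print(total)
```

Each individual share is uniformly random, so a party that sees only its own shares learns nothing about another site's count. This simulation runs all parties in one process; a real deployment would exchange shares over the network and would need a richer protocol to support general SQL operators.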
Submitted 25 July, 2022; v1 submitted 28 February, 2022;
originally announced March 2022.
Natural language processing to identify lupus nephritis phenotype in electronic health records
Authors:
Yu Deng,
Jennifer A. Pacheco,
Anh Chung,
Chengsheng Mao,
Joshua C. Smith,
Juan Zhao,
Wei-Qi Wei,
April Barnado,
Chunhua Weng,
Cong Liu,
Adam Cordon,
Jingzhi Yu,
Yacob Tedla,
Abel Kho,
Rosalind Ramsey-Goldman,
Theresa Walunas,
Yuan Luo
Abstract:
Systemic lupus erythematosus (SLE) is a rare autoimmune disorder characterized by an unpredictable course of flares and remission with diverse manifestations. Lupus nephritis, one of the major manifestations of SLE contributing to organ damage and mortality, is a key component of lupus classification criteria. Accurately identifying lupus nephritis in electronic health records (EHRs) would therefore benefit large cohort observational studies and clinical trials where characterization of the patient population is critical for recruitment, study design, and analysis. Lupus nephritis can be recognized through procedure codes and structured data, such as laboratory tests. However, other critical information documenting lupus nephritis, such as histologic reports from kidney biopsies and prior medical history narratives, requires sophisticated text processing to mine information from pathology reports and clinical notes. In this study, we developed algorithms to identify lupus nephritis with and without natural language processing (NLP) using EHR data. We developed four algorithms: a rule-based algorithm using only structured data (baseline algorithm) and three algorithms using different NLP models. The three NLP models are based on regularized logistic regression and use different sets of features: positive mention of concept unique identifiers (CUIs), number of appearances of CUIs, and a mixture of three components, respectively. The baseline algorithm and the best-performing NLP algorithm were externally validated on a dataset from Vanderbilt University Medical Center (VUMC). Our best-performing NLP model, which incorporates features from structured data, regular expression concepts, and mapped CUIs, improved the F-measure over the baseline lupus nephritis algorithm on both the NMEDW (0.41 vs 0.79) and VUMC (0.62 vs 0.96) datasets.
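The difference between the "positive mention" and "number of appearances" feature sets, and the F-measure used to compare the algorithms, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the placeholder identifiers (`CUI_A`, `CUI_B`, `CUI_C`), the function names, and the toy labels are hypothetical, and the regularized logistic regression step is omitted.

```python
from collections import Counter


def cui_features(note_cuis, vocabulary, mode="presence"):
    """Turn the CUIs extracted from one patient's notes into a feature vector."""
    counts = Counter(note_cuis)
    if mode == "presence":
        # 1 if the concept is mentioned at least once, else 0.
        return [1 if counts[c] > 0 else 0 for c in vocabulary]
    if mode == "count":
        # Number of times each concept appears.
        return [counts[c] for c in vocabulary]
    raise ValueError(f"unknown mode: {mode}")


def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = case)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Placeholder concept identifiers, not real UMLS CUIs.
vocabulary = ["CUI_A", "CUI_B", "CUI_C"]
note_cuis = ["CUI_A", "CUI_A", "CUI_C"]

presence = cui_features(note_cuis, vocabulary, mode="presence")
appearances = cui_features(note_cuis, vocabulary, mode="count")

# Toy gold labels and predictions, for illustration only.
score = f_measure([1, 1, 0, 0], [1, 0, 1, 0])
```

Either vector could then be passed to a regularized logistic regression classifier; the count-based variant keeps the information about repeated mentions that the presence-based variant discards.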
Submitted 20 December, 2021;
originally announced December 2021.