-
Domain Generalization through Meta-Learning: A Survey
Authors:
Arsham Gholamzadeh Khoee,
Yinan Yu,
Robert Feldt
Abstract:
Deep neural networks (DNNs) have revolutionized artificial intelligence but often underperform when faced with out-of-distribution (OOD) data, a common scenario due to the inevitable domain shifts in real-world applications. This limitation stems from the common assumption that training and testing data share the same distribution, an assumption frequently violated in practice. Despite their effectiveness with large amounts of data and computational power, DNNs struggle with distributional shifts and limited labeled data, leading to overfitting and poor generalization across various tasks and domains. Meta-learning presents a promising approach by employing algorithms that acquire transferable knowledge across various tasks for fast adaptation, eliminating the need to learn each task from scratch. This survey paper delves into the realm of meta-learning with a focus on its contribution to domain generalization. We first clarify the concept of meta-learning for domain generalization and introduce a novel taxonomy based on the feature extraction strategy and the classifier learning methodology, offering a granular view of methodologies. Through an exhaustive review of existing methods and underlying theories, we map out the fundamentals of the field. Our survey provides practical insights and an informed discussion on promising research directions, paving the way for future innovation in meta-learning for domain generalization.
Submitted 3 April, 2024;
originally announced April 2024.
-
Meta-Learning in Spiking Neural Networks with Reward-Modulated STDP
Authors:
Arsham Gholamzadeh Khoee,
Alireza Javaheri,
Saeed Reza Kheradpisheh,
Mohammad Ganjtabesh
Abstract:
The human brain constantly learns and rapidly adapts to new situations by integrating acquired knowledge and experiences into memory. Developing this capability in machine learning models is considered an important goal of AI research since deep neural networks perform poorly when there is limited data or when they need to adapt quickly to new unseen tasks. Meta-learning models are proposed to facilitate quick learning in low-data regimes by employing absorbed information from the past. Although some models have recently been introduced that reached high-performance levels, they are not biologically plausible. We have proposed a bio-plausible meta-learning model inspired by the hippocampus and the prefrontal cortex using spiking neural networks with a reward-based learning system. Our proposed model includes a memory designed to prevent catastrophic forgetting, a phenomenon that occurs when meta-learning models forget what they have learned as soon as the new task begins. Also, our new model can easily be applied to spike-based neuromorphic devices and enables fast learning in neuromorphic hardware. The final analysis will discuss the implications and predictions of the model for solving few-shot classification tasks. In solving these tasks, our model has demonstrated the ability to compete with the existing state-of-the-art meta-learning techniques.
Submitted 7 June, 2023;
originally announced June 2023.
-
VaultDB: A Real-World Pilot of Secure Multi-Party Computation within a Clinical Research Network
Authors:
Jennie Rogers,
Elizabeth Adetoro,
Johes Bater,
Talia Canter,
Dong Fu,
Andrew Hamilton,
Amro Hassan,
Ashley Martinez,
Erick Michalski,
Vesna Mitrovic,
Fred Rachman,
Raj Shah,
Matt Sterling,
Kyra VanDoren,
Theresa L. Walunas,
Xiao Wang,
Abel Kho
Abstract:
Electronic health records represent a rich and growing source of clinical data for research. Privacy, regulatory, and institutional concerns limit the speed and ease of sharing this data. VaultDB is a framework for securely computing SQL queries over private data from two or more sources. It evaluates queries using secure multiparty computation: cryptographic protocols that evaluate a function such that the only information revealed from running it is the query answer. We describe the development of a HIPAA-compliant version of VaultDB on the Chicago Area Patient Centered Outcomes Research Network (CAPriCORN). This multi-institutional clinical research network spans the electronic health records of nearly 13M patients over hundreds of clinics and hospitals in the Chicago metropolitan area. Our results from deploying at three health systems within this network show its efficiency and scalability for distributed clinical research analyses without moving patient records from their site of origin.
Submitted 25 July, 2022; v1 submitted 28 February, 2022;
originally announced March 2022.
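Note on the entry above: the abstract describes secure multiparty computation as a protocol that reveals only the query answer. One textbook way to illustrate that idea is additive secret sharing for a cross-site count. The sketch below is an illustration only and is not claimed to be the protocol VaultDB uses; the modulus, the function names, and the site counts are all hypothetical, and it assumes honest-but-curious parties that do not collude.

```python
import secrets

# Assumed public modulus; it only needs to exceed the largest possible total.
MODULUS = 2**61 - 1


def share(value, n_parties, modulus=MODULUS):
    """Split `value` into n additive shares that sum to it modulo `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(private_values, modulus=MODULUS):
    """Simulate n sites computing the sum of their private values."""
    n = len(private_values)
    # Each site splits its value; share j is sent to party j.
    all_shares = [share(v, n, modulus) for v in private_values]
    # Party j adds the shares it received, one from every site.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # Only the partial sums are published; together they reveal just the total.
    return sum(partial_sums) % modulus


# Hypothetical per-site patient counts matching some query.
site_counts = [120, 45, 310]
total = secure_sum(site_counts)
```

Each individual share is uniformly random on its own, so no single party learns another site's count; only the combined total is recovered.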
-
A least squares support vector regression for anisotropic diffusion filtering
Authors:
Arsham Gholamzadeh Khoee,
Kimia Mohammadi Mohammadi,
Mostafa Jani,
Kourosh Parand
Abstract:
Anisotropic diffusion filtering for signal smoothing as a low-pass filter has the advantage of preserving edges, i.e., it does not affect the edges that contain more critical data than the other parts of the signal. In this paper, we present a numerical algorithm based on least squares support vector regression using a Legendre orthogonal kernel, with the nonlinear diffusion problem discretized in time by the Crank-Nicolson method. This method transforms the signal smoothing process into solving an optimization problem that can be solved by efficient numerical algorithms. In the final analysis, we have reported some numerical experiments to show the effectiveness of the proposed machine-learning-based approach for signal smoothing.
Submitted 30 January, 2022;
originally announced February 2022.
-
Natural language processing to identify lupus nephritis phenotype in electronic health records
Authors:
Yu Deng,
Jennifer A. Pacheco,
Anh Chung,
Chengsheng Mao,
Joshua C. Smith,
Juan Zhao,
Wei-Qi Wei,
April Barnado,
Chunhua Weng,
Cong Liu,
Adam Cordon,
Jingzhi Yu,
Yacob Tedla,
Abel Kho,
Rosalind Ramsey-Goldman,
Theresa Walunas,
Yuan Luo
Abstract:
Systemic lupus erythematosus (SLE) is a rare autoimmune disorder characterized by an unpredictable course of flares and remission with diverse manifestations. Lupus nephritis, one of the major disease manifestations of SLE for organ damage and mortality, is a key component of lupus classification criteria. Accurately identifying lupus nephritis in electronic health records (EHRs) would therefore benefit large cohort observational studies and clinical trials where characterization of the patient population is critical for recruitment, study design, and analysis. Lupus nephritis can be recognized through procedure codes and structured data, such as laboratory tests. However, other critical information documenting lupus nephritis, such as histologic reports from kidney biopsies and prior medical history narratives, requires sophisticated text processing to mine information from pathology reports and clinical notes. In this study, we developed algorithms to identify lupus nephritis with and without natural language processing (NLP) using EHR data. We developed four algorithms: a rule-based algorithm using only structured data (baseline algorithm) and three algorithms using different NLP models. The three NLP models are based on regularized logistic regression and use different sets of features including positive mention of concept unique identifiers (CUIs), number of appearances of CUIs, and a mixture of three components, respectively. The baseline algorithm and the best-performing NLP algorithm were externally validated on a dataset from Vanderbilt University Medical Center (VUMC). Our best-performing NLP model, incorporating features from structured data, regular expression concepts, and mapped CUIs, improved F measure in both the NMEDW (0.41 vs 0.79) and VUMC (0.62 vs 0.96) datasets compared to the baseline lupus nephritis algorithm.
Submitted 20 December, 2021;
originally announced December 2021.
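Note on the entry above: the abstract compares algorithms by "F measure" without saying how it is computed. A minimal sketch of the standard F-beta score follows, assuming the balanced F1 variant (beta = 1) is meant; the function name and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for a phenotyping algorithm, not results from the study.
score = f_measure(tp=8, fp=2, fn=2)
```

With beta = 1 the score weights precision and recall equally, so a classifier must do well on both to score highly.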
-
Learning Bundled Care Opportunities from Electronic Medical Records
Authors:
You Chen,
Abel N. Kho,
David Liebovitz,
Catherine Ivory,
Sarah Osmundson,
Jiang Bian,
Bradley A. Malin
Abstract:
Objectives: The fee-for-service approach to healthcare leads to the management of a patient's conditions in an independent manner, inducing various negative consequences. It is recognized that a bundled care approach to healthcare (one that manages a collection of health conditions together) may enable greater efficacy and cost savings. However, it is not always evident which sets of conditions should be managed in a bundled program. Study Design: Retrospective inference of clusters of health conditions from an electronic medical record (EMR) system. A survey of healthcare experts to ascertain the plausibility of the clusters for bundled care programs. Methods: We designed a data-driven framework to infer clusters of health conditions via their shared clinical workflows according to EMR utilization by healthcare employees. We evaluated the framework with approximately 16,500 inpatient stays from a large medical center. The plausibility of the clusters for bundled care was assessed through a survey of a panel of healthcare experts using an analysis of variance (ANOVA) under a 95% confidence interval. Results: The framework inferred four condition clusters: 1) fetal abnormalities, 2) late pregnancies, 3) prostate problems, and 4) chronic diseases (with congestive heart failure featuring prominently). Each cluster was deemed plausible by the experts for bundled care. Conclusions: The findings suggest that data from EMRs may provide a basis for discovering new directions in bundled care. Still, translating such findings into actual care management will require further refinement, implementation, and evaluation.
Submitted 26 May, 2017;
originally announced June 2017.
-
SMCQL: Secure Querying for Federated Databases
Authors:
Johes Bater,
Gregory Elliott,
Craig Eggen,
Satyender Goel,
Abel Kho,
Jennie Rogers
Abstract:
People and machines are collecting data at an unprecedented rate. Despite this newfound abundance of data, progress has been slow in sharing it for open science, business, and other data-intensive endeavors. Many such efforts are stymied by privacy concerns and regulatory compliance issues. For example, many hospitals are interested in pooling their medical records for research, but none may disclose arbitrary patient records to researchers or other healthcare providers. In this context we propose the Private Data Network (PDN), a federated database for querying over the collective data of mutually distrustful parties. In a PDN, each member database does not reveal its tuples to its peers nor to the query writer. Instead, the user submits a query to an honest broker that plans and coordinates its execution over multiple private databases using secure multiparty computation (SMC). Here, each database's query execution is oblivious, and its program counters and memory traces are agnostic to the inputs of others. We introduce a framework for executing PDN queries named SMCQL. This system translates SQL statements into SMC primitives to compute query results over the union of its source databases without revealing sensitive information about individual tuples to peer data providers or the honest broker. Only the honest broker and the querier receive the results of a PDN query. For fast, secure query evaluation, we explore a heuristics-driven optimizer that minimizes the PDN's use of secure computation and partitions its query evaluation into scalable slices.
Submitted 6 March, 2017; v1 submitted 21 June, 2016;
originally announced June 2016.
-
Designing a Linked Data Migrational Framework for Singapore Government Datasets
Authors:
Aravind Sesagiri Raamkumar,
Muthu Kumaar Thangavelu,
Sudarsan Kaleeswaran,
Christopher S. G. Khoo
Abstract:
The subject area of this report is Linked Data and its application to the Government domain. Linked Data is an alternative method of data representation that aims to interlink data from varied sources through relationships. Governments around the world have started publishing their data in this format to assist citizens in making better use of public services. This report provides an eight-step migrational framework for converting Singapore Government data from legacy systems to Linked Data format. The framework formulation is based on a study of the Singapore data ecosystem with help from the Infocomm Development Authority (iDA) of Singapore. Each step in the migrational framework has been constructed with objectives, recommendations, best practices and issues with entry and exit points. This work builds on the existing Linked Data literature, implementations in other countries and cookbooks provided by Linked Data researchers. iDA can use this report to gain an understanding of the effort and work involved in implementing a Linked Data system on top of their legacy systems. The framework can be evaluated by building a Proof of Concept (POC) application.
Submitted 8 April, 2015;
originally announced April 2015.