
RESTORE: Graph Embedding Assessment Through Reconstruction

Authors: Hong Yung Yip, Chidaksh Ravuru, Neelabha Banerjee, Shashwat Jha, Amit Sheth, Aman Chadha, Amitava Das

Abstract: Following the success of Word2Vec embeddings, graph embeddings (GEs) have gained substantial traction. GEs are commonly generated and evaluated extrinsically on downstream applications, but intrinsic evaluations of the original graph properties in terms of topological structure and semantic information have been lacking. Understanding these will help identify the deficiencies of the various families of GE methods when vectorizing graphs, in terms of preserving the relevant knowledge or learning incorrect knowledge. To address this, we propose RESTORE, a framework for intrinsic GE assessment through graph reconstruction. We show that reconstructing the original graph from the underlying GEs yields insights into the relative amount of information preserved in a given vector form. We first introduce the graph reconstruction task. We generate GEs from three GE families based on factorization methods, random walks, and deep learning (with representative algorithms from each family) on the CommonSense Knowledge Graph (CSKG). We analyze their effectiveness in preserving the (a) topological structure of node-level graph reconstruction with an increasing number of hops and (b) semantic information on various word semantic and analogy tests. Our evaluations show that the deep learning-based GE algorithm (SDNE) is overall better at preserving (a), with a mean average precision (mAP) of 0.54 and 0.35 for 2- and 3-hop reconstruction respectively, while the factorization-based algorithm (HOPE) is better at encapsulating (b), with an average Euclidean distance of 0.14, 0.17, and 0.11 for 1-, 2-, and 3-hop reconstruction respectively. The modest performance of these GEs leaves room for further research on better graph representation learning.

Submitted 5 September, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

arXiv:2204.12716 [pdf, other]

UBERT: A Novel Language Model for Synonymy Prediction at Scale in the UMLS Metathesaurus

Authors: Thilini Wijesiriwardene, Vinh Nguyen, Goonmeet Bajaj, Hong Yung Yip, Vishesh Javangula, Yuqing Mao, Kin Wah Fung, Srinivasan Parthasarathy, Amit P. Sheth, Olivier Bodenreider

Abstract: The UMLS Metathesaurus integrates more than 200 biomedical source vocabularies. During the Metathesaurus construction process, synonymous terms are clustered into concepts by human editors, assisted by lexical similarity algorithms. This process is error-prone and time-consuming. Recently, a deep learning model (LexLM) has been developed for the UMLS Vocabulary Alignment (UVA) task. This work introduces UBERT, a BERT-based language model, pretrained on UMLS terms via a supervised Synonymy Prediction (SP) task replacing the original Next Sentence Prediction (NSP) task. The effectiveness of UBERT for the UMLS Metathesaurus construction process is evaluated using the UVA task. We show that UBERT outperforms LexLM, as well as biomedical BERT-based models. Key to the performance of UBERT are the synonymy prediction task specifically developed for UBERT, the tight alignment of training data to the UVA task, and the similarity of the models used for pretrained UBERT.

Submitted 27 April, 2022; originally announced April 2022.

arXiv:2109.13348 [pdf, other]

Evaluating Biomedical BERT Models for Vocabulary Alignment at Scale in the UMLS Metathesaurus

Authors: Goonmeet Bajaj, Vinh Nguyen, Thilini Wijesiriwardene, Hong Yung Yip, Vishesh Javangula, Srinivasan Parthasarathy, Amit Sheth, Olivier Bodenreider

Abstract: The current UMLS (Unified Medical Language System) Metathesaurus construction process for integrating over 200 biomedical source vocabularies is expensive and error-prone, as it relies on lexical algorithms and human editors to decide whether two biomedical terms are synonymous. Recent advances in Natural Language Processing, such as Transformer models like BERT and its biomedical variants with contextualized word embeddings, have achieved state-of-the-art (SOTA) performance on downstream tasks. We aim to validate whether approaches using the BERT models can actually outperform the existing approaches for predicting synonymy in the UMLS Metathesaurus. In the existing Siamese Networks with LSTM and BioWordVec embeddings, we replace the BioWordVec embeddings with the biomedical BERT embeddings extracted from each BERT model using different ways of extraction. In the Transformer architecture, we evaluate the use of the different biomedical BERT models that have been pre-trained using different datasets and tasks. Given the SOTA performance of these BERT models for other downstream tasks, our experiments yield surprisingly interesting results: (1) in both model architectures, the approaches employing these biomedical BERT-based models do not outperform the existing approaches using Siamese Network with BioWordVec embeddings for the UMLS synonymy prediction task, (2) the original BioBERT large model that has not been pre-trained with the UMLS outperforms the SapBERT models that have been pre-trained with the UMLS, and (3) using the Siamese Networks yields better performance for synonymy prediction when compared to using the biomedical BERT models.

Submitted 15 October, 2021; v1 submitted 14 September, 2021; originally announced September 2021.

arXiv:1811.10073 [pdf]

Determination of Personalized Asthma Triggers from Evidence based on Multimodal Sensing and Mobile Application

Authors: Revathy Venkataramanan, Dipesh Kadariya, Hong Yung Yip, Utkarshani Jaimini, Krishnaprasad Thirunarayan, Maninder Kalra, Amit Sheth

Abstract: Objective: Asthma is a chronic pulmonary disease with multiple triggers manifesting as symptoms with various intensities. This paper evaluates the suitability of long-term monitoring of pediatric asthma using diverse data to qualify and quantify triggers that contribute to the asthma symptoms and control to enable a personalized management plan. Materials and Methods: Asthma condition, environment, and adherence to the prescribed care plan were continuously tracked for 97 pediatric patients using kHealth-Asthma technology for one or three months. Result: At the cohort level, among 21% of the patients deployed in spring, 63% and 19% indicated pollen and Particulate Matter (PM2.5), respectively, as the major asthma contributors. Of the 18% of the patients deployed in fall, 29% and 21% found pollen and PM2.5, respectively, to be the contributors. For the 28% of the patients deployed in winter, PM2.5 was identified as the major contributor for 80% of them. One patient across each season has been chosen to explain the determination of personalized triggers by observing correlations between triggers and asthma symptoms gathered from anecdotal evidence. Discussion and Conclusion: Both public and personal health signals, including compliance with the prescribed care plan, have been captured through continuous monitoring using the kHealth-Asthma technology, which generated insights on causes of asthma symptoms across different seasons. Collectively, they can form the underlying basis for a personalized management plan and intervention.
KEYWORDS: Personalized Digital Health, Medical Internet of Things, Pediatric Asthma Management, Patient Generated Health Data, Personalized Triggers, Telehealth

Submitted 25 November, 2018; originally announced November 2018.

Comments: contact author: Amit Sheth

arXiv:1801.00356 [pdf]

How will the Internet of Things enable Augmented Personalized Health?

Authors: Amit Sheth, Utkarshani Jaimini, Hong Yung Yip

Abstract: Internet-of-Things (IoT) is profoundly redefining the way we create, consume, and share information. Health aficionados and citizens are increasingly using IoT technologies to track their sleep, food intake, activity, vital body signals, and other physiological observations. This is complemented by IoT systems that continuously collect health-related data from the environment and inside the living quarters. Together, these have created an opportunity for a new generation of healthcare solutions. However, interpreting data to understand an individual's health is challenging. It is usually necessary to look at that individual's clinical record and behavioral information, as well as social and environmental information affecting that individual. Interpreting how well a patient is doing also requires looking at his adherence to respective health objectives, application of relevant clinical knowledge, and the desired outcomes. We resort to the vision of Augmented Personalized Healthcare (APH) to exploit the extensive variety of relevant data and medical knowledge using Artificial Intelligence (AI) techniques to extend and enhance human health, and to present various stages of augmented health management strategies: self-monitoring, self-appraisal, self-management, intervention, and disease progress tracking and prediction. kHealth technology, a specific incarnation of APH, and its application to asthma and other diseases are used to provide illustrations and discuss alternatives for technology-assisted health management. Several prominent efforts involving IoT and patient-generated health data (PGHD) with respect to converting multimodal data into actionable information (big data to smart data) are also identified. The roles of three components in an evidence-based semantic perception approach (Contextualization, Abstraction, and Personalization) are discussed.

Submitted 31 December, 2017; originally announced January 2018.

Showing 1–5 of 5 results for author: Yip, H Y