
License: CC BY 4.0
arXiv:2401.15814v2 [cs.LG] 15 Feb 2024

OntoMedRec: Logically-Pretrained Model-Agnostic Ontology Encoders for Medication Recommendation

Weicong Tan, Weiqing Wang [1], Xin Zhou, Wray Buntine, Gordon Bingham, Hongzhi Yin

[1] Faculty of Information Technology, Monash University, Victoria 3800, Australia
[2] College of Engineering and Computer Science, Vin University, Hanoi, Vietnam
[3] Nursing Services, Alfred Health, Melbourne, Victoria 3004, Australia
[4] Faculty of Engineering, The University of Queensland, Brisbane, Queensland 4072, Australia

Abstract

Recommending medications with electronic health records (EHRs) is a challenging task for data-driven clinical decision support systems. Most existing models learn representations for medical concepts from EHRs and make recommendations with the learnt representations. However, most medications appear in EHR datasets only a limited number of times (the frequency distribution of medications follows a power-law distribution), resulting in insufficient learning of their representations. Medical ontologies are hierarchical classification systems for medical terms in which similar terms fall into the same class at a certain level. In this paper, we propose OntoMedRec, a set of logically-pretrained and model-agnostic medical Ontology Encoders for Medication Recommendation that addresses the data sparsity problem with medical ontologies.

We conduct comprehensive experiments on real-world EHR datasets to evaluate the effectiveness of OntoMedRec by integrating it into various existing downstream medication recommendation models. The results show that integrating OntoMedRec improves the performance of various models both on the entire EHR datasets and on the admissions with few-shot medications. The source code is available in a GitHub repository: https://github.com/WaicongTam/OntoMedRec.

Keywords: medication recommendation, logic tensor networks, medical ontology

1 Introduction

The widespread adoption of electronic health records (EHRs) has made data-driven clinical decision-support systems possible [1]. Deep learning models have emerged to assist clinical practitioners in a range of tasks, notably patient risk prediction, re-admission forecasting, the generation of EHR representations, and medication recommendation for prescribers. Recommending sets of medications accurately and efficiently, so as to assist medical practitioners in prescribing, has become a challenging yet crucial task. Numerous data-driven medication recommendation models have therefore been developed, such as 4SDrug [2], EDGE [3], and SafeDrug [4]. These models aim to predict the most suitable medication regimen based on a patient’s diagnoses, medical procedures, and/or prior prescription history, as demonstrated by systems like COGNet [5] and SARMR [6]. Existing medication recommendation models fall into two categories: instance-based models and longitudinal models. Instance-based models (e.g., LEAP [7] and 4SDrug [2]) recommend sets of drugs based on patients’ diagnoses in the current admission, whereas longitudinal models (e.g., MICRON [8], SafeDrug [4] and COGNet [5]) also utilise patients’ previous admissions.

Figure 1: Frequency distribution of diagnoses and medications in the MIMIC-III dataset. The last bin aggregates the cropped diagnoses/medications with a frequency higher than 200/40000, respectively.

For both instance-based and longitudinal medication recommendation models, we identify one challenge that has not been sufficiently addressed: the data sparsity issue (challenge 1). Similar to the user-interaction sparsity challenge in other recommender systems [9, 10], medication recommendation models suffer from data sparsity deriving from the frequency distribution of medical concepts. As demonstrated in Fig. 1, the majority of diagnoses and medications appear only a limited number of times in the entire MIMIC-III dataset, and their occurrence follows a power-law distribution. This inevitably leads to insufficient learning of the indication relationships between diagnoses and medications (i.e., for what medical conditions a medication was designed) in instance-based models, and of their respective embeddings in longitudinal models. As shown in many other recommendation tasks (e.g., [11] and [12]), utilising external knowledge bases can alleviate the cold-start effect. One notable category of knowledge base for medication recommendation models is medical ontologies. Therefore, to alleviate the data sparsity issue (challenge 1), we leverage external structured knowledge (i.e., medical ontologies), similar to [13, 14], as it provides prior knowledge about the medical terms in EHRs. In EHRs, diagnoses, procedures and medications are encoded in standardised hierarchical classification systems called medical ontologies. Each medical term is a node of the ontology, and the relation between nodes is “is-a” (e.g., benproperine is a cough suppressant).

Fig. 2 shows part of the ATC ontology, which is an ontology of medications. In this ontology, similar medications fall under the same parent node, yet there are definitive differences that distinguish them (i.e., the difference between siblings). For example, as demonstrated in Fig. 2, medications in “Other cough suppressant in ATC” (R05DB) and “Opium alkaloids and derivatives, cough suppressants” (R05DA) fall into the same category “Cough suppressants, excl. combinations with expectorants” (R05D). However, they are intrinsically different, since codeine cough suppressants (i.e., R05DA) and non-codeine cough suppressants (i.e., R05DB) have different clinical characteristics (e.g., physical dependency and drug-drug interaction). Benproperine and cloperastine have the same therapeutic classification (i.e., they are both non-codeine cough suppressants), yet they are two different chemicals. This example shows that effectively modelling the parental, ancestral and sibling relationships (similarities and differences) is beneficial to the medication recommendation task.

Figure 2: An excerpt of the ATC ontology. Some nodes are omitted.

Although some works exploit medical ontologies in the medication recommendation task, they cannot effectively model ontology relationships to benefit the task (challenge 2). Notable models integrating ontology information in medication recommendation include G-BERT [13] and KnowAugNet [14]. G-BERT uses a Graph Attention Network (GAT) [15] encoder trained end-to-end along with the medication recommendation module. KnowAugNet pretrains ontology encoders with an unsupervised contrastive learning method. However, both models encode the ontology with GAT and treat it as an undirected graph, whereas an ontology is by definition a directed acyclic graph (DAG). Moreover, they cannot model some important relationships, such as the difference between siblings shown in Fig. 2. There are also models designed for other tasks that utilise medical ontologies (e.g., GRAM [16] and KAME [17]). However, the modelling of the ontology in these methods is tightly coupled with their downstream tasks, which are not medication recommendation.

To effectively model the ontology relationships to improve the medication recommendation task (challenge 2), we propose OntoMedRec, a model based on logic tensor networks (LTN) [18]. LTN aims to combine symbolic rules with neural computation. The advantage of using LTN in our task is that it allows us to easily integrate the various identified ontology relationships as symbolic rules (e.g., the parental and sibling relations in Fig. 2) into the training process (i.e., neural computation). Recent advances in LTN [18] have shown its effectiveness in graph learning tasks such as ontology deduction and reasoning [19]. However, we find that directly applying existing LTN techniques to our task is challenging for two reasons (challenge 3). The first reason is that existing LTN works are designed for different tasks. To adapt LTN to the medication recommendation task, we need to design new sets of predicates, axioms, constants and variables. The second reason is that directly applying existing LTN methods is memory-consuming. Existing LTN studies designed for ontology data (e.g., [18] and [19]) are based on smaller ontologies (i.e., $\leq 100$ nodes) with small representation dimensions. To model an ontology of $|\mathcal{N}|$ nodes with $n$ variables and model dimension $d$, the space complexity is $O(n|\mathcal{N}|d)$, which requires a large amount of memory when the ontology is large (e.g., 17,737 nodes in our task). This affects the efficiency of the training process, since GPU memory is usually scarcer than main memory (RAM). The high space complexity calls for an efficient sampling method for the effective training of larger node representations on larger ontologies. We devise a sampling method based on the structure of medical ontologies and our modelling method. It decreases the space complexity to $O(nbd)$, where $b \ll |\mathcal{N}|$ is the batch size. The contributions of this paper can be summarised as follows:

  • Logically-pretrained ontology encoder: We carefully design an LTN-based encoder by devising novel predicates, axioms, constants, and variables for self-supervised logical training on medical ontologies. The design is based on insights into what structural information is beneficial for the medication recommendation task. The devised axioms are naturally interpretable for humans. Moreover, for efficient training of the model, we design an axiom-oriented sampling method that enables the learning of larger node representations on large ontologies. Furthermore, to infuse the indication relationships between medications and medical diagnoses, we utilise the MEDI dataset [20] to logically align the representation spaces of diagnoses and medications.

  • Model-agnostic ontology representation learning model for medication recommendation: Once the encoder is well trained, its output can be loaded into various existing medication recommendation models as the initialisation of the embeddings of medical codes (i.e., diagnosis, procedure and/or medication embeddings), improving their performance. Thus, similar to other pretrained models, our encoder is “once trained and ready to use for any medication recommendation model”.

  • Comprehensive experiments: We conduct comprehensive experiments (with code published) to validate the effectiveness of our model in improving different existing medication recommendation methods, including both instance-based and longitudinal methods, in both normal and few-shot scenarios. The results show that: 1) our model improves the performance of both instance-based and longitudinal downstream recommendation methods, with more pronounced improvements on longitudinal methods; 2) our model improves the performance of existing recommendation methods in both normal and few-shot scenarios, with more pronounced improvements in few-shot scenarios.

2 Related Work

2.1 Instance-Based Medication Recommendation

Instance-based models recommend a set of medications based on the current admission. LEAP [7] was an early model that predicted the prescribed medications as sequences and made inferences with beam search. SMR [21] recommends drugs based on knowledge graph embeddings of diagnoses and medications. More recently, 4SDrug [2] was proposed. It is a set-based model trained by comparing the difference between medication sets with similar corresponding diagnosis sets.

2.2 Longitudinal Medication Recommendation

Longitudinal models make use of patients’ previous diagnosis and procedure records. RETAIN [22] is a representation learning model that encodes a patient’s EHR into a representation, and it can be used for the medication recommendation task with extra output layers. DCw-MANN [23] used all past medications to predict current medications using an LSTM-based encoder-decoder model. GAMENet [24] used memory bank matrices to associate past diagnoses and procedures with medications. SafeDrug [4] uses a global molecule encoder and a local molecule substructure encoder to encode medications. COGNet [5] uses a Transformer-based [25] architecture to encode the patient’s diagnoses, procedures and medication history. MICRON [8] is designed for predicting the change in prescriptions; it models the change of prescribed medications with residual vectors. In addition to a patient’s EHR, MERITS [26] used neural ordinary differential equations to model the irregular time series of the patient’s vital signs. The model proposed by Yao et al. [27] used an RNN to model the path from the root node to medical concepts on medical ontologies. Beyond medication recommendation, some models use longitudinal EHR data to perform other tasks such as diagnosis prediction (e.g., KAME [17]) and representation learning (e.g., GRAM [16]). These two models also have medical ontology encoding modules, but they are trained end-to-end with downstream tasks.

2.3 Existing Solutions to Data Sparsity Issue

Some existing models have attempted to address the data sparsity issue. G-BERT [13] used GAT encoders to encode diagnoses and medications. However, the pretraining data used in G-BERT consists of patient records with one admission, and these admission data still follow the distribution described in Fig. 1. KnowAugNet [14] used unsupervised contrastive learning to pre-train encoders for medical ontologies and a medication-diagnosis co-existence graph. EDGE [3] considers drugs that never appear in a certain time range in the EHR dataset as novel drugs, and uses meta-learning to alleviate the cold-start effect of those drugs. However, an interpretable and EHR-independent pretrained encoder for medical ontologies has not been proposed. Moreover, the data sparsity issue in medication recommendation has not been sufficiently addressed.

3 Preliminaries

3.1 Electronic Medical Record

An electronic health record (EHR) dataset can be considered as a collection of $|\mathcal{U}|$ patients’ medical records $\mathcal{U}=\{\mathcal{U}^{(n)}\}_{n=1}^{|\mathcal{U}|}$, where a patient’s medical record $\mathcal{U}^{(n)}$ is constituted by their admissions $[\mathcal{V}^{(n)}_{t}]_{t=1}^{T^{(n)}}$ to the hospital. For the sake of brevity, we omit the superscript $(n)$ in later formulae where there is no confusion. In each admission, a set of medical diagnosis codes ($\mathcal{D}_{t}$), a set of medical procedure codes ($\mathcal{P}_{t}$) and a set of prescribed medication codes ($\mathcal{M}_{t}$) are recorded as $\mathcal{V}_{t}=\{\mathcal{D}_{t},\mathcal{P}_{t},\mathcal{M}_{t}\}$. Note that, in some medication recommendation models (e.g., COGNet [5] and LEAP [7]), code sets such as the medical diagnosis codes ($\mathcal{D}_{t}$) are treated as sequences.

It is worth noting that the diagnosis codes do not only record the chief complaints (i.e., the prominent symptoms that cause this specific admission to the hospital [28]) of the patient’s admission; they also record the patient’s other medical conditions. Assume that a diabetic patient with existing liver conditions is admitted to the hospital due to a broken arm. Not only will the bone fracture be recorded, but also the cause of the fracture (e.g., falling), the diabetes and the liver conditions. All the diagnosis information is codified as codes on a medical ontology that can be modelled by OntoMedRec.
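The record structure described in this subsection can be pictured with a small data-structure sketch. The class names `Admission` and `PatientRecord` and all code strings are hypothetical placeholders chosen for illustration (they are not real ICD or ATC codes, and the procedure and medication entries are invented); only the nesting of a dataset into patient records and of records into admissions with diagnosis, procedure and medication sets follows the text.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class Admission:
    """One admission V_t = {D_t, P_t, M_t}."""
    diagnoses: FrozenSet[str]    # D_t
    procedures: FrozenSet[str]   # P_t
    medications: FrozenSet[str]  # M_t


@dataclass
class PatientRecord:
    """One patient's record U^(n): admissions in chronological order."""
    admissions: List[Admission] = field(default_factory=list)


# Placeholder codes for the broken-arm example; NOT real ICD/ATC codes.
visit = Admission(
    diagnoses=frozenset(
        {"DX_ARM_FRACTURE", "DX_FALL", "DX_DIABETES", "DX_LIVER_CONDITION"}
    ),
    procedures=frozenset({"PR_PLACEHOLDER"}),
    medications=frozenset({"MED_PLACEHOLDER"}),
)
record = PatientRecord(admissions=[visit])
ehr_dataset = [record]  # U: one entry per patient

print(len(ehr_dataset), len(record.admissions), sorted(visit.diagnoses))
```

Sets are used here because the text defines each component as a set; models that treat codes as sequences would presumably store ordered lists instead.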

3.2 Medical Concept Ontologies

A medical ontology $\mathcal{T}_{*}=\{\mathcal{N}_{*},\mathcal{E}_{*},\mathbf{E}_{*}\}$ is a hierarchical taxonomy of medical concepts in a certain domain. It is a directed acyclic graph (DAG), where $\mathcal{N}_{*}$ is the set of nodes, $\mathcal{E}_{*}$ is the set of edges and $\mathbf{E}_{*}$ is the matrix of node features. An edge $e_{j}=\langle n_{a},n_{b}\rangle\in\mathcal{E}_{*}$ represents that $n_{b}$ is a more specific concept deriving from $n_{a}$ (i.e., $n_{a}$ is a parent of $n_{b}$). Take the excerpt in Figure 2 again as an example. There is a directed edge from “Cough suppressants, excl. combinations with expectorants” (R05D) to “Other cough suppressant in ATC” (R05DB), since R05DB is a more specific term to classify a medication.
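As a concrete picture of this definition, the following sketch stores the nodes, the parent-to-child edges and a randomly initialised feature matrix for a few nodes of the ATC excerpt. The variable names, the dimension `d = 8`, the random initialisation and the helper `is_acyclic` are assumptions made for illustration; they are not taken from the OntoMedRec code base.

```python
import numpy as np

# Nodes N and directed edges E (parent, child) for part of the ATC excerpt.
nodes = ["R05D", "R05DA", "R05DB", "benproperine", "cloperastine"]
edges = {
    ("R05D", "R05DA"),
    ("R05D", "R05DB"),
    ("R05DB", "benproperine"),
    ("R05DB", "cloperastine"),
}

# Node feature matrix of shape |N| x d (d = 8 is an arbitrary choice).
d = 8
rng = np.random.default_rng(0)
node_features = rng.normal(size=(len(nodes), d))
row_of = {node: i for i, node in enumerate(nodes)}


def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while queue:
        node = queue.pop()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return visited == len(nodes)


print(is_acyclic(nodes, edges), node_features[row_of["R05DB"]].shape)
```

Keeping edges as ordered (parent, child) pairs preserves the direction that an undirected adjacency structure would discard.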

For OntoMedRec, we use three non-overlapping taxonomies for diagnoses, medical procedures and medications, namely $\mathcal{T}_{d}$, $\mathcal{T}_{p}$ and $\mathcal{T}_{m}$, respectively. Note that $\mathcal{T}_{d}$, $\mathcal{T}_{p}$ and $\mathcal{T}_{m}$ are publicly available and shared by all EHR datasets through their linkage to $\mathcal{D}_{t}$, $\mathcal{P}_{t}$ and $\mathcal{M}_{t}$, respectively. More specifically, each medical code is a node on the corresponding medical ontology, and it can be either a leaf node or a parent node. Since the medical ontology is independent of EHR datasets and all medical concepts in EHR datasets belong to the medical ontology, the pretrained representations of OntoMedRec can be integrated into any downstream recommendation models trained and tested with EHR datasets.

3.3 Medication Recommendation

Following the task definition in Sec. 1 and Sec. 2, the medication recommendation task can be formulated as follows:

  • Longitudinal models predict $\mathcal{M}_{T}$ given a patient’s admission history $[\mathcal{D}_{t},\mathcal{P}_{t}]_{t=1}^{T}$. Some of them add past medication records $[\mathcal{M}_{t}]_{t=1}^{T-1}$ (e.g., MICRON [8] and COGNet [5]).

  • Instance-based models predict $\mathcal{M}_{T}$ given a patient’s diagnosis information in the current visit $[\mathcal{D}_{T},\mathcal{P}_{T}]$.
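The two formulations differ only in which admissions feed the model. A minimal sketch, assuming each admission is a dictionary with hypothetical keys `"D"`, `"P"` and `"M"` holding code sets (the codes are placeholders, not taken from any dataset):

```python
def longitudinal_example(admissions, use_past_medications=False):
    """Inputs [D_t, P_t] for t = 1..T, optionally [M_t] for t = 1..T-1; target M_T."""
    history = [(a["D"], a["P"]) for a in admissions]
    past_meds = [a["M"] for a in admissions[:-1]] if use_past_medications else None
    return history, past_meds, admissions[-1]["M"]


def instance_based_example(admissions):
    """Inputs [D_T, P_T] from the current visit only; target M_T."""
    current = admissions[-1]
    return (current["D"], current["P"]), current["M"]


# A patient with T = 2 admissions (placeholder codes).
patient = [
    {"D": {"d1"}, "P": {"p1"}, "M": {"m1"}},
    {"D": {"d2", "d3"}, "P": set(), "M": {"m2", "m3"}},
]
history, past_meds, target = longitudinal_example(patient, use_past_medications=True)
current, same_target = instance_based_example(patient)
```

Both helpers return the same target; the longitudinal variant simply exposes more context, and the medications of the final admission are never part of the input.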

3.4 Logic Tensor Networks

Logic Tensor Networks (LTNs) are neural networks for data modelling with quantifiable and human-interpretable rules. They are based on real logic [19] defined on a first-order language $\mathcal{L}$, which is composed of [18]:

  • A set of constants. In our case, it is the node feature matrices $\mathbf{E}_{*}\in\mathbb{R}^{|\mathcal{N}_{*}|\times d}$, where $|\mathcal{N}_{*}|$ is the number of nodes and $d$ is the dimension of the node representations.

  • A set of variables. They are the symbols created over the subset of the constants to describe the logical relationships in the graph.

  • A set of predicates. They are a set of functions $\{f_{1}(\cdot),f_{2}(\cdot),\cdots,f_{n}(\cdot)\}$ that take variables as inputs and calculate the satisfiability scores of a logical relationship.

  • A set of connectives. They are logical operators and aggregation operators (e.g., “and ($\land$)” and “not ($\neg$)”).
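To make the notion of a predicate concrete, the sketch below scores every pair of nodes in a small batch with a bilinear form followed by a sigmoid, so each output lies in (0, 1) and can be read as a satisfiability score. The bilinear form, the class name `BilinearPredicate` and the random constants are assumptions for illustration only; the list above does not prescribe a predicate architecture.

```python
import numpy as np


class BilinearPredicate:
    """Hypothetical binary predicate f(x, y) = sigmoid(x^T W y)."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        # W would be a trainable parameter in a real model.
        self.W = rng.normal(scale=0.1, size=(dim, dim))

    def __call__(self, X, Y):
        # X: (b, dim), Y: (b, dim) -> (b, b) matrix of scores in (0, 1).
        logits = X @ self.W @ Y.T
        return 1.0 / (1.0 + np.exp(-logits))


# Constants: embeddings of a batch of 4 nodes (random stand-ins).
rng = np.random.default_rng(1)
batch_embeddings = rng.normal(size=(4, 8))

parent_scorer = BilinearPredicate(dim=8)
scores = parent_scorer(batch_embeddings, batch_embeddings)
print(scores.shape)
```

Because `W` is not symmetric, the score for (x, y) generally differs from the score for (y, x), which is what a directed relation such as "parent of" needs.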

Then, a knowledge base can be defined as a triple $\langle\mathcal{K},\mathcal{G}(\cdot|\theta),\Theta\rangle$, where

  • $\mathcal{K}$ is a set of closed formulae (i.e., axioms) defined by the variables, predicates and connectives in $\mathcal{L}$ and the set of domain symbols. They are highly interpretable propositional logic expressions.

  • $\mathcal{G}(\cdot|\theta)$ is the parameterised grounding of the symbols and logical operators.

  • $\Theta$ is the set of parameters in the groundings. This includes the trainable parameters of predicates and constants.

The training of an LTN model aims to find the set of optimal parameters $\Theta^{*}$ that maximise the aggregated satisfiability of $\langle\mathcal{K},\mathcal{G}(\cdot|\theta)\rangle$:

$$\Theta^{*}=\operatorname*{argmax}_{\theta\in\Theta}\operatorname*{SatAgg}_{\phi\in\mathcal{K}}(\mathcal{G}_{\theta}(\phi)) \tag{1}$$

$$\operatorname*{SatAgg}_{\phi\in\mathcal{K}}(\mathcal{G}_{\theta}(\phi))=1-\left(\frac{1}{|\mathcal{K}|}\sum_{\phi\in\mathcal{K}}(\mathcal{G}_{\theta}(\phi))\right)^{\frac{1}{p}} \tag{2}$$

where $\operatorname{SatAgg}(\cdot)$ is the function that aggregates the satisfiabilities of each axiom and $p$ is a hyperparameter.

Therefore, the training goal can be formulated as the minimisation of the loss $\mathcal{L}$:

$$\mathcal{L}=1-\operatorname*{SatAgg}_{\phi\in\mathcal{K}}(\mathcal{G}_{\theta}(\phi)) \tag{3}$$
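As a rough numerical illustration of Eq. (3), the sketch below aggregates per-axiom satisfiabilities $a_\phi$ with the generalised p-mean error, $1-\big(\frac{1}{|\mathcal{K}|}\sum_{\phi}(1-a_\phi)^{p}\big)^{1/p}$, an aggregator commonly used with LTNs. Using this particular form is an assumption: it applies the exponent $p$ to each error term $1-a_\phi$, which Eq. (2) as printed does not show. The function names and the default `p = 2` are likewise illustrative.

```python
import numpy as np


def sat_agg(axiom_sats, p=2):
    """p-mean-error aggregation of axiom satisfiabilities in [0, 1] (assumed form)."""
    a = np.asarray(axiom_sats, dtype=float)
    return 1.0 - np.mean((1.0 - a) ** p) ** (1.0 / p)


def ltn_loss(axiom_sats, p=2):
    """Loss of Eq. (3): one minus the aggregated satisfiability."""
    return 1.0 - sat_agg(axiom_sats, p)


fully_satisfied = ltn_loss([1.0, 1.0, 1.0])
partly_satisfied = ltn_loss([0.5, 0.9])
mostly_satisfied = ltn_loss([0.9, 0.9])
print(fully_satisfied, partly_satisfied, mostly_satisfied)
```

With this aggregator the loss is zero when every axiom is fully satisfied and grows as any axiom becomes less satisfied; larger values of `p` put more weight on the worst-satisfied axioms.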

A more detailed and illustrative description of how these components are used to describe the logical relationships in medical ontologies is given in Sec. 4.1.

3.5 The Indication Relationship between Medications and Diagnoses

If a medication $m$ was designed for treating a medical diagnosis $d$, an indication relationship $\langle m,d\rangle$ can be defined. A medication can be designed to treat a set of medical conditions. If a medication is able to treat a parent node on the diagnosis ontology, it can be considered able to treat all of its child nodes. It is worth noting that the indication relation graph does not enumerate all the existing indication relations.
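The parent-to-children rule can be sketched as a simple propagation over the diagnosis ontology. The function name `expand_indications`, the medication names and the diagnosis codes below are hypothetical placeholders, and the sketch propagates to all descendants (children of children included), which is one reading of the rule.

```python
def expand_indications(indications, children_of):
    """Propagate each <m, d> pair to every descendant of d in the diagnosis ontology."""
    expanded = set()
    for medication, diagnosis in indications:
        stack = [diagnosis]
        while stack:
            node = stack.pop()
            if (medication, node) in expanded:
                continue
            expanded.add((medication, node))
            stack.extend(children_of.get(node, ()))
    return expanded


# Placeholder diagnosis hierarchy and indication pairs (not real codes).
children_of = {"DX_COUGH": ["DX_COUGH_ACUTE", "DX_COUGH_CHRONIC"]}
indications = {("MED_A", "DX_COUGH"), ("MED_B", "DX_COUGH_ACUTE")}
expanded = expand_indications(indications, children_of)
print(sorted(expanded))
```

An indication on a parent node reaches its children, while an indication on a leaf node stays where it is.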

4 The OntoMedRec Model

4.1 Pre-training Ontology Encoders

By definition, the chosen medical ontologies have the following characteristics:

  • Explicit directed edges. An edge in a medical ontology refers to a parent-child relationship between the two nodes. This relationship is not interchangeable or reflexive.

  • Implicit deductive relationships. Besides explicit edges in $\mathcal{E}_{*}$, there are deductive relationships in medical ontologies.

    • Two nodes with the same parent node are sibling nodes. They have definitive differences on their level.

    • Ancestor nodes are multi-hop parent nodes. Ancestral relationships are not commutative or reflexive. We define one-hop ancestors as “parents” but not “ancestors”.

  • Each node (except the root node) has only one parent.
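The characteristics listed above can be illustrated on a few nodes of the ATC excerpt. Because each non-root node has exactly one parent, a child-to-parent dictionary is enough to derive the implicit relations; the helper names are assumptions made for this sketch.

```python
# Explicit edges, stored as child -> parent (each non-root node has one parent).
parent_of = {
    "R05DA": "R05D",
    "R05DB": "R05D",
    "benproperine": "R05DB",
    "cloperastine": "R05DB",
}


def siblings(node):
    """Nodes that share the same parent as `node` (excluding `node` itself)."""
    parent = parent_of.get(node)
    if parent is None:
        return set()
    return {n for n, p in parent_of.items() if p == parent and n != node}


def ancestors(node):
    """Multi-hop parents only; the one-hop parent is excluded, as defined above."""
    result = []
    current = parent_of.get(parent_of.get(node))
    while current is not None:
        result.append(current)
        current = parent_of.get(current)
    return result


print(siblings("benproperine"), ancestors("benproperine"), ancestors("R05DB"))
```

In this excerpt R05D is the parent of R05DB but the ancestor of benproperine, which matches the distinction drawn between "parents" and "ancestors".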

To accurately model the structural characteristics of medical ontologies, we pre-train three medical ontology node encoders, respectively for diagnoses, procedures and medications using logic tensor networks. Following the axioms used for the ontology deduction task in [19], we devise a set of additional axioms regarding the explicit and deductive relationships among nodes. Additionally, we devise axioms to define the sibling relationships in the ontology.

Since medical ontologies are much larger than the ontology in [19], it is impractical to define variables over all the nodes in these three ontologies. For instance, to describe the axiom “the parent node of the parent node of a node is an ancestor node”, three variables are required (“$\forall x,y,z:P(x,y)\land P(y,z)\to A(x,z)$”, where $x,y,z$ are variables and $P(\cdot,\cdot)$ and $A(\cdot,\cdot)$ are the predicates that calculate the satisfiability of the parent and ancestor relations). Each time a new variable is created over the entire ontology, a new copy of the embedding matrix of all the nodes ($\mathbf{E}_{*}\in\mathbb{R}^{|\mathcal{N}|\times d}$) is required, which leads to a space complexity of $O(n|\mathcal{N}|d)$, where $n$ is the number of variables. To achieve efficient and effective training of the encoders, we design an axiom-oriented sampling method. We first randomly sample a batch of nodes from the ontology. Then, we sample all their respective ancestors, parents and siblings. This set of nodes constitutes a training node batch. All the directed edges between two nodes in the set constitute the positive edge samples, whereas all the node pairs without directed edges between them constitute the negative edge samples. With the adoption of the sampling method, the space complexity of the creation of variables is reduced to $O(nbd)$, where $b$ is the batch size.
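The sampling procedure described in this paragraph can be sketched as follows. This is an illustrative reading, not the authors' implementation: the function name `sample_batch`, the child-to-parent dictionary and the seed handling are assumptions, and negative samples are taken to be ordered node pairs with no edge in either direction.

```python
import random


def sample_batch(parent_of, batch_size, seed=0):
    """Return (batch_nodes, positive_edges, negative_pairs) for one training batch."""
    rng = random.Random(seed)
    all_nodes = sorted(set(parent_of) | set(parent_of.values()))
    seeds = rng.sample(all_nodes, min(batch_size, len(all_nodes)))

    batch = set(seeds)
    for node in seeds:
        parent = parent_of.get(node)
        if parent is not None:
            # Siblings: every node that shares the seed node's parent.
            batch.update(n for n, p in parent_of.items() if p == parent)
        # Parent and ancestors: walk up to the root.
        while parent is not None:
            batch.add(parent)
            parent = parent_of.get(parent)

    # Positive samples: directed parent -> child edges with both ends in the batch.
    positive = {(parent_of[c], c) for c in batch if parent_of.get(c) in batch}
    # Negative samples: ordered pairs with no edge in either direction (assumed reading).
    negative = {
        (a, b)
        for a in batch
        for b in batch
        if a != b and (a, b) not in positive and (b, a) not in positive
    }
    return batch, positive, negative


parent_of = {
    "R05DA": "R05D",
    "R05DB": "R05D",
    "benproperine": "R05DB",
    "cloperastine": "R05DB",
}
batch, positive, negative = sample_batch(parent_of, batch_size=2, seed=0)
print(sorted(batch), sorted(positive))
```

The returned batch can be larger than `batch_size` because relatives of the seed nodes are added, but variables then range over this batch rather than over every node in the ontology.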

4.1.1 The Knowledge Formulation of Ontology Data

Therefore, the knowledge of an ontology can be formulated as follows, using the notations in [18].

  • Domain Medical terms in the ontology

  • Variables x𝑥xitalic_x, y𝑦yitalic_y and z𝑧zitalic_z, ranging over a batch of sampled nodes Nb𝒯*subscript𝑁𝑏subscript𝒯N_{b}\subset\mathcal{T}_{*}italic_N start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ⊂ caligraphic_T start_POSTSUBSCRIPT * end_POSTSUBSCRIPT

  • Predicates P*(x,y)subscript𝑃𝑥𝑦P_{*}(x,y)italic_P start_POSTSUBSCRIPT * end_POSTSUBSCRIPT ( italic_x , italic_y ) as the parent scorer, S*(x,y)subscript𝑆𝑥𝑦S_{*}(x,y)italic_S start_POSTSUBSCRIPT * end_POSTSUBSCRIPT ( italic_x , italic_y ) as the sibling scorer and A*(x,y)subscript𝐴𝑥𝑦A_{*}(x,y)italic_A start_POSTSUBSCRIPT * end_POSTSUBSCRIPT ( italic_x , italic_y ) as the ancestor scorer

  • Axioms

    • Parental relationships are neither reflexive nor commutative: $\forall x\in N_{b}:\neg P_{*}(x,x)$, $\forall x,y\in N_{b}:P_{*}(x,y)\to\neg P_{*}(y,x)$

    • Ancestral relationships are neither reflexive nor commutative: $\forall x\in N_{b}:\neg A_{*}(x,x)$, $\forall x,y\in N_{b}:A_{*}(x,y)\to\neg A_{*}(y,x)$

    • The definition of sibling relationships (nodes with the same parent node): $\forall x,y,z\in N_{b}:P_{*}(x,y)\land P_{*}(x,z)\to S_{*}(y,z)$

    • Sibling relationships are not reflexive but are commutative: $\forall x\in N_{b}:\neg S_{*}(x,x)$, $\forall x,y\in N_{b}:S_{*}(x,y)\to S_{*}(y,x)$

    • The parent node of a parent node is an ancestor node: $\forall x,y,z\in N_{b}:P_{*}(x,y)\land P_{*}(y,z)\to A_{*}(x,z)$

    • The parent node of an ancestor node is an ancestor node: $\forall x,y,z\in N_{b}:P_{*}(x,y)\land A_{*}(y,z)\to A_{*}(x,z)$

    • Positive and negative edges in the batch: $\forall(x,y)\in P_{b}:P_{*}(x,y)$, $\forall(x,y)\notin P_{b}:\neg P_{*}(x,y)$

  • Grounding

    • Let $\mathbf{v}_{n}$ be the representation of node $n$, $\mathcal{G}(\mathbf{v}_{n})=\mathbb{R}^{d}$

    • $\mathcal{G}(x|\theta)=\mathcal{G}(y|\theta)=\mathcal{G}(z|\theta)=[\mathbf{v}_{n}|n\in N_{b}]$

    • $\mathcal{G}(P_{*}|\theta)$, $\mathcal{G}(S_{*}|\theta)$ and $\mathcal{G}(A_{*}|\theta)$ are $\sigma(\text{MLP}(x,y))$ with one output neuron and the sigmoid function ($\sigma(\cdot)$) as the activation of the final layer

Ontology encoders are trained to maximise the aggregated satisfiability of all these axioms describing the structural characteristics of the ontology. For each ontology, we use a different set of predicates with the same structure. The three sets of predicates are optimised separately.
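To make "satisfiability of an axiom" concrete, the following Python sketch evaluates the asymmetry axiom $\forall x,y:P(x,y)\to\neg P(y,x)$ with fuzzy truth values in $[0,1]$. The fuzzy operators are assumptions for illustration, since this section does not state which ones are used: standard negation, the Reichenbach implication, and a generalised-mean-error aggregator for the universal quantifier. The hypothetical `crisp_scorer` stands in for a trained $\sigma(\text{MLP}(x,y))$ predicate.

```python
def fuzzy_not(a):
    # Standard fuzzy negation.
    return 1.0 - a


def fuzzy_implies(a, b):
    # Reichenbach implication: 1 - a + a * b (an assumed operator choice).
    return 1.0 - a + a * b


def fuzzy_forall(truths, p=2):
    # Generalised-mean-error aggregator: for p > 1, low truth values are
    # penalised more strongly than with a plain mean.
    truths = list(truths)
    mean_error = sum((1.0 - t) ** p for t in truths) / len(truths)
    return 1.0 - mean_error ** (1.0 / p)


def asymmetry_satisfaction(score, nodes):
    """Truth degree of: for all x, y: P(x, y) -> not P(y, x)."""
    return fuzzy_forall(
        fuzzy_implies(score(x, y), fuzzy_not(score(y, x)))
        for x in nodes for y in nodes
    )


def crisp_scorer(edges):
    # Stand-in for a trained scorer: 1.0 on listed edges, 0.0 elsewhere.
    return lambda x, y: 1.0 if (x, y) in edges else 0.0


# A proper tree satisfies the axiom; a two-node cycle violates it.
tree_sat = asymmetry_satisfaction(
    crisp_scorer({("R", "A"), ("A", "A1")}), ["R", "A", "A1"])
cycle_sat = asymmetry_satisfaction(
    crisp_scorer({("A", "B"), ("B", "A")}), ["A", "B"])
```

Here `tree_sat` is 1.0, while `cycle_sat` drops to roughly 0.29 because two of the four node pairs violate the axiom. In training, such truth degrees would be aggregated over all axioms and maximised with respect to the embeddings and predicate parameters.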

4.1.2 The Alignment of Diagnosis and Medication Representations

Intuitively, aligning the representations of medications and diagnoses after they have been trained separately shortens the distance between these representations. The representations of diagnoses and medications are thereby infused with the indication relationship. Therefore, using the pretrained representations from OntoMedRec as a starting point can improve the performance of the model, particularly in admissions with few-shot medications. As with the ontology data, the knowledge of the MEDI dataset can be formulated as follows:

  • Domains: Medical terms in the medication and diagnoses ontology

  • Variables

    • Medication $m$ ranging over all the medications in the batch of sampled indication pairs

    • Diagnoses $s_{x}$ and $s_{y}$ ranging over all the diagnoses in the batch of sampled indication pairs

  • Predicates $I(m,d)$ for the indication relationship

  • Axioms: Let $\mathcal{I}$ be all the indication pairs in a sampled batch in the MEDI dataset: $\forall(m,s_{x})\in\mathcal{I}_{b}:I(m,s_{x})$

  • Grounding

    • Let $\mathbf{m}$ and $\mathbf{d}$ be the representations of medication $m$ and diagnosis $d$, $\mathcal{G}(\mathbf{m})=\mathcal{G}(\mathbf{d})=\mathbb{R}^{d}$

    • $\mathcal{G}(m|\theta)=[\mathbf{m}_{m}|m\in\mathcal{I}]$, $\mathcal{G}(d|\theta)=[\mathbf{d}_{d}|d\in\mathcal{I}]$

    • $\mathcal{G}(I|\theta)$ is $\sigma(\text{MLP}(m,d))$ with one output neuron and the sigmoid function ($\sigma(\cdot)$) as the activation of the final layer

In each pretraining epoch, we train the three encoders sequentially and then align the medication and diagnosis embeddings with the indication dataset. We save the procedure embeddings with the highest satisfiability on the procedure ontology, and the medication and diagnosis embeddings with the highest satisfiability on the indication dataset.

4.2 Fine-Tuning with Downstream Models

Following the pre-training phase, the embeddings of medical terms are integrated with downstream medication recommendation models for further fine-tuning. We choose both instance-based models (Leap [7] and 4SDrug [2]) and longitudinal models (RETAIN [22], SafeDrug [4] and MICRON [8]) to fine-tune and evaluate OntoMedRec.

The representations of diagnoses and procedures (and medications, where possible) are loaded as a starting point for the respective embedding table and are further end-to-end fine-tuned with the medication recommendation task.
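As a rough illustration of using pretrained representations as a starting point, the sketch below builds an embedding table row by row. It is not taken from the paper: the function name `init_embedding_table`, the example codes and vectors, and the random fallback for codes that have no pretrained vector are all assumptions. A real downstream model would copy these rows into its own trainable embedding layer before end-to-end fine-tuning.

```python
import random


def init_embedding_table(vocab, pretrained, dim, rng):
    """Return one row per code in vocab, preferring pretrained vectors."""
    table = []
    for code in vocab:
        if code in pretrained:
            # Copy so later fine-tuning updates do not alter the pretrained store.
            table.append(list(pretrained[code]))
        else:
            # Assumed fallback: small random values for codes without a vector.
            table.append([rng.gauss(0.0, 0.1) for _ in range(dim)])
    return table


# Hypothetical diagnosis codes and a hypothetical pretrained vector.
pretrained = {"250.00": [0.2, -0.1, 0.4]}
table = init_embedding_table(
    ["250.00", "401.9"], pretrained, dim=3, rng=random.Random(0))
```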

5 Experiments

5.1 Experimental Setup

5.1.1 Dataset

We use the ATC ontology for medications and the ICD9-CM ontology for diagnoses and procedures from BioPortal [29] to pretrain OntoMedRec. ICD9-CM is split into two sub-ontologies, respectively for diagnoses and procedures. The characteristics of these ontologies are listed in Table 1.

Table 1: The statistical characteristics of the medical ontologies

| | Diagnosis | Procedure | Medication |
|---|---|---|---|
| # nodes | 17737 | 4670 | 6441 |
| # edges | 17736 | 4669 | 6440 |
| Max depth | 7 | 4 | 5 |

Table 2: The statistical characteristics of the MIMIC-III dataset

| Item | MIMIC-III |
|---|---|
| # patients | 35441 |
| # admissions | 44129 |
| # medications | 120 |
| # procedures | 1975 |
| # diagnoses | 6658 |

We use the benchmark dataset MIMIC-III [30] to fine-tune and evaluate the performance of downstream models integrated with the representations of OntoMedRec and other baselines. The statistical characteristics of the dataset are described in Table 2. To explore the performance of downstream models with or without OntoMedRec in sparse cases, we retain the patients with only one admission, the low-frequency diagnoses and the low-frequency medications that were discarded in previous studies (e.g., in [4]). The ratio of the training, testing and validation sets is $4:1:1$. We split out a set of admissions with few-shot medications. We use the TWOSIDES dataset [31] as the ground truth of drug-drug interactions (DDIs). In contrast to previous studies, we retain the drug pairs with lower numbers of DDIs.

5.1.2 The Generation of Few-Shot Medications Test Cases

We sort all medications in the EHR dataset by their frequencies. The medications with the lowest 30% frequency (i.e., tail percentage) are designated as few-shot medications. Prescriptions in the test set with more than 1 few-shot medication are added to the few-shot test set.
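The construction of the few-shot test set can be sketched as below. This is an illustrative reading rather than the authors' code: "lowest 30% frequency" is interpreted as the bottom 30% of medications when ranked by frequency, ties are broken alphabetically, each medication is counted once per prescription, and "more than 1" is implemented as at least two few-shot medications. All function and parameter names are hypothetical.

```python
from collections import Counter


def few_shot_medications(prescriptions, tail_percent=30):
    """Return the least frequent tail_percent% of medications."""
    freq = Counter(med for meds in prescriptions for med in set(meds))
    # Least frequent first; the name breaks ties so the result is deterministic.
    ranked = sorted(freq, key=lambda med: (freq[med], med))
    n_tail = len(ranked) * tail_percent // 100
    return set(ranked[:n_tail])


def few_shot_test_set(test_prescriptions, few_shot, min_few_shot=2):
    """Keep prescriptions containing at least min_few_shot few-shot medications."""
    return [meds for meds in test_prescriptions
            if len(set(meds) & few_shot) >= min_few_shot]


# Toy data: medication "a" is frequent, "c" and "d" are rare.
ehr = [["a", "b", "c"], ["a", "b", "d"], ["a", "b"], ["a", "c"]]
tail = few_shot_medications(ehr, tail_percent=50)
subset = few_shot_test_set([["a", "c", "d"], ["a", "c"], ["b"]], tail)
```

In the toy data, `tail` is `{"c", "d"}` and only the first test prescription, which contains both, enters the few-shot subset.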

5.1.3 Baselines

There are two major categories of existing medical ontology modelling methods: EHR-independent models (KAMPNet [14]) and EHR-dependent models (G-BERT [13]). Both KAMPNet and G-BERT use GAT [15] to model medical ontologies. Thus, we also choose GAT as one of the baselines to validate the effectiveness of our model. KAME [17] and GRAM [16] are not comparable to our model because their ontology training is deeply coupled with their downstream tasks, which are not medication recommendation. GCN is commonly used for the modelling of EHR and DDI graphs [24]. Therefore, we choose a randomly initialised naive embedding table, GAT [15] and GCN as baselines. GAT and GCN are fine-tuned along with the downstream models. We use link prediction as the pretraining task for these two models. The two baselines are trained for 20 epochs, and the checkpoints with the lowest loss are selected.

5.1.4 Evaluation Metrics

Following the evaluation protocol of many medication recommendation models, we use the following metrics:

  • Jaccard coefficient. It is the most common benchmark score for medication recommendation models. The Jaccard coefficient over the $T^{(n)}$ admissions of the $n$-th patient is calculated as follows:

    \begin{align}
    \text{Jaccard}^{(n)}_{t} &= \frac{|\{i:\mathbf{m}^{(n)}_{t,i}=1\}\cap\{i:\mathbf{\hat{m}}^{(n)}_{t,i}=1\}|}{|\{i:\mathbf{m}^{(n)}_{t,i}=1\}\cup\{i:\mathbf{\hat{m}}^{(n)}_{t,i}=1\}|} \tag{4} \\
    \text{Jaccard}^{(n)} &= \frac{1}{T^{(n)}}\sum_{t=1}^{T^{(n)}}\text{Jaccard}^{(n)}_{t} \tag{5}
    \end{align}

    where $\{i:\mathbf{m}^{(n)}_{t,i}=1\}$ is the set of indices $i$ at which the element of the multi-hot encoding vector is 1. The higher the Jaccard coefficient is, the more accurate the recommendation is (i.e., the recommended set is more similar to the label).

  • Drug-Drug Interaction (DDI) score. It is the percentage of medication pairs with known DDIs in the recommended set of medications. The lower it is, the fewer DDIs there are in the generated medication recommendation, and the safer the recommended medication combination can be considered. The DDI score of the $n$-th patient's admissions is calculated as follows:

    \begin{equation}
    \text{DDI}^{(n)}=\frac{\sum_{t}^{T^{(n)}}\sum_{j,k\in\mathbf{\hat{m}}^{(n)}_{t,i}=1}\mathbf{1}\{\mathbf{D}_{j,k}=1\}}{\sum_{j,k\in\mathbf{\hat{m}}^{(n)}_{t,i}=1}1} \tag{6}
    \end{equation}

    where $\mathbf{D}\in\mathbb{R}^{|\mathcal{M}|\times|\mathcal{M}|}$ is the DDI matrix retrieved from the TWOSIDES dataset [31] and $\mathbf{1}\{\cdot\}$ is an indicator function that returns 1 when the input is true and 0 otherwise. $\mathbf{D}_{j,k}=1$ indicates that medication $j$ and medication $k$ have at least one adverse effect when prescribed together.

  • F1 score. It is commonly used as a metric for classification tasks. It is the harmonic mean of the precision and recall score and is calculated as follows:

    \begin{align}
    \text{Precision}^{(n)}_{t} &= \frac{|\{i:\mathbf{m}^{(n)}_{t,i}=1\}\cap\{i:\mathbf{\hat{m}}^{(n)}_{t,i}=1\}|}{|\{i:\mathbf{\hat{m}}^{(n)}_{t,i}=1\}|} \tag{7} \\
    \text{Recall}^{(n)}_{t} &= \frac{|\{i:\mathbf{m}^{(n)}_{t,i}=1\}\cap\{i:\mathbf{\hat{m}}^{(n)}_{t,i}=1\}|}{|\{i:\mathbf{m}^{(n)}_{t,i}=1\}|} \tag{8} \\
    \text{F1}^{(n)}_{t} &= 2\cdot\frac{\text{Precision}^{(n)}_{t}\cdot\text{Recall}^{(n)}_{t}}{\text{Precision}^{(n)}_{t}+\text{Recall}^{(n)}_{t}} \tag{9}
    \end{align}
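The three metrics can be computed on sets of medication indices, as in the following Python sketch. It is a minimal illustration of Eqs. (4) to (9) under a few assumptions: empty sets score 0.0, the DDI matrix is represented as a set of unordered interacting pairs (so it is treated as symmetric), and the DDI score pools medication pairs over all of a patient's admissions in both the numerator and the denominator, which is one reading of Eq. (6). The function names are illustrative.

```python
from itertools import combinations


def jaccard(true_meds, pred_meds):
    """Per-admission Jaccard coefficient, as in Eq. (4)."""
    true_set, pred_set = set(true_meds), set(pred_meds)
    union = true_set | pred_set
    return len(true_set & pred_set) / len(union) if union else 0.0


def precision_recall_f1(true_meds, pred_meds):
    """Per-admission precision, recall and F1, as in Eqs. (7) to (9)."""
    true_set, pred_set = set(true_meds), set(pred_meds)
    hits = len(true_set & pred_set)
    precision = hits / len(pred_set) if pred_set else 0.0
    recall = hits / len(true_set) if true_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def patient_average(metric, admissions):
    """Average a per-admission metric over (label, prediction) pairs, as in Eq. (5)."""
    scores = [metric(true_meds, pred_meds) for true_meds, pred_meds in admissions]
    return sum(scores) / len(scores)


def ddi_rate(pred_sets, ddi_pairs):
    """Share of recommended medication pairs with a known interaction."""
    interacting = total = 0
    for meds in pred_sets:
        # Unordered pairs of distinct recommended medications in one admission.
        for j, k in combinations(sorted(set(meds)), 2):
            total += 1
            if frozenset((j, k)) in ddi_pairs:
                interacting += 1
    return interacting / total if total else 0.0


# Two admissions of one hypothetical patient: (label set, predicted set).
admissions = [({1, 2}, {1, 2}), ({1, 2, 3}, {2, 3, 4})]
mean_jaccard = patient_average(jaccard, admissions)
precision, recall, f1 = precision_recall_f1({1, 2, 3, 4}, {3, 4, 5})
ddi = ddi_rate([["a", "b", "c"], ["a", "c"]], {frozenset(("a", "b"))})
```

In this toy example the second admission shares two of four distinct medications with its label, so its Jaccard coefficient is 0.5 and the patient average is 0.75. One of the four recommended pairs interacts, so the DDI rate is 0.25.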

5.2 Results Discussion

Table 3 lists the performance of downstream models on the entire testing set, and Table 4 lists the results on the few-shot medications testing set. The drop in performance from Table 3 to Table 4 in all downstream models supports our assumption that the data sparsity issue harms the performance of downstream models. Although OntoMedRec does not achieve the lowest DDI score with some downstream models in both test settings, its DDI scores are lower than the ground truth DDI score (0.078 among the entire test set and 0.069 in the few-shot test set).

Table 3: Performance of selected models on the MIMIC-III dataset.

| Model | Embeddings | Jaccard | F1 | DDI | No. of drugs |
|---|---|---|---|---|---|
| Leap | Naive | 0.4689 ± 0.0019 | 0.6287 ± 0.0019 | 0.0603 ± 0.0004 | 17.8810 ± 0.0405 |
| | +GAT | 0.4178 ± 0.0009 | 0.5815 ± 0.0009 | 0.0631 ± 0.0000 | 19.9971 ± 0.0014 |
| | +GCN | 0.3853 ± 0.0012 | 0.5500 ± 0.0013 | 0.0769 ± 0.0000 | 12.9998 ± 0.0002 |
| | +OMR | **0.4732 ± 0.0017** | **0.6322 ± 0.0016** | **0.0596 ± 0.0004** | 17.1709 ± 0.0465 |
| SafeDrug | Naive | 0.5431 ± 0.0016 | 0.6920 ± 0.0014 | 0.0600 ± 0.0002 | 21.9985 ± 0.0668 |
| | +GAT | 0.5232 ± 0.0022 | 0.6740 ± 0.0020 | **0.0569 ± 0.0002** | 21.6908 ± 0.0838 |
| | +GCN | 0.5202 ± 0.0013 | 0.6707 ± 0.0010 | 0.0580 ± 0.0002 | 22.1046 ± 0.1074 |
| | +OMR | **0.5481 ± 0.0024** | **0.6965 ± 0.0021** | 0.0589 ± 0.0002 | 22.1837 ± 0.0790 |
| MICRON | Naive | 0.5147 ± 0.0020 | 0.6645 ± 0.0020 | **0.0504 ± 0.0005** | 15.7760 ± 0.1218 |
| | +GAT | 0.4991 ± 0.0026 | 0.6508 ± 0.0023 | 0.0510 ± 0.0003 | 15.4560 ± 0.1239 |
| | +GCN | 0.5003 ± 0.0017 | 0.6524 ± 0.0016 | 0.0511 ± 0.0004 | 15.4916 ± 0.0648 |
| | +OMR | **0.5203 ± 0.0019** | **0.6696 ± 0.0019** | 0.0517 ± 0.0003 | 16.1107 ± 0.1218 |
| 4SDrug | Naive | 0.4667 ± 0.0017 | 0.6261 ± 0.0016 | 0.0478 ± 0.0004 | 13.7054 ± 0.0463 |
| | +GAT | 0.4666 ± 0.0016 | 0.6260 ± 0.0017 | 0.0477 ± 0.0005 | 13.7442 ± 0.0624 |
| | +GCN | **0.4670 ± 0.0017** | **0.6263 ± 0.0016** | 0.0478 ± 0.0004 | 13.7361 ± 0.0559 |
| | +OMR | 0.4662 ± 0.0015 | 0.6257 ± 0.0014 | **0.0474 ± 0.0005** | 13.7321 ± 0.0724 |
| RETAIN | Naive | 0.5433 ± 0.0023 | 0.6913 ± 0.0019 | 0.0646 ± 0.0006 | 17.0477 ± 0.1014 |
| | +GAT | 0.4264 ± 0.0023 | 0.5871 ± 0.0023 | **0.0554 ± 0.0006** | 14.9248 ± 0.0877 |
| | +GCN | 0.4335 ± 0.0020 | 0.5909 ± 0.0018 | 0.0574 ± 0.0006 | 16.4566 ± 0.0917 |
| | +OMR | **0.5536 ± 0.0019** | **0.7001 ± 0.0015** | 0.0642 ± 0.0005 | 17.7567 ± 0.0836 |

Table 4: Performance of selected models on the test set with few-shot medications.

| Model | Embeddings | Jaccard | F1 | DDI | No. of drugs |
|---|---|---|---|---|---|
| Leap | Naive | 0.4328 ± 0.0060 | 0.5981 ± 0.0060 | 0.0522 ± 0.0018 | 18.3266 ± 0.1629 |
| | +GAT | 0.3978 ± 0.0064 | 0.5636 ± 0.0067 | 0.0632 ± 0.0000 | 20.0000 ± 0.0000 |
| | +GCN | 0.3451 ± 0.0046 | 0.5083 ± 0.0051 | 0.0769 ± 0.0000 | 13.0000 ± 0.0000 |
| | +OMR | **0.4341 ± 0.0071** | **0.5986 ± 0.0071** | **0.0571 ± 0.0023** | 17.4800 ± 0.2528 |
| SafeDrug | Naive | 0.5141 ± 0.0083 | 0.6705 ± 0.0074 | 0.0577 ± 0.0007 | 23.8330 ± 0.5112 |
| | +GAT | 0.5044 ± 0.0054 | 0.6618 ± 0.0049 | 0.0564 ± 0.0010 | 23.8990 ± 0.2941 |
| | +GCN | 0.4940 ± 0.0082 | 0.6517 ± 0.0079 | **0.0562 ± 0.0009** | 24.6039 ± 0.4459 |
| | +OMR | **0.5206 ± 0.0071** | **0.6769 ± 0.0062** | 0.0587 ± 0.0011 | 24.7879 ± 0.4228 |
| MICRON | Naive | 0.4849 ± 0.0099 | 0.6411 ± 0.0101 | 0.0615 ± 0.0018 | 20.5828 ± 0.6436 |
| | +GAT | 0.4626 ± 0.0061 | 0.6215 ± 0.0067 | **0.0583 ± 0.0023** | 19.9857 ± 0.9167 |
| | +GCN | 0.4660 ± 0.0062 | 0.6240 ± 0.0052 | 0.0628 ± 0.0010 | 20.0187 ± 0.6505 |
| | +OMR | **0.4876 ± 0.0094** | **0.6428 ± 0.0087** | 0.0622 ± 0.0012 | 20.6576 ± 0.5616 |
| 4SDrug | Naive | 0.4310 ± 0.0071 | 0.5953 ± 0.0064 | 0.0385 ± 0.0020 | 14.4564 ± 0.2999 |
| | +GAT | 0.4298 ± 0.0064 | 0.5947 ± 0.0062 | 0.0385 ± 0.0016 | 14.3754 ± 0.3247 |
| | +GCN | 0.4304 ± 0.0068 | 0.5949 ± 0.0065 | 0.0380 ± 0.0023 | 14.2446 ± 0.3319 |
| | +OMR | **0.4316 ± 0.0079** | **0.5961 ± 0.0075** | **0.0379 ± 0.0018** | 14.4943 ± 0.3259 |
| RETAIN | Naive | 0.5057 ± 0.0077 | 0.6615 ± 0.0069 | 0.0608 ± 0.0018 | 19.7118 ± 0.6843 |
| | +GAT | 0.3650 ± 0.0076 | 0.5283 ± 0.0081 | **0.0585 ± 0.0014** | 16.1552 ± 0.2670 |
| | +GCN | 0.3637 ± 0.0069 | 0.5263 ± 0.0076 | 0.0700 ± 0.0011 | 18.2791 ± 0.3432 |
| | +OMR | **0.5229 ± 0.0089** | **0.6780 ± 0.0078** | 0.0609 ± 0.0015 | 20.6692 ± 0.6741 |

5.2.1 Results on the entire MIMIC-III dataset

As Table 3 shows, integrating the OntoMedRec representations of diagnoses, procedures and medications (where possible) improves the performance of most of the selected medication recommendation models on the entire dataset compared to all baselines. This indicates that the OntoMedRec representations are model-agnostic with respect to downstream medication recommendation models.
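
The integration step described above can be illustrated with a minimal sketch. It assumes that a downstream model exposes an embedding table indexed by medical code, and that pretrained vectors are available for only some codes (the "where possible" case), with the remaining rows left randomly initialised. The function name `build_embedding_table`, the toy codes, and the initialisation scale of 0.1 are hypothetical choices for illustration, not details taken from the experiments.

```python
import numpy as np

def build_embedding_table(vocab, pretrained, dim, seed=0):
    """Build a (len(vocab), dim) embedding table.

    Rows for codes found in `pretrained` are copied from it; all other rows
    keep a random initialisation (assumed fallback, not from the paper).
    Returns the table and the number of codes covered by pretrained vectors.
    """
    rng = np.random.default_rng(seed)
    table = rng.normal(0.0, 0.1, size=(len(vocab), dim))
    covered = 0
    for idx, code in enumerate(vocab):
        vec = pretrained.get(code)
        if vec is not None:
            table[idx] = np.asarray(vec, dtype=float)
            covered += 1
    return table, covered

# Toy example: three hypothetical codes, two of which have pretrained vectors.
vocab = ["A01", "B02", "C03"]
pretrained = {"A01": [1.0, 0.0], "C03": [0.0, 1.0]}
table, covered = build_embedding_table(vocab, pretrained, dim=2)
```

Because only the initial values of the table change, the downstream architecture and its training loop stay untouched, which is one way to read the model-agnostic claim.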

5.2.2 Results on the few-shot cases

The representations of OntoMedRec improve all compared downstream models on the few-shot medication test set. The results show that the representations improve the performance of all compared longitudinal models. For the compared instance-based models, the representations of OntoMedRec do not improve performance by a large margin; however, we notice that 1) these models score lower than the selected longitudinal models, and 2) their performance after integrating OntoMedRec is not far below the best performance. We speculate that this is because instance-based models use fewer pretrained embeddings than longitudinal models.

5.2.3 Further Investigation of Sparse Scenarios

To further investigate how few-shot medications affect the performance of medication recommendation models with OntoMedRec, we compare the OntoMedRec representations against a randomly initialised embedding table under different sparsity settings, starting from the 20% least frequent medications (as visualised in Fig. 3). The OntoMedRec representations improve the performance of longitudinal downstream models on all three test sets. Overall, the performance gap between models with and without OntoMedRec pretraining is larger when the data is sparser (20% being the sparsest scenario), which shows that medication recommendation models benefit more from OntoMedRec in sparser scenarios. For LEAP, OntoMedRec improves performance on the entire test set and on the test set with the 20% least frequent medications. For 4SDrug, the performance margin is small.

Figure 3: Performance of randomly initialised embedding table and OntoMedRec embedding table in different downstream models and tail percentages
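
To make the notion of a tail percentage concrete, the sketch below shows one plausible way to build a sparse test subset: rank medications by how many visits they occur in, take the least frequent fraction, and keep the test visits that contain at least one of them. This is an assumption about the procedure, not a description of the exact split used in the experiments. The helper names `tail_medications` and `tail_test_set` and the toy visits are hypothetical.

```python
from collections import Counter

def tail_medications(visits, fraction):
    """Return the set containing the `fraction` least frequent medications.

    Frequency is the number of visits a medication occurs in; ties are
    broken alphabetically so the result is deterministic.
    """
    counts = Counter(med for visit in visits for med in set(visit))
    ranked = sorted(counts, key=lambda m: (counts[m], m))  # rarest first
    k = max(1, int(len(ranked) * fraction))
    return set(ranked[:k])

def tail_test_set(test_visits, tail):
    """Keep only the test visits that contain at least one tail medication."""
    return [visit for visit in test_visits if tail & set(visit)]

# Toy data: each visit is a list of hypothetical medication codes.
train_visits = [["a", "b"], ["a", "c"], ["a", "b", "d"], ["a", "e"], ["b"]]
test_visits = [["a", "c"], ["a", "b"], ["e"]]

tail = tail_medications(train_visits, fraction=0.2)
sparse_test = tail_test_set(test_visits, tail)
```

Raising `fraction` admits more frequent medications into the tail, so the resulting test subset becomes less sparse.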

6 Conclusion

In this paper, we proposed OntoMedRec, a set of self-supervised, logically-pretrained and model-agnostic ontology encoders for medication recommendation. We devised axioms that collectively define the structure of medical ontologies, and used logic tensor networks (LTNs) to maximise the satisfiability of the representations. Furthermore, we aligned the representations of diagnoses and medications using medication indication information. The ontology-enhanced representations can be integrated into various downstream medication recommendation models to alleviate the negative effect of data sparsity. We conducted experiments to evaluate the efficacy of OntoMedRec. The results show that the OntoMedRec representations improve the performance of most selected models on the entire test set, and that they improve the performance of all longitudinal models on the few-shot medications test set.

References

  • Yu et al. [2018] Yu, K.-H., Beam, A.L., Kohane, I.S.: Artificial intelligence in healthcare. Nature biomedical engineering 2(10), 719–731 (2018)
  • Tan et al. [2022] Tan, Y., Kong, C., Yu, L., Li, P., Chen, C., Zheng, X., Hertzberg, V.S., Yang, C.: 4sdrug: Symptom-based set-to-set small and safe drug recommendation. KDD ’22, pp. 3970–3980. Association for Computing Machinery, New York, NY, USA (2022). https://doi.org/10.1145/3534678.3539089
  • Wu et al. [2022] Wu, Z., Yao, H., Su, Z., Liebovitz, D.M., Glass, L.M., Zou, J., Finn, C., Sun, J.: Knowledge-driven new drug recommendation. arXiv preprint arXiv:2210.05572 (2022)
  • Yang et al. [2021] Yang, C., Xiao, C., Ma, F., Glass, L., Sun, J.: Safedrug: Dual molecular graph encoders for recommending effective and safe drug combinations. In: Zhou, Z.-H. (ed.) Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, pp. 3735–3741 (2021). https://doi.org/10.24963/ijcai.2021/514 . Main Track
  • Wu et al. [2022] Wu, R., Qiu, Z., Jiang, J., Qi, G., Wu, X.: Conditional generation net for medication recommendation. In: Proceedings of the ACM Web Conference 2022. WWW ’22, pp. 935–945. Association for Computing Machinery, New York, NY, USA (2022). https://doi.org/10.1145/3485447.3511936
  • Wang et al. [2021] Wang, Y., Chen, W., Pi, D., Yue, L., Wang, S., Xu, M.: Self-supervised adversarial distribution regularization for medication recommendation. In: Zhou, Z.-H. (ed.) Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, pp. 3134–3140 (2021). https://doi.org/10.24963/ijcai.2021/431 . Main Track
  • Zhang et al. [2017] Zhang, Y., Chen, R., Tang, J., Stewart, W.F., Sun, J.: Leap: Learning to prescribe effective and safe treatment combinations for multimorbidity. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’17, pp. 1315–1324. Association for Computing Machinery, New York, NY, USA (2017). https://doi.org/10.1145/3097983.3098109
  • Yang et al. [2021] Yang, C., Xiao, C., Glass, L., Sun, J.: Change matters: Medication change prediction with recurrent residual networks. In: Zhou, Z.-H. (ed.) Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, pp. 3728–3734 (2021). Main Track
  • Yin et al. [2022] Yin, H., Wang, Q., Zheng, K., Li, Z., Zhou, X.: Overcoming data sparsity in group recommendation. IEEE Transactions on Knowledge and Data Engineering 34(7), 3447–3460 (2022) https://doi.org/10.1109/TKDE.2020.3023787
  • Lin et al. [2023] Lin, R., Tang, F., He, C., Wu, Z., Yuan, C., Tang, Y.: Dirs-kg: a kg-enhanced interactive recommender system based on deep reinforcement learning. World Wide Web 26(5), 2471–2493 (2023) https://doi.org/10.1007/s11280-022-01135-x
  • Yang et al. [2020] Yang, N., Ma, Y., Chen, L., Yu, P.S.: A meta-feature based unified framework for both cold-start and warm-start explainable recommendations. World Wide Web 23(1), 241–265 (2020) https://doi.org/10.1007/s11280-019-00683-z
  • Zhong et al. [2020] Zhong, T., Zhang, S., Zhou, F., Zhang, K., Trajcevski, G., Wu, J.: Hybrid graph convolutional networks with multi-head attention for location recommendation. World Wide Web 23(6), 3125–3151 (2020) https://doi.org/10.1007/s11280-020-00824-9
  • Shang et al. [2019] Shang, J., Ma, T., Xiao, C., Sun, J.: Pre-training of graph augmented transformers for medication recommendation. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pp. 5953–5959 (2019)
  • An et al. [2023] An, Y., Tang, H., Jin, B., Xu, Y., Wei, X.: Kampnet: multi-source medical knowledge augmented medication prediction network with multi-level graph contrastive learning. BMC Medical Informatics and Decision Making 23(1), 1–19 (2023)
  • Veličković et al. [2018] Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., Bengio, Y.: Graph attention networks. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=rJXMpikCZ
  • Choi et al. [2017] Choi, E., Bahadori, M.T., Song, L., Stewart, W.F., Sun, J.: Gram: Graph-based attention model for healthcare representation learning. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’17, pp. 787–795. Association for Computing Machinery, New York, NY, USA (2017)
  • Ma et al. [2018] Ma, F., You, Q., Xiao, H., Chitta, R., Zhou, J., Gao, J.: Kame: Knowledge-based attention model for diagnosis prediction in healthcare. CIKM ’18, pp. 743–752. Association for Computing Machinery, New York, NY, USA (2018)
  • Badreddine et al. [2022] Badreddine, S., Garcez, A.d., Serafini, L., Spranger, M.: Logic tensor networks. Artificial Intelligence 303, 103649 (2022)
  • Bianchi and Hitzler [2019] Bianchi, F., Hitzler, P.: On the capabilities of logic tensor networks for deductive reasoning. In: AAAI Spring Symposium: Combining Machine Learning with Knowledge Engineering (2019)
  • Wei et al. [2013] Wei, W.-Q., Mosley, J.D., Bastarache, L., Denny, J.C.: Validation and enhancement of a computable medication indication resource (medi) using a large practice-based dataset. In: AMIA Annual Symposium Proceedings, vol. 2013, p. 1448 (2013). American Medical Informatics Association
  • Gong et al. [2021] Gong, F., Wang, M., Wang, H., Wang, S., Liu, M.: Smr: medical knowledge graph embedding for safe medicine recommendation. Big Data Research 23, 100174 (2021)
  • Choi et al. [2016] Choi, E., Bahadori, M.T., Sun, J., Kulas, J., Schuetz, A., Stewart, W.: Retain: An interpretable predictive model for healthcare using reverse time attention mechanism. Advances in neural information processing systems 29 (2016)
  • Le et al. [2018] Le, H., Tran, T., Venkatesh, S.: Dual control memory augmented neural networks for treatment recommendations. In: Advances in Knowledge Discovery and Data Mining, pp. 273–284. Springer, Cham (2018)
  • Shang et al. [2019] Shang, J., Xiao, C., Ma, T., Li, H., Sun, J.: Gamenet: Graph augmented memory networks for recommending medication combination. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 1126–1133 (2019)
  • Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in neural information processing systems 30 (2017)
  • Zhang et al. [2021] Zhang, S., Li, J., Zhou, H., Zhu, Q., Zhang, S., Wang, D.: Merits: Medication recommendation for chronic disease with irregular time-series. In: 2021 IEEE International Conference on Data Mining (ICDM), pp. 1481–1486 (2021). https://doi.org/10.1109/ICDM51629.2021.00192
  • Yao et al. [2023] Yao, Z., Liu, B., Wang, F., Sow, D., Li, Y.: Ontology-aware prescription recommendation in treatment pathways using multi-evidence healthcare data. ACM Transactions on Information Systems (2023)
  • Wagner et al. [2006] Wagner, M.M., Hogan, W.R., Chapman, W.W., Gesteland, P.H.: Chief complaints and icd codes. Handbook of biosurveillance, 333 (2006)
  • Whetzel et al. [2011] Whetzel, P.L., Noy, N.F., Shah, N.H., Alexander, P.R., Nyulas, C., Tudorache, T., Musen, M.A.: Bioportal: enhanced functionality via new web services from the national center for biomedical ontology to access and use ontologies in software applications. Nucleic acids research 39(suppl_2), 541–545 (2011)
  • Johnson et al. [2016] Johnson, A.E., Pollard, T.J., Shen, L., Lehman, L.-w.H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Anthony Celi, L., Mark, R.G.: Mimic-iii, a freely accessible critical care database. Scientific data 3(1), 1–9 (2016)
  • Tatonetti et al. [2012] Tatonetti, N.P., Ye, P.P., Daneshjou, R., Altman, R.B.: Data-driven prediction of drug effects and interactions. Science translational medicine 4(125), 125ra31 (2012)

7 Declarations

7.1 Ethical Approval

Not applicable.

7.2 Funding

This work was funded by the Graduate Research Industry Partnership (GRIP) program. More information can be found here: https://www.monash.edu/msdi/study/graduate-research-program/engagement-opportunities/graduate-research-industry-partnership-grip-program.

7.3 Availability of Data and Materials

We use the ATC ontology for medications and the ICD-9-CM ontology for diagnoses and procedures from BioPortal [29] to pretrain OntoMedRec. We use the TWOSIDES dataset [31] as the ground truth for drug-drug interactions (DDIs). These three datasets are all publicly available. We use the benchmark dataset MIMIC-III [30] to fine-tune and evaluate the performance of downstream models integrated with the representations of OntoMedRec and of the other baselines. MIMIC-III is a large, free-to-use medical records dataset available at https://physionet.org/content/mimiciii/1.4/. Access is granted to credentialed users upon completion of online ethics training and signing of the user agreement.