WISER: Weak supervISion and supErvised Representation learning to improve drug response prediction in cancer

Kumar Shubham    Aishwarya Jayagopal    Syed Mohammed Danish    Prathosh AP    Vaibhav Rajan
Abstract

Cancer, a leading cause of death globally, occurs due to genomic changes and manifests heterogeneously across patients. To advance research on personalized treatment strategies, the effectiveness of various drugs on cells derived from cancers (‘cell lines’) is experimentally determined in laboratory settings. However, biological and environmental differences cause the distributions of genomic data and drug responses to differ between cell lines and humans. Moreover, while genomic profiles of many cancer patients are readily available, the scarcity of corresponding drug response data limits the ability to train machine learning models that can predict drug response in patients effectively. Recent cancer drug response prediction methods largely follow a two-stage paradigm: unsupervised domain-invariant representation learning, followed by a downstream drug response classification step. Introducing supervision in both stages is challenging due to heterogeneous patient response to drugs and limited drug response data. This paper addresses these challenges through a novel representation learning method in the first phase and weak supervision in the second. Experimental results on real patient data demonstrate the efficacy of our method (WISER) over state-of-the-art alternatives in predicting personalized drug response. Our implementation is available at https://github.com/kyrs/WISER


1 Introduction

Cancer is a major cause of global morbidity and mortality (WHO, 2022). Cancer develops due to changes in our genome, which enable cancer cells to gain a selective advantage over healthy cells, resulting in uncontrolled proliferation as a cancerous tumour. Significant variability in treatment sensitivity and outcomes among patients makes cancer treatment difficult (Bedard et al., 2013). Hence, cancer care is transitioning from a ‘one-size-fits-all’ approach to a more personalized strategy guided by patient-specific genomic characteristics (Wahida et al., 2023).

To aid therapeutic development, there have been large-scale global efforts, e.g., through The Cancer Genome Atlas (TCGA) database (Hutter & Zenklusen, 2018), to catalog high-dimensional genomic information ($\mathcal{X}$) of cancer patients. However, patient drug response data [$\mathcal{Y}_t^{d_i}(\mathcal{X})$ for drug $d_i$] are scarce due to the limited number of patients, with only a few drugs administered to each patient (Sharifi-Noghabi et al., 2021). This has motivated researchers to explore preclinical datasets, e.g., cell lines, which comprise cells extracted from patient cancers and can be cloned so that the same genomic information is replicated across the clones. Such clones can be exposed to different drugs to obtain drug response information $\mathcal{Y}_c^{d_i}(\mathcal{X})$ for multiple drugs $d_i$ on the same $\mathcal{X}$. These data are immensely useful and cannot be obtained directly from patients, who cannot be subjected to multiple drug regimens simultaneously. Although such fine-grained drug response data are available only for a limited number of cell lines ($\sim$1000) and drugs, they provide a valuable starting point for building personalized drug response models based on genomic information.

However, previous studies have shown that such cell line-based response models do not accurately predict drug efficacy in patients, for several reasons (Seyhan, 2019). Cell line data ($\mathcal{X}_c$) are more homogeneous than patient cancer cells ($\mathcal{X}_t$), and the environments in which they reside differ. This results in different distributions ($P$) of genomic information across cell lines and patients, $P(\mathcal{X}_c) \neq P(\mathcal{X}_t)$, so the two can be considered different domains (see Appendix B). Further, within the human body, several factors besides the genomic profile (e.g., the immune system) play a role in drug response. Thus, the drug response functions differ across cell lines and patients, i.e., $\mathcal{Y}_t^{d_i}(\cdot) \neq \mathcal{Y}_c^{d_i}(\cdot)$.

To address these challenges, several domain adaptation and transfer learning-based drug response models that use a combination of cell line and patient data have been developed. These methods generally consist of two stages: (1) an unsupervised representation learning phase, in which domain-invariant representations of genomic data are learned, and (2) a classification phase, in which these representations are used to train a drug response prediction model, with responses categorized as positive or negative according to the drug’s effect in inhibiting cancer growth. The classifier is trained on labeled data and then used to predict drug response in patients.

Unsupervised representation learning approaches used by extant methods do not consider the drug response information ($\mathcal{Y}_c^{d_i}(\mathcal{X}_c)$) associated with genomic profiles in cell lines, and hence do not distinguish between responders and non-responders to drugs. Supervised contrastive learning approaches (Barbano et al., 2022; Khosla et al., 2020; Graf et al., 2021; Hermans et al., 2017; Schroff et al., 2015; Lee et al., 2021) can address this by bringing the representations ($\mathcal{Z}$) of data points with the same class label closer together, i.e., $\mathcal{Z}(\mathcal{X}_c^m) \sim \mathcal{Z}(\mathcal{X}_c^n)$ if $\mathcal{Y}_c^{d_i}(\mathcal{X}_c^m) = \mathcal{Y}_c^{d_i}(\mathcal{X}_c^n)$, emphasizing their shared characteristics over those of dissimilar classes. However, genomic profiles that respond alike to one drug may behave differently for another, i.e., for drug $d_i$, $\mathcal{Y}_c^{d_i}(\mathcal{X}_c^m) = \mathcal{Y}_c^{d_i}(\mathcal{X}_c^n)$, but for drug $d_k$, $\mathcal{Y}_c^{d_k}(\mathcal{X}_c^m) \neq \mathcal{Y}_c^{d_k}(\mathcal{X}_c^n)$.
Hence, for drug $d_i$, $\mathcal{Z}(\mathcal{X}_c^m) \sim \mathcal{Z}(\mathcal{X}_c^n)$, but for drug $d_k$, $\mathcal{Z}(\mathcal{X}_c^m) \nsim \mathcal{Z}(\mathcal{X}_c^n)$. Further, the limited patient data with documented drug response make it difficult to find genomic profiles with similar efficacy across multiple drugs. These difficulties limit the applicability of standard supervised contrastive learning methods, which bring the representations of genomic profiles with shared labels closer together. Our study addresses this challenge by learning a discrete representation per drug ($\mathcal{R}$) (Van Den Oord et al., 2017) and representing each genomic profile as a weighted combination of these drug representations ($\mathcal{Z} = \sum \mathcal{W}\mathcal{R}$). To ensure that $\mathcal{Z}$ simultaneously reflects the responses to multiple drugs, we use a supervised triplet loss (Hermans et al., 2017; Schroff et al., 2015; Barbano et al., 2022) to increase the weights of drugs with a positive response relative to those with a negative response.
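To make the weighting idea concrete, the following is a minimal sketch of one way a triplet-style margin loss could push drug weights apart for a single cell line: every drug with a positive response should receive a larger weight than every drug with a negative response, by at least a margin. The function name `weight_triplet_loss` and the default margin are hypothetical, and this is not the paper's exact loss (its Eq. 4 is defined later in the paper); it only illustrates the ranking constraint described above.

```python
from itertools import product


def weight_triplet_loss(weights, responses, margin=0.1):
    """Hypothetical hinge loss: positive-response drugs should outweigh negative ones.

    weights:   per-drug weights W_i(X) for one genomic profile (e.g., softmax outputs).
    responses: per-drug labels in {-1, 0, 1}; 1 = positive, 0 = negative, -1 = missing.
    margin:    required gap between a positive and a negative drug weight (assumed value).
    """
    pos = [w for w, y in zip(weights, responses) if y == 1]
    neg = [w for w, y in zip(weights, responses) if y == 0]
    pairs = list(product(pos, neg))
    if not pairs:
        return 0.0  # no (positive, negative) pair to compare; missing labels are ignored
    # Penalize every pair whose weight gap falls short of the margin, then average.
    return sum(max(0.0, margin - (wp - wn)) for wp, wn in pairs) / len(pairs)
```

For example, with weights `[0.25, 0.3, 0.45]` and responses `[1, 0, 0]`, both negative drugs outweigh the positive one, so the loss is positive (0.225); drugs with missing responses (-1) never contribute.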

It is worth noting that while labeled data are scarce in both domains, relatively abundant unlabeled patient data are available (see Appendix B). While prior studies have leveraged unlabeled patient data for learning domain-invariant representations, the training of drug response prediction classifiers has predominantly relied on the cell line dataset due to insufficient labeled response data for patients. Techniques like weak supervision can be employed to generate pseudo-labels for the abundant unlabeled data. However, naïvely using all pseudo-labeled samples does not improve performance (Lang et al., 2022; Shubham et al., 2023) (also seen in our results). In fact, there is a trade-off between the noise that pseudo-labels introduce into the downstream classifier and the generalization it achieves when trained in a weak supervision setting (Lang et al., 2022). To address this, we introduce a subset selection step (Lang et al., 2022; Shubham et al., 2023), which, to our knowledge, is novel in this context and helps boost performance. We employ majority-vote-based weak supervision techniques (Ratner et al., 2017; Zhang et al., 2022) to create pseudo-labels for patient genomic profiles without documented drug response, followed by a subset selection strategy (Muhlenbach et al., 2004). The selected subset is combined with labeled cell line data to train the drug response prediction classifier.

Our contributions can be summarized as follows:

  • We design a new supervised domain-invariant representation learning approach which offers better distinction between drug responders and non-responders by addressing the challenges of limited sample size and heterogeneous drug response of genomic profiles.

  • We propose a novel strategy that carefully selects a subset of least noisy pseudo-labeled patient data for classifier training on the domain-invariant representations.

  • Using these techniques we propose a new method, called WISER, to estimate drug response for patients using unlabeled patient data and a small set of labeled cell line data.

  • Our experiments on benchmark datasets demonstrate the superiority of WISER over state-of-the-art methods for drug response prediction, with improvements of up to 15.7% in AUROC.

  • The most important features (genes) responsible for pseudo-labeling patient samples in the selected subsets correlate well with independent clinical evidence based on gene-drug interactions that impact patient survival, which further validates our pseudo-labeling approach.

2 Related Work

2.1 Drug Response Prediction

Prior literature on drug response prediction in patients has primarily focused on transfer learning (Pan & Yang, 2009). These approaches are useful when the target domain (patients) has limited samples and a related source domain (cell lines) has more labeled samples. Transductive transfer learning methods (Bousmalis et al., 2016; Sharifi-Noghabi et al., 2021; Sun & Saenko, 2016) use labeled source domain samples for drug response prediction but often build one model per drug and thus cannot exploit correlations across drugs. Inductive transfer learning methods (Sharifi-Noghabi et al., 2020; Ma et al., 2021) apply few-shot and multi-task learning to the available labeled patient samples but perform worse than other approaches. Recent methods, like CODE-AE (He et al., 2022), learn shared representations using unlabeled genomic profiles from both domains.

Among the extant approaches, CODE-AE (He et al., 2022) has demonstrated superior predictive accuracy and robustness in extensive benchmark studies. CODE-AE is trained in two stages: (1) unsupervised pretraining of autoencoders to learn both domain-specific private and domain-invariant shared representations and (2) downstream drug response prediction based on the learned shared representations. A key shortcoming of this approach is that the learned representations do not account for the downstream drug response prediction task. Further, it does not use the large number of unlabeled patient genomic profiles in the downstream drug response prediction. Our proposed method, WISER, addresses these shortcomings and differs from CODE-AE in two aspects: (1) we incorporate drug response information of cell lines through supervised domain-invariant representation learning, and (2) we also utilize the available unlabeled patient genomic profiles through weak supervision followed by subset selection.

2.2 Weak Supervision and Subset Selection

Weak supervision techniques (Ratner et al., 2016) are designed to address the challenge of limited labeled data. They leverage information from various sources (label functions), such as data from different domains (Mazzetto et al., 2021; Zhang et al., 2022), to generate cheap but noisy labels for unlabeled data. To improve the accuracy of this estimation, confident predictions from the different sources of pseudo-labels are combined through weighting or voting schemes (Ratner et al., 2016; Dawid & Skene, 1979; Fu et al., 2020). For a small set of label functions, a majority-vote-based scheme (Ratner et al., 2017) outperforms weighting techniques.

In addition, recent works in weak supervision (Lang et al., 2022; Shubham et al., 2023) have shown that training on a subset of the pseudo-labeled data can yield better results than using the entire pseudo-labeled dataset, because there is a trade-off between the generalization achieved by the classifier and the noise introduced by the pseudo-labels. Previous studies on subset selection have primarily concentrated on natural language tasks and employed pre-trained word embeddings (Kenton & Toutanova, 2019). Their application to cancer research, however, remains unexplored.

3 Proposed Method

3.1 Problem Formulation and Solution Overview

Problem Definition: Let us assume that there are $\mathcal{N}_c$ labeled samples of genomic profiles associated with cell lines ($\mathcal{G}_{cell}$) and $\mathcal{N}_t$ unlabeled samples of genomic profiles from patients ($\mathcal{G}_{patient}$). Although we focus on gene expression profiles in this work, our method can also be applied to other omics data types, such as mutations. In general, $\mathcal{N}_c \ll \mathcal{N}_t$. Let $\{d_1, d_2, \ldots, d_n\}$ be the set of $n$ drugs with documented drug response for $\mathcal{G}_{cell}$, let $\mathcal{Y}_c^{d_i}(\mathcal{X}_c^j)$ be the response of a genomic profile $\mathcal{X}_c^j \in \mathcal{G}_{cell}$ to drug $d_i$, and let $\mathcal{Y}_t^{d_i}(\mathcal{X}_t^m)$ be the drug response of patient $\mathcal{X}_t^m \in \mathcal{G}_{patient}$. Note that $\mathcal{Y}_c^{d_i}(\mathcal{X}_c^j) \in \{-1, 0, 1\}$, where 1 indicates a positive response of genomic profile $\mathcal{X}_c^j$ to drug $d_i$, 0 indicates a negative response to drug $d_i$, and -1 indicates that response data are not available.
The main objective of our work is to use the labeled cell line data ($\mathcal{G}_{cell}, \mathcal{Y}_c^{d_i}$) for $n$ drugs $\{d_1, d_2, \ldots, d_n\}$ and the unlabeled patient genomic profiles ($\mathcal{G}_{patient}$) to estimate drug response for patients ($\mathcal{Y}_t^{d_i}$). Further details about both domains are provided in Appendix B.
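A toy illustration of the label encoding $\mathcal{Y}_c^{d_i}(\mathcal{X}_c^j) \in \{-1, 0, 1\}$ may help: cell line responses can be stored as a profiles-by-drugs matrix in which -1 marks a missing measurement, and each drug's training pairs are the observed entries in its column. The matrix values and the helper name `labeled_pairs` are made up for illustration and are not data or code from the paper.

```python
# Hypothetical response matrix Y_c: rows are cell-line genomic profiles X_c^j,
# columns are drugs d_1..d_3. Encoding as in the text: 1 = positive response,
# 0 = negative response, -1 = response not available.
Y_c = [
    [1, 0, -1],
    [1, 1, 0],
    [-1, 0, 1],
]


def labeled_pairs(Y, drug_idx):
    """Return (profile_index, label) pairs with an observed response for one drug."""
    return [(j, row[drug_idx]) for j, row in enumerate(Y) if row[drug_idx] != -1]


# Profiles 0 and 1 agree on d_1 (both respond) but disagree on d_2,
# which is the heterogeneity that makes per-label contrastive grouping ambiguous.
print(labeled_pairs(Y_c, 0))  # [(0, 1), (1, 1)]
print(labeled_pairs(Y_c, 2))  # [(1, 0), (2, 1)]
```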

Figure 1: This diagram outlines WISER’s training process, divided into four phases. First, in the Representation Learning phase, a domain-invariant representation ($\mathcal{Z}$) of cell line and patient genomic profiles is learned using a shared encoder and a private encoding scheme. Next, in the Weak Supervision phase, multiple label functions are trained on labeled cell line genomic profiles to assign pseudo-labels to unlabeled patient genomic profiles. Then, in the Subset Selection phase, the pseudo-labels and the domain-invariant representation ($\mathcal{Z}$) are used to select a subset of patient genomic profiles ($\mathcal{D}_{patient}^{sub}$) and their associated pseudo-labels based on the consistency of labels among nearest neighbors. Finally, in the Drug Response Prediction phase, the selected subset, along with labeled cell line genomic profiles, is used to train the downstream classifier that predicts drug responses for patients.

Solution Overview: Our method comprises four major stages, as depicted in Fig. 1.

Stage 1: Representation Learning. In the first stage, we learn representations that are invariant across the patient and cell line domains. Specifically, we learn discrete latent representations for individual drugs. The desired domain-invariant representation $\mathcal{Z}$ is generated as a weighted combination of these drug representations.

Stage 2: Weak Supervision. To incorporate the unlabeled patient genomic profiles into the training of the downstream drug response prediction model, we train multiple classifiers (label functions) using labeled cell line data and the domain-invariant representation ($\mathcal{Z}$). These label functions are then used to predict labels for the unlabeled patient dataset. The confident predictions from all label functions are combined by majority vote to assign the pseudo-labels.
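The majority-vote step can be sketched as follows for a single patient profile. This assumes each label function outputs a probability of positive response and that "confident" means the probability clears a fixed threshold on either side; the threshold value, the tie-handling rule (abstain), and the name `majority_vote` are assumptions for illustration, not the paper's specification.

```python
from collections import Counter


def majority_vote(probas, threshold=0.8):
    """Combine confident label-function predictions for one patient profile.

    probas:    P(positive response) from each label function (hypothetical interface).
    threshold: assumed confidence cut-off; a label function votes 1 if p >= threshold,
               votes 0 if p <= 1 - threshold, and abstains otherwise.
    Returns the majority label, or None when no function is confident or the vote ties.
    """
    votes = [1 if p >= threshold else 0
             for p in probas if p >= threshold or p <= 1 - threshold]
    if not votes:
        return None  # every label function abstained
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie between the two labels: leave the profile unlabeled
    return counts[0][0]
```

Profiles that receive `None` would simply not be pseudo-labeled; only the remaining profiles move on to subset selection.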

Stage 3: Subset Selection. In this stage, we use only a subset of the patient genomic profiles with confident predictions, as indicated by the label functions. We employ cut statistics (Muhlenbach et al., 2004) in conjunction with the domain-invariant representation ($\mathcal{Z}$) to select a subset of the least noisy samples.
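The cut-statistic idea can be sketched on a k-nearest-neighbour graph built over the representations: for each pseudo-labeled sample, count how many neighbours carry a different pseudo-label, compare that count with its expectation if labels were unrelated to the graph, and keep the samples whose neighbourhoods are most label-consistent (lowest z-scores). This is a simplified, unweighted variant written for illustration; the values of `k` and `fraction` and the function names are hypothetical, and the paper's exact graph construction and edge weighting may differ.

```python
import math


def cut_statistic_scores(Z, labels, k=3):
    """Simplified, unweighted cut-statistic z-score per sample (lower = cleaner).

    Z:      list of representation vectors (e.g., Z(X) for patient profiles).
    labels: pseudo-labels (0/1) for the same samples.
    For each node, the number of k-NN neighbours with a different label is compared
    with its mean and variance under the hypothesis that labels are independent of
    the graph (each neighbour disagrees with probability 1 - freq(own label)).
    """
    n = len(Z)
    freq = {c: labels.count(c) / n for c in set(labels)}
    scores = []
    for i in range(n):
        dists = sorted((math.dist(Z[i], Z[j]), j) for j in range(n) if j != i)
        neighbours = [j for _, j in dists[:k]]
        cut = sum(labels[j] != labels[i] for j in neighbours)
        p = 1 - freq[labels[i]]  # chance that a random neighbour disagrees
        mean, var = k * p, k * p * (1 - p)
        scores.append((cut - mean) / math.sqrt(var) if var > 0 else 0.0)
    return scores


def select_subset(Z, labels, fraction=0.5, k=3):
    """Keep the indices of the `fraction` of samples with the lowest scores."""
    scores = cut_statistic_scores(Z, labels, k)
    keep = max(1, int(fraction * len(Z)))
    return sorted(range(len(Z)), key=lambda i: scores[i])[:keep]
```

On two well-separated clusters with one sample pseudo-labeled against its neighbours, that sample receives the highest score and is the first to be dropped.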

Stage 4: Drug Response Prediction. We combine the subset of patient genomic profiles and associated pseudo-labels chosen in Stage 3 with the labeled cell line genomic profiles to train a downstream drug response prediction classifier. This classifier can then be used to infer drug responses for new patients.

3.2 Representation Learning

Genomic profile data collected from cell lines and patients exhibit distributional shifts owing to multiple confounding factors (He et al., 2022). As a result, a model trained on cell line data may not generalize to patients. In line with previous work, we address this using a private and shared encoder scheme, in which a shared encoder ($\mathcal{C}_{\mathcal{S}}$) captures a representation that is invariant across the two domains, while a private encoder ($\mathcal{C}_{\mathcal{P}}$) captures domain-specific information. However, He et al. (2022) do not consider the drug response information ($\mathcal{Y}_c^{d_i}(\mathcal{X}_c)$) during representation learning. We address this by representing the genomic profile ($\mathcal{Z}$) as a weighted combination of drug embeddings ($\mathcal{R}$) (Eq. 1, Eq. 3) and using a triplet loss to learn these weights based on drug efficacy results (Eq. 4).

In line with discrete representation learning methods (Lee et al., 2021), we leverage information on how genomic profiles respond to a specific drug to generate a drug-specific discrete latent representation ($\mathcal{R} = \{\mathcal{R}_{d_1}, \mathcal{R}_{d_2}, \ldots, \mathcal{R}_{d_n}\}$). Inspired by contextual attention maps (Graves et al., 2014; Bahdanau et al., 2014), we combine the discrete drug representations ($\mathcal{R}$) with the shared representation of a genomic profile ($\mathcal{C}_{\mathcal{S}}$) to form a new representation of that genomic profile ($\mathcal{Z}$). This new representation is a weighted sum of the drug embeddings ($\mathcal{R}$), with weights ($\mathcal{W}$) indicating the efficacy of the different drugs on the given genomic profile. To obtain $\mathcal{W}$, we compute the cosine similarity ($sim(\cdot)$) between $\mathcal{R}$ and $\mathcal{C}_{\mathcal{S}}$.

The similarity scores over the different drugs are then normalized using a softmax function with an inverse temperature ($\Delta$) to produce the weights $\mathcal{W}$. A weighted combination of $\mathcal{R}$ using $\mathcal{W}$ yields $\mathcal{Z}$, as given in Eq. 1.

$$
\begin{aligned}
\mathcal{Z}(\mathcal{X}) &= \sum_{i=1}^{n} \mathcal{W}_i(\mathcal{X})\, \mathcal{R}_{d_i} \\
\mathcal{W}_i(\mathcal{X}) &= \frac{\exp\big(\Delta \cdot sim(\mathcal{C}_{\mathcal{S}}(\mathcal{X}), \mathcal{R}_{d_i})\big)}{\sum_{j=1}^{n} \exp\big(\Delta \cdot sim(\mathcal{C}_{\mathcal{S}}(\mathcal{X}), \mathcal{R}_{d_j})\big)} \\
sim(\mathcal{C}_{\mathcal{S}}(\mathcal{X}), \mathcal{R}_{d_i}) &= \frac{\mathcal{C}_{\mathcal{S}}(\mathcal{X})^{T} \mathcal{R}_{d_i}}{\lVert \mathcal{C}_{\mathcal{S}}(\mathcal{X}) \rVert \, \lVert \mathcal{R}_{d_i} \rVert}
\end{aligned}
\tag{1}
$$
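Eq. 1 can be traced numerically with a short sketch that takes one shared-encoder output $\mathcal{C}_{\mathcal{S}}(\mathcal{X})$ and a list of drug embeddings $\mathcal{R}_{d_i}$ (plain Python lists here), computes the cosine similarities, applies a softmax with inverse temperature $\Delta$, and forms $\mathcal{Z}$ as the weighted sum. The encoder outputs, embeddings, and default `delta` are assumed inputs for illustration; in the actual model these would be learned tensors.

```python
import math


def cosine(u, v):
    """sim(u, v) = u^T v / (||u|| ||v||)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def weighted_representation(shared, drug_embeddings, delta=10.0):
    """Compute the weights W(X) and the representation Z(X) of Eq. 1.

    shared:          C_S(X), the shared-encoder output for one genomic profile.
    drug_embeddings: [R_{d_1}, ..., R_{d_n}], each with the same length as `shared`.
    delta:           inverse temperature Delta (assumed value).
    """
    logits = [delta * cosine(shared, r) for r in drug_embeddings]
    m = max(logits)  # subtracting the max leaves the softmax unchanged but avoids overflow
    exps = [math.exp(s - m) for s in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(drug_embeddings[0])
    # Z(X) = sum_i W_i(X) * R_{d_i}, computed coordinate by coordinate.
    z = [sum(w * r[k] for w, r in zip(weights, drug_embeddings)) for k in range(dim)]
    return weights, z
```

A larger $\Delta$ sharpens the softmax so that $\mathcal{Z}$ concentrates on the most similar drug embedding; with $\Delta = 0$ every drug receives the same weight $1/n$.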

To train the encoders ($\mathcal{C}_{\mathcal{S}}$, $\mathcal{C}_{\mathcal{P}}$), we concatenate ($\oplus$) the weighted representation of a genomic profile ($\mathcal{Z}$ from Eq. 1) with its private representation ($\mathcal{C}_{\mathcal{P}} = \{\mathcal{C}_{\mathcal{P}}^{t}, \mathcal{C}_{\mathcal{P}}^{c}\}$) and pass the result through a shared decoder ($D$) to reconstruct the profile in both domains ($\mathcal{X}_{c}^{i} \in \mathcal{G}_{cell}$, $\mathcal{X}_{t}^{j} \in \mathcal{G}_{patient}$). With $\bm{l}_{1}$ denoting the per-sample reconstruction error, the reconstruction loss is:

$$\bm{l}_{recon} = \frac{\sum_{i=1}^{\mathcal{N}_{c}} \bm{l}_{1}(\mathcal{X}_{c}^{i}, \mathcal{C}_{\mathcal{P}}^{c})}{\mathcal{N}_{c}} + \frac{\sum_{j=1}^{\mathcal{N}_{t}} \bm{l}_{1}(\mathcal{X}_{t}^{j}, \mathcal{C}_{\mathcal{P}}^{t})}{\mathcal{N}_{t}}$$
$$\text{where } \bm{l}_{1}(\mathcal{X}, \mathcal{C}_{\mathcal{P}}) = \big\|D\big(\mathcal{Z}(\mathcal{X}) \oplus \mathcal{C}_{\mathcal{P}}(\mathcal{X})\big) - \mathcal{X}\big\|^{2} \qquad (2)$$
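The following is a minimal sketch of Eq. 2, assuming each sample is stored as a hypothetical triple (profile $\mathcal{X}$, $\mathcal{Z}(\mathcal{X})$, $\mathcal{C}_{\mathcal{P}}(\mathcal{X})$) of Python lists and the shared decoder $D$ is any callable. `toy_decoder` is an illustrative stand-in, not the paper's network.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Decoder = Callable[[Vector], Vector]
# One sample: (input profile X, weighted representation Z(X), private representation C_P(X))
Sample = Tuple[Vector, Vector, Vector]


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def l1_term(sample: Sample, decoder: Decoder) -> float:
    """||D(Z(X) ⊕ C_P(X)) - X||^2 for one profile; ⊕ is list concatenation."""
    x, z, p = sample
    return squared_error(decoder(z + p), x)


def recon_loss(cell_batch: List[Sample], patient_batch: List[Sample], decoder: Decoder) -> float:
    """Eq. 2: mean reconstruction error per domain, summed over the two domains."""
    cell = sum(l1_term(s, decoder) for s in cell_batch) / len(cell_batch)
    patient = sum(l1_term(s, decoder) for s in patient_batch) / len(patient_batch)
    return cell + patient


def toy_decoder(v: Vector) -> Vector:
    """Hypothetical decoder: adds the Z half and the C_P half element-wise."""
    half = len(v) // 2
    return [a + b for a, b in zip(v[:half], v[half:])]


cell = [([1.0, 2.0], [1.0, 1.0], [0.0, 1.0])]      # reconstructed exactly
patients = [([0.0, 0.0], [1.0, 0.0], [0.0, 0.0])]  # off by 1 in the first coordinate
print(recon_loss(cell, patients, toy_decoder))
```

Averaging within each domain before summing keeps the (typically larger) domain from dominating the loss.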

To ensure that the generated embedding ($\mathcal{Z}$) and the private embedding ($\mathcal{C}_{\mathcal{P}}$) do not capture redundant information, we introduce an orthogonality loss (He et al., 2022) between the two embeddings: $\bm{l}_{ortho} = \|\mathcal{Z}(\mathcal{X}_{c})^{T}\mathcal{C}_{\mathcal{P}}^{c}(\mathcal{X}_{c})\|^{2} + \|\mathcal{Z}(\mathcal{X}_{t})^{T}\mathcal{C}_{\mathcal{P}}^{t}(\mathcal{X}_{t})\|^{2}$.
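A minimal sketch of $\bm{l}_{ortho}$, under the assumption (ours, not stated above) that $\mathcal{Z}(\mathcal{X})$ and $\mathcal{C}_{\mathcal{P}}(\mathcal{X})$ denote batch matrices with one row per sample, so that $\|\cdot\|^{2}$ is the squared Frobenius norm of a $d_z \times d_p$ cross-correlation matrix. Function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # one row per sample in the batch


def frob_sq_cross(z: Matrix, p: Matrix) -> float:
    """||Z^T P||_F^2 for Z of shape (batch, d_z) and P of shape (batch, d_p)."""
    d_z, d_p = len(z[0]), len(p[0])
    total = 0.0
    for a in range(d_z):
        for b in range(d_p):
            # entry (a, b) of Z^T P sums over the batch dimension
            entry = sum(z_row[a] * p_row[b] for z_row, p_row in zip(z, p))
            total += entry ** 2
    return total


def ortho_loss(z_c: Matrix, p_c: Matrix, z_t: Matrix, p_t: Matrix) -> float:
    """Cell-line term plus patient term."""
    return frob_sq_cross(z_c, p_c) + frob_sq_cross(z_t, p_t)


# Cell lines: columns [1, 1] and [1, -1] are orthogonal, so that term is 0.
# Patients: columns [1, 0] and [2, 0] overlap, giving (1 * 2)^2 = 4.
print(ortho_loss([[1.0], [1.0]], [[1.0], [-1.0]], [[1.0], [0.0]], [[2.0], [0.0]]))
```

The loss is zero only when every dimension of $\mathcal{Z}$ is uncorrelated (over the batch) with every dimension of $\mathcal{C}_{\mathcal{P}}$.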

We further use the embedding loss $\bm{l}_{embed}$ (Van Den Oord et al., 2017) to keep the generated embedding ($\mathcal{Z}$) close to the encoded genomic profile ($\mathcal{C}_{\mathcal{S}}$) for both cell lines and patients. Eq. 3 defines this loss, where $sg(\cdot)$ denotes the stop-gradient operator.

$$\bm{l}_{embed} = \frac{\sum_{i=1}^{\mathcal{N}_{c}} \bm{l}(\mathcal{X}_{c}^{i})}{\mathcal{N}_{c}} + \frac{\sum_{j=1}^{\mathcal{N}_{t}} \bm{l}(\mathcal{X}_{t}^{j})}{\mathcal{N}_{t}}$$
$$\bm{l}(\mathcal{X}) = \big\|\mathcal{Z}(\mathcal{X}) - sg\big(\mathcal{C}_{\mathcal{S}}(\mathcal{X})\big)\big\|^{2} + \big\|sg\big(\mathcal{Z}(\mathcal{X})\big) - \mathcal{C}_{\mathcal{S}}(\mathcal{X})\big\|^{2} \qquad (3)$$
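Because $sg(\cdot)$ is the identity in the forward pass, the value of $\bm{l}(\mathcal{X})$ is simply $2\|\mathcal{Z}(\mathcal{X}) - \mathcal{C}_{\mathcal{S}}(\mathcal{X})\|^{2}$; the operator only decides which term updates which side. The sketch below makes that explicit by returning the analytic gradient each term lets through. It is a hand-derived illustration with hypothetical names, not an autodiff implementation.

```python
from typing import List, Sequence, Tuple


def embed_term(z: Sequence[float], c_s: Sequence[float]) -> Tuple[float, List[float], List[float]]:
    """Per-sample l(X) from Eq. 3 plus the gradient reaching Z and C_S.

    Forward: ||Z - sg(C_S)||^2 + ||sg(Z) - C_S||^2 = 2 * ||Z - C_S||^2.
    Backward: the first term only updates Z, the second only updates C_S.
    """
    diff = [a - b for a, b in zip(z, c_s)]
    sq = sum(d * d for d in diff)
    loss = sq + sq
    grad_z = [2.0 * d for d in diff]    # from ||Z - sg(C_S)||^2 (C_S held constant)
    grad_cs = [-2.0 * d for d in diff]  # from ||sg(Z) - C_S||^2 (Z held constant)
    return loss, grad_z, grad_cs


def embed_loss(cell_pairs, patient_pairs) -> float:
    """Eq. 3: per-domain mean of l(X), summed; each pair is (Z(X), C_S(X))."""
    cell = sum(embed_term(z, c)[0] for z, c in cell_pairs) / len(cell_pairs)
    patient = sum(embed_term(z, c)[0] for z, c in patient_pairs) / len(patient_pairs)
    return cell + patient


loss, g_z, g_cs = embed_term([1.0, 2.0], [0.0, 0.0])
print(loss, g_z, g_cs)
```

With equal weights on both terms, $\mathcal{Z}$ and $\mathcal{C}_{\mathcal{S}}$ are pulled toward each other symmetrically.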

To ensure that the learnt representations reflect drug efficacy on labeled genomic profiles, we rely on a supervised triplet loss (Hermans et al., 2017; Schroff et al., 2015), which corresponds directly to the modern supervised contrastive loss (Barbano et al., 2022). A triplet loss minimizes the distance between an anchor and positively labeled samples while maximizing its distance from negatively labeled samples. In our formulation, we use the cosine distance ($dis(\cdot) = 1 - sim(\cdot)$, with $sim$ as in Eq. 1) and take the drug representation ($\mathcal{R}$) as the anchor. The loss $\bm{l}_{cns}$ (Eq. 4) minimizes the average distance between this anchor and the representations of genomic profiles on which the drug is effective ($\bm{s}^{+}$), while maximizing the average distance to those on which it is not ($\bm{s}^{-}$). Here $\bm{1}\big(\mathcal{Y}_{c}^{d_j}(\mathcal{X}_{c}^{i}) = 1\big)$ is an indicator function for the positive efficacy of drug $d_j$ on genomic profile $\mathcal{X}_{c}^{i}$, $\bm{1}\big(\mathcal{Y}_{c}^{d_j}(\mathcal{X}_{c}^{i}) = 0\big)$ indicates negative efficacy, and $\delta$ is the margin, i.e., the minimum required gap between $\bm{s}^{+}$ and $\bm{s}^{-}$.

$$\bm{l}_{cns} = \max\big(\bm{s}^{+} - \bm{s}^{-} + \delta, 0\big)$$
$$\bm{s}^{+} = \frac{\sum_{i=1}^{\mathcal{N}_{c}}\sum_{j=1}^{n} \bm{1}\big(\mathcal{Y}_{c}^{d_j}(\mathcal{X}_{c}^{i}) = 1\big)\, dis\big(\mathcal{C}_{\mathcal{S}}(\mathcal{X}_{c}^{i}), \mathcal{R}_{d_j}\big)}{\sum_{i=1}^{\mathcal{N}_{c}}\sum_{j=1}^{n} \bm{1}\big(\mathcal{Y}_{c}^{d_j}(\mathcal{X}_{c}^{i}) = 1\big)}$$
$$\bm{s}^{-} = \frac{\sum_{i=1}^{\mathcal{N}_{c}}\sum_{j=1}^{n} \bm{1}\big(\mathcal{Y}_{c}^{d_j}(\mathcal{X}_{c}^{i}) = 0\big)\, dis\big(\mathcal{C}_{\mathcal{S}}(\mathcal{X}_{c}^{i}), \mathcal{R}_{d_j}\big)}{\sum_{i=1}^{\mathcal{N}_{c}}\sum_{j=1}^{n} \bm{1}\big(\mathcal{Y}_{c}^{d_j}(\mathcal{X}_{c}^{i}) = 0\big)} \qquad (4)$$
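A minimal sketch of Eq. 4 in plain Python. As assumptions of ours, labels are given as a matrix `labels[i][j]` holding $\mathcal{Y}_{c}^{d_j}(\mathcal{X}_{c}^{i})$, with `None` marking an untested (cell line, drug) pair that neither indicator counts, and the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """dis(a, b) = 1 - sim(a, b)."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cns_loss(encoded: List[Sequence[float]], drugs: List[Sequence[float]],
             labels: List[List[Optional[int]]], delta: float) -> float:
    """l_cns = max(s+ - s- + delta, 0), averaging distances over positive / negative pairs."""
    pos, neg = [], []
    for i, c in enumerate(encoded):      # encoded[i] = C_S(X_c^i)
        for j, r in enumerate(drugs):    # drugs[j] = R_{d_j}, the anchor
            y = labels[i][j]
            if y == 1:
                pos.append(cosine_distance(c, r))
            elif y == 0:
                neg.append(cosine_distance(c, r))
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    s_pos = sum(pos) / len(pos)
    s_neg = sum(neg) / len(neg)
    return max(s_pos - s_neg + delta, 0.0)


# Cell line 0 responds to the drug and lies on it; cell line 1 does not and is orthogonal.
enc = [[1.0, 0.0], [0.0, 1.0]]
print(cns_loss(enc, [[1.0, 0.0]], [[1], [0]], delta=0.5))  # gap of 1 already exceeds the margin
```

Once the negatives are at least $\delta$ farther from the anchor than the positives on average, the loss is zero and stops pushing.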

3.2.1 Domain Adaptation

For the downstream model to generalize well, the generated representation should be domain-invariant. Recent works (He et al., 2022; Ganin et al., 2016) have generated such representations using adversarial networks. In this framework, a separate critic network is trained to distinguish between embeddings from the two domains, while an encoder tries to generate embeddings that the critic cannot distinguish. As training proceeds, this additional adversarial step drives the two networks toward an equilibrium in which the critic network ($\mathcal{F}$) can no longer tell the domains apart, i.e., the embedding is domain-invariant. In our work, we use the Wasserstein GAN (WSGAN) (Arjovsky et al., 2017) with a gradient-penalty-based adversarial loss (Gulrajani et al., 2017) to train the critic network (Eq. 5). The critic takes as input the concatenation ($\hat{C}$) of the generated embedding and the private representation, for samples from both domains. The critic is trained by minimizing $\bm{l}_{critic}$, which is the mean critic score of patients, $\mathcal{F}\big(\hat{C}(\mathcal{X}_{t}^{j}, \mathcal{C}_{\mathcal{P}}^{t})\big)$, minus the mean critic score of cell lines, $\mathcal{F}\big(\hat{C}(\mathcal{X}_{c}^{i}, \mathcal{C}_{\mathcal{P}}^{c})\big)$. In contrast, the patient representations are learnt to obtain a higher critic score ($\bm{l}_{gen}$). A gradient penalty term encourages the gradient of the critic to have a norm close to 1, which maintains Lipschitz continuity (Arjovsky et al., 2017). These gradients are computed on linear interpolations ($\mathcal{L}$) of the input representations from the two domains, where $\mathcal{L} = \epsilon\hat{C}(\mathcal{X}_{c}, \mathcal{C}_{\mathcal{P}}^{c}) + (1-\epsilon)\hat{C}(\mathcal{X}_{t}, \mathcal{C}_{\mathcal{P}}^{t})$ and $\epsilon \sim U(0,1)$. Mathematically, these loss functions are defined as follows:

$$\bm{l}_{critic} = \frac{1}{\mathcal{N}_{t}}\sum_{j=1}^{\mathcal{N}_{t}} \mathcal{F}\big(\hat{C}(\mathcal{X}_{t}^{j}, \mathcal{C}_{\mathcal{P}}^{t})\big) - \frac{1}{\mathcal{N}_{c}}\sum_{i=1}^{\mathcal{N}_{c}} \mathcal{F}\big(\hat{C}(\mathcal{X}_{c}^{i}, \mathcal{C}_{\mathcal{P}}^{c})\big) + \lambda\big(\|\nabla_{\mathcal{L}}\mathcal{F}(\mathcal{L})\| - 1\big)^{2}$$
$$\bm{l}_{gen} = -\frac{1}{\mathcal{N}_{t}}\sum_{i=1}^{\mathcal{N}_{t}} \mathcal{F}\big(\hat{C}(\mathcal{X}_{t}^{i}, \mathcal{C}_{\mathcal{P}}^{t})\big)$$
$$\hat{C}(\mathcal{X}, \mathcal{C}_{\mathcal{P}}) = \mathcal{Z}(\mathcal{X}) \oplus \mathcal{C}_{\mathcal{P}}(\mathcal{X}) \qquad (5)$$
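The sketch below evaluates $\bm{l}_{critic}$ and $\bm{l}_{gen}$ for a toy critic $\mathcal{F}(v) = \tanh(w^{T}v)$, chosen (as our assumption) because its input gradient $(1 - \tanh^{2}(w^{T}v))\,w$ has a closed form, so the gradient penalty can be computed without an autodiff library. Pairing cell-line and patient inputs by position and averaging the penalty over the batch are also our assumptions; all names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def critic(w: Sequence[float], v: Sequence[float]) -> float:
    """Toy stand-in for F: F(v) = tanh(w . v)."""
    return math.tanh(sum(a * b for a, b in zip(w, v)))


def critic_grad_norm(w: Sequence[float], v: Sequence[float]) -> float:
    """||grad_v F(v)|| = (1 - tanh^2(w . v)) * ||w|| for the toy critic."""
    t = critic(w, v)
    return (1.0 - t * t) * math.sqrt(sum(a * a for a in w))


def critic_loss(w, cell_inputs: List[List[float]], patient_inputs: List[List[float]],
                lam: float, rng: random.Random) -> float:
    """l_critic from Eq. 5; inputs are already the concatenations C_hat = Z(X) ⊕ C_P(X)."""
    mean_t = sum(critic(w, v) for v in patient_inputs) / len(patient_inputs)
    mean_c = sum(critic(w, v) for v in cell_inputs) / len(cell_inputs)
    penalties = []
    for vc, vt in zip(cell_inputs, patient_inputs):  # assumption: positional pairing
        eps = rng.random()  # eps drawn uniformly from [0, 1)
        interp = [eps * a + (1.0 - eps) * b for a, b in zip(vc, vt)]
        penalties.append((critic_grad_norm(w, interp) - 1.0) ** 2)
    return mean_t - mean_c + lam * sum(penalties) / len(penalties)


def gen_loss(w, patient_inputs: List[List[float]]) -> float:
    """l_gen from Eq. 5: minimizing it raises the patients' mean critic score."""
    return -sum(critic(w, v) for v in patient_inputs) / len(patient_inputs)


rng = random.Random(0)
# A zero critic has zero gradient everywhere, so each penalty is (0 - 1)^2 = 1.
print(critic_loss([0.0, 0.0], [[1.0, 2.0]], [[3.0, 4.0]], lam=10.0, rng=rng))
```

In a real implementation the penalty gradient would come from automatic differentiation of the critic network with respect to $\mathcal{L}$.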

Training proceeds in two stages. In the first stage, the model is trained for a few epochs using only $\bm{l}_{pl} = \bm{l}_{recon} + \bm{l}_{cns} + \bm{l}_{embed} + \bm{l}_{ortho}$. In the second stage, the encoders are trained with $\bm{l}_{total} = \bm{l}_{pl} + \bm{l}_{gen}$ and the critic network with $\bm{l}_{critic}$.
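The two-stage schedule can be summarised as a small lookup, shown here only as a sketch: `warmup_epochs` is a hypothetical parameter standing in for the unspecified "few epochs", and reading stage one as leaving the critic untrained is our interpretation of "trained only using" $\bm{l}_{pl}$.

```python
from typing import Dict, List


def active_losses(epoch: int, warmup_epochs: int) -> Dict[str, List[str]]:
    """Losses that drive each network at a given (0-indexed) epoch."""
    l_pl = ["recon", "cns", "embed", "ortho"]
    if epoch < warmup_epochs:
        # Stage 1: pre-train the encoders with l_pl only; the critic is idle (assumed).
        return {"encoders": l_pl, "critic": []}
    # Stage 2: l_total = l_pl + l_gen for the encoders, l_critic for the critic.
    return {"encoders": l_pl + ["gen"], "critic": ["critic"]}


for epoch in range(4):
    print(epoch, active_losses(epoch, warmup_epochs=2))
```

Pre-training without the adversarial term lets the reconstruction and supervised losses shape the embedding before the critic starts pushing toward domain invariance.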

3.3 Weak Supervision

Once the domain-invariant representations are learnt, we use them to generate pseudo labels for the unlabeled patient genomic profiles. To do so, we partition the labeled cell line data into $\mathcal{O}$ distinct subsets ($\mathcal{D}_{cell}^{i}$, $i \in \{1, \ldots, \mathcal{O}\}$, where $\mathcal{D}_{cell}^{i} \subset \mathcal{G}_{cell}$) and train a classifier ($\mathcal{M}_{i}$) on the representations ($\mathcal{Z}$) of each subset. Each classifier acts as a label function in our weak supervision framework and is used to infer the drug response probability for each patient genomic profile, $\mathcal{P}_{i}(y \mid \mathcal{X}_{t}^{j})$, where $\mathcal{X}_{t}^{j} \in \mathcal{G}_{patient}$.
Each label function assigns the label $\hat{\bm{y}} = 1$ when the predicted drug response probability exceeds a threshold $t^{+}$, and $\hat{\bm{y}} = 0$ when the probability falls below a threshold $t^{-}$. For intermediate probabilities, where confidence in the prediction is low, it abstains from assigning a class and labels the sample as $-1$ (Eq. 6).

$$\mathcal{P}_{i}(y \mid \mathcal{X}_{t}^{j}) = \mathcal{M}_{i}\big(\mathcal{Z}(\mathcal{X}_{t}^{j})\big) \quad \text{s.t. } i \in \{1, \ldots, \mathcal{O}\},\ \mathcal{X}_{t}^{j} \in \mathcal{G}_{patient}$$
$$\hat{\bm{y}}_{i}^{j} = \begin{cases} 1, & \text{if } \mathcal{P}_{i}(y \mid \mathcal{X}_{t}^{j}) > t^{+} \\ 0, & \text{if } \mathcal{P}_{i}(y \mid \mathcal{X}_{t}^{j}) < t^{-} \\ -1, & \text{otherwise} \end{cases} \qquad (6)$$
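The thresholding rule of Eq. 6 for a single label function $\mathcal{M}_{i}$ can be sketched as follows, assuming its predicted probabilities for all patients are already available as a list. The function name and the `ABSTAIN` constant are hypothetical; the $-1$ code follows the text.

```python
from typing import List

ABSTAIN = -1


def apply_label_function(probs: List[float], t_pos: float, t_neg: float) -> List[int]:
    """Map P_i(y | X_t^j) to 1 (above t+), 0 (below t-), or -1 (abstain)."""
    if t_neg > t_pos:
        raise ValueError("expected t_neg <= t_pos")
    votes = []
    for p in probs:
        if p > t_pos:
            votes.append(1)
        elif p < t_neg:
            votes.append(0)
        else:
            votes.append(ABSTAIN)  # low-confidence region [t-, t+]
    return votes


print(apply_label_function([0.9, 0.5, 0.1, 0.7], t_pos=0.7, t_neg=0.3))
```

Because both comparisons are strict, a probability exactly equal to $t^{+}$ or $t^{-}$ leads to abstention.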

Only samples with at least one valid (non-abstained) prediction from the label functions are used subsequently. The final pseudo label ($\bm{y}_{t}^{j}$) for a patient genomic profile ($\mathcal{X}_{t}^{j}$) is decided by a majority vote across all non-abstained predictions ($\hat{\bm{y}}_{i}^{j}$), as given in Eq. 7, where $\bm{1}(\hat{\bm{y}}_{i}^{j} = 1)$ and $\bm{1}(\hat{\bm{y}}_{i}^{j} = 0)$ are indicator functions. Ties are resolved in favour of label 0.

$$
\boldsymbol{y}^{j}_{t} =
\begin{cases}
1, & \text{if } \sum_{i=1}^{\mathcal{O}} \mathbf{1}\big(\hat{\boldsymbol{y}}^{j}_{i} = 1\big) > \sum_{i=1}^{\mathcal{O}} \mathbf{1}\big(\hat{\boldsymbol{y}}^{j}_{i} = 0\big) \\
0, & \text{if } \sum_{i=1}^{\mathcal{O}} \mathbf{1}\big(\hat{\boldsymbol{y}}^{j}_{i} = 1\big) \le \sum_{i=1}^{\mathcal{O}} \mathbf{1}\big(\hat{\boldsymbol{y}}^{j}_{i} = 0\big)
\end{cases}
\tag{7}
$$
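Equation (7) is a plain majority vote over the $\mathcal{O}$ weak labelling sources, with ties resolved to the negative class. The NumPy sketch below is a minimal illustration, assuming the votes for each patient are stored row-wise and that an abstaining source is encoded as -1 (a hypothetical convention, not taken from the paper). How WISER decides that a patient is abstained overall is defined elsewhere and is not reproduced here; `majority_vote` is a hypothetical name.

```python
import numpy as np

# Hypothetical encoding: an abstaining labelling source is marked -1.
ABSTAIN = -1


def majority_vote(votes: np.ndarray) -> np.ndarray:
    """Aggregate O weak votes per patient into one pseudo label (Eq. 7).

    votes: integer array of shape (n_patients, O) with entries in {0, 1, ABSTAIN}.
    Returns 1 where the number of 1-votes strictly exceeds the number of 0-votes,
    and 0 otherwise, so ties (including all-abstain rows) resolve to 0.
    """
    n_pos = (votes == 1).sum(axis=1)  # sum_i 1(y_hat_i^j = 1)
    n_neg = (votes == 0).sum(axis=1)  # sum_i 1(y_hat_i^j = 0)
    return (n_pos > n_neg).astype(int)


votes = np.array([
    [1, 1, 0],        # 2 positive vs 1 negative -> 1
    [1, 0, ABSTAIN],  # tie, 1 vs 1 -> 0
    [0, 0, 1],        # 1 positive vs 2 negative -> 0
])
print(majority_vote(votes))  # [1 0 0]
```

Counting with boolean masks keeps the abstain votes out of both sums, which matches Eq. (7) as long as abstentions are neither 0 nor 1.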

3.4 Subset Selection and Drug Response Prediction

Once pseudo labels have been assigned to the non-abstained patient genomic profiles, they can be used directly, together with the labeled cell line data, to train the drug response prediction classifier. However, recent works (Lang et al., 2022; Shubham et al., 2023) have shown that in a weak supervision setting, training on the complete set of non-abstained samples yields sub-optimal performance, whereas training on a selected subset improves performance.

In our work, we use cut statistics (Muhlenbach et al., 2004) to select a subset of the non-abstained dataset $\mathcal{V}$, using the domain-invariant representation $\mathcal{Z}$ and the pseudo labels $\boldsymbol{y}_t$ assigned to its samples. Each data sample $(\mathcal{X}_t^i, \boldsymbol{y}^i_t)$ with $\mathcal{X}_t^i \in \mathcal{V}$ is assigned a normalized Z-score $\boldsymbol{z}_i$, as explained below.
For each patient $\mathcal{X}_t^i \in \mathcal{G}_{patient}$, we first find its $K$ nearest neighbors $\mathrm{NN}(\mathcal{X}_t^i) = \{\mathcal{X}_t^l\}$, based on the $L2$ distance between the representations $\mathcal{Z}(\mathcal{X}_t^l)$ and $\mathcal{Z}(\mathcal{X}_t^i)$.
A graph $G = (\mathcal{V}, \mathcal{E})$ is constructed whose nodes are the non-abstained patient genomic profiles $\mathcal{V}$ and whose edges $\mathcal{E}$ connect each sample $\mathcal{X}_t^i \in \mathcal{V}$ to its nearest neighbors $\mathrm{NN}(\mathcal{X}_t^i)$. Every edge is assigned a weight $\boldsymbol{w}_{i,j} = \big(1 + \|\mathcal{Z}(\mathcal{X}_t^i) - \mathcal{Z}(\mathcal{X}_t^j)\|\big)^{-1}$, where $\mathcal{X}_t^j \in \mathrm{NN}(\mathcal{X}_t^i)$, so that samples with similar representations $\mathcal{Z}$ have higher weights than dissimilar ones.
In general, a set of data points (a sub-graph) with similar representations (higher $\boldsymbol{w}_{i,j}$) but different pseudo labels is considered noisy and should be excluded from downstream training (Muhlenbach et al., 2004). Under this assumption, each sample $\mathcal{X}_t^i$ is assigned a score $\mathcal{J}_i$: the sum of the weights of the edges to those nearest neighbors whose class label differs from its own, $\mathbf{1}(\boldsymbol{y}^i_t \neq \boldsymbol{y}^j_t)$. Further, under the null hypothesis that class labels are assigned independently with probability $\mathcal{P}(\boldsymbol{y}_t)$, a Z-score $\boldsymbol{z}_i$ is calculated for $\mathcal{J}_i$ using the mean $\mu_i$ and standard deviation $\sigma_i$ given by Muhlenbach et al. (2004). $\mathcal{P}(\boldsymbol{y}_t)$ is estimated from the relative frequencies of the positive and negative classes among the non-abstained samples.
A smaller $\boldsymbol{z}_i$ indicates that class labels are consistent among the nearest neighbors, and hence that the pseudo label is less likely to be noisy. In our work, the non-abstained patient data are sorted by $\boldsymbol{z}_i$ (Eq. 8), and the top $b\%$ (also referred to as the budget), i.e., the samples with the smallest $\boldsymbol{z}_i$, form $\mathcal{D}^{sub}_{patient}$, which is then used together with the labeled cell line data to train the final drug response classifier.

$$
\begin{aligned}
\boldsymbol{z}_i &= \frac{\mathcal{J}_i - \mu_i}{\sigma_i} \\
\mathcal{J}_i &= \sum_{j \in \mathrm{NN}(\mathcal{X}_t^i)} \boldsymbol{w}_{i,j}\, \mathbf{1}\big(\boldsymbol{y}^i_t \neq \boldsymbol{y}^j_t\big) \\
\mu_i &= \big[1 - \mathcal{P}(\boldsymbol{y}^i_t)\big] \sum_{j \in \mathrm{NN}(\mathcal{X}_t^i)} \boldsymbol{w}_{i,j} \\
\sigma_i^2 &= \mathcal{P}(\boldsymbol{y}^i_t)\big[1 - \mathcal{P}(\boldsymbol{y}^i_t)\big] \sum_{j \in \mathrm{NN}(\mathcal{X}_t^i)} \boldsymbol{w}_{i,j}^2
\end{aligned}
\tag{8}
$$
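The nearest-neighbor graph, the cut-statistic score of Eq. (8) and the budgeted selection can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: distances are computed brute force (quadratic in the number of patients), $\mathcal{P}(\boldsymbol{y}_t)$ is taken as the pseudo-label frequency among the non-abstained samples, and the budget is passed as a fraction (b = 20% becomes 0.2). `cut_statistic_scores` and `select_subset` are hypothetical names.

```python
import numpy as np


def cut_statistic_scores(Z, y, k):
    """Z-score z_i of the cut-edge weight J_i for every sample (Eq. 8).

    Z: (n, d) array of domain-invariant representations.
    y: (n,) array of binary pseudo labels from Eq. (7).
    k: number of nearest neighbours K.
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y)
    n = len(y)
    # Brute-force pairwise L2 distances; a sample is never its own neighbour.
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    nn = np.argsort(dist, axis=1)[:, :k]

    # P(y) estimated from the label frequencies among the non-abstained samples.
    p_pos = y.mean()
    p_own = np.where(y == 1, p_pos, 1.0 - p_pos)  # P(y_t^i) for each sample

    z = np.zeros(n)
    for i in range(n):
        w = 1.0 / (1.0 + dist[i, nn[i]])       # edge weights w_ij
        J = np.sum(w * (y[nn[i]] != y[i]))    # weight of edges to other-label neighbours
        mu = (1.0 - p_own[i]) * w.sum()
        sigma = np.sqrt(p_own[i] * (1.0 - p_own[i]) * np.sum(w ** 2))
        if sigma > 0:  # sigma is 0 only if every sample has the same label
            z[i] = (J - mu) / sigma
    return z


def select_subset(z, budget):
    """Indices of the ceil(budget * n) samples with the smallest z_i."""
    m = int(np.ceil(budget * len(z)))
    return np.argsort(z, kind="stable")[:m]


# Two well-separated clusters; sample 5 carries the "wrong" label for its cluster.
Z = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([0, 0, 0, 1, 1, 0])
z = cut_statistic_scores(Z, y, k=2)
print(np.argmax(z))                            # 5
print(sorted(select_subset(z, 0.5).tolist()))  # [0, 1, 2]
```

In the toy example the sample whose label disagrees with both of its neighbors gets the largest $\boldsymbol{z}_i$ and is excluded at a 50% budget. For the 9808 TCGA profiles a KD-tree or approximate nearest-neighbor index would be a more practical choice than the dense distance matrix used here.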

Algorithm 1 (in the appendix) describes the complete procedure of our method, WISER (Weak supervISion and supErvised Representation learning).

Table 1: Performance comparison of predicted patient response, in terms of AUROC and AUPRC, between our proposed method (WISER) and the baselines. Data on clinical relapse are used for all evaluations. Results are reported as mean/std over five-fold cross-validation. The best performer is highlighted in bold, and predictions that were not meaningful are denoted by ‘-’. In terms of mean score, our method outperforms all other baselines on every drug for at least one metric.
Methods 5-Fluorouracil Temozolomide Sorafenib Gemcitabine Cisplatin
AUROC AUPRC AUROC AUPRC AUROC AUPRC AUROC AUPRC AUROC AUPRC
WISER 0.715/0.036 0.741/0.023 0.760/0.006 0.786/0.019 0.727/0.007 0.728/0.024 0.649/0.037 0.752/0.002 0.851/0.007 0.861/0.020
CODE-AE 0.868/0.030 0.740/0.006 0.751/0.017 0.762/0.001 0.631/0.020 0.705/0.062 0.594/0.016 0.751/0.006 0.652/0.071 0.743/0.011
DAE 0.591/0.066 0.573/0.066 0.685/0.013 0.668/0.105 0.485/0.053 0.613/0.046 0.530/0.036 0.511/0.048 0.522/0.087 0.581/0.096
CORAL 0.578/0.015 0.651/0.135 0.675/0.020 0.654/0.020 0.491/0.023 0.616/0.048 0.597/0.030 0.544/0.037 0.617/0.072 0.617/0.124
VELODROME 0.598/0.054 0.403/0.000 0.701/0.028 0.668/0.000 0.505/0.029 0.749/0.000 0.547/0.030 0.434/0.000 0.583/0.029 0.442/0.000
ENET 0.435/0.092 0.454/0.070 - - - - - - 0.637/0.076 0.623/0.045
TCRP 0.596/0.080 0.546/0.073 0.675/0.009 0.662/0.012 0.441/0.053 0.521/0.054 0.462/0.057 0.502/0.055 0.414/0.048 0.432/0.037
MLP 0.569/0.050 0.599/0.042 0.646/0.022 0.624/0.038 0.444/0.035 0.501/0.035 0.467/0.036 0.498/0.049 0.459/0.070 0.496/0.070
DSN-DANN 0.635/0.065 0.596/0.101 0.683/0.015 0.690/0.040 0.533/0.050 0.628/0.069 0.555/0.070 0.582/0.044 0.585/0.103 0.608/0.133
VAEN 0.633/0.157 0.585/0.100 0.648/0.035 0.632/0.162 0.600/0.021 0.668/0.112 0.526/0.087 0.618/0.223 0.694/0.049 0.698/0.065
COXEN 0.336/0.000 0.403/0.000 0.726/0.000 0.668/0.000 0.639/0.000 0.749/0.000 0.378/0.000 0.434/0.000 0.393/0.000 0.442/0.000
COXRF 0.562/0.070 0.598/0.063 0.388/0.080 0.451/0.031 0.418/0.072 0.505/0.044 0.506/0.078 0.506/0.037 0.554/0.074 0.564/0.065
CELLIGNER 0.536/0.060 0.531/0.024 - - - - 0.575/0.029 0.529/0.053 0.497/0.042 0.550/0.033
ADAE 0.680/0.040 0.725/0.036 0.707/0.010 0.757/0.003 0.540/0.092 0.678/0.040 0.499/0.093 0.691/0.123 0.633/0.165 0.755/0.080
DSN-MMD 0.678/0.074 0.674/0.103 0.712/0.031 0.759/0.051 0.515/0.036 0.582/0.090 0.465/0.041 0.491/0.069 0.650/0.023 0.605/0.067

4 Experiments

4.1 Experiment Settings

We evaluate the proposed method in four experimental settings: (1) drug response prediction, in which we compare different baselines by training a binary classifier to predict the efficacy of a given drug on patients; (2) understanding the medical relevance of weak supervision and subset selection in this context; (3) an ablation study comparing the performance of the model with and without weak supervision; and (4) measuring the sensitivity of classification performance to subset size.

Data We use the cancer cell line and patient genomic profiles (gene expression data for 1426 genes) from CODE-AE (He et al., 2022): 677 labeled cancer cell line samples from the DepMap portal (Ghandi et al., 2019) and 9808 unlabeled patient samples from TCGA (Hutter & Zenklusen, 2018). 179 labeled TCGA genomic profiles were used for evaluation. Drug response in cell lines was based on z-scores calculated from Area Under the Dose Response Curve (AUDRC) values: cell lines with a z-score less than 0 were considered positive responders, and those with a z-score greater than 0 negative responders. For patients, the assessment relied on cancer relapse time after chemotherapy: values greater than the median were categorized as positive responders and those less than the median as negative responders. The details of data preprocessing are available in He et al. (2022). A set of 20 drugs present in both DepMap and TCGA, $\mathcal{D} = \{d_1, d_2, \ldots, d_{20}\}$, was considered for the experiments; details of the drugs are provided in Appendix C. Due to the limited number of labeled patient genomic profiles, the evaluation was done only on 5-Fluorouracil (Fu), Temozolomide (Tem), Sorafenib (Sor), Gemcitabine (Gem) and Cisplatin (Cis), the drugs with responses available for at least 20 patients.
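The two binarization rules above can be written down directly. This sketch assumes the AUDRC z-score is computed per drug across cell lines with the population standard deviation, and it assigns values exactly at z = 0 or exactly at the median (cases the text leaves open) to the negative class; the function names and toy values are hypothetical.

```python
import numpy as np


def cell_line_labels(audrc: np.ndarray) -> np.ndarray:
    """1 (positive responder) if the AUDRC z-score is below 0, else 0."""
    z = (audrc - audrc.mean()) / audrc.std()  # per-drug z-score across cell lines
    return (z < 0).astype(int)


def patient_labels(relapse_time: np.ndarray) -> np.ndarray:
    """1 (positive responder) if relapse time after chemotherapy exceeds the median, else 0."""
    return (relapse_time > np.median(relapse_time)).astype(int)


audrc = np.array([0.2, 0.5, 0.8, 0.9])         # toy AUDRC values for one drug
relapse = np.array([100, 300, 200, 400, 50])   # toy relapse times (e.g. days)
print(cell_line_labels(audrc))   # [1 1 0 0]
print(patient_labels(relapse))   # [0 1 0 1 0]
```

Note that the patient at exactly the median (200) is labelled negative here only because of the tie-handling assumption above.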

Model Configuration The encoder and decoder networks used in representation learning each consist of two fully connected (linear) layers, with hidden units of dimensions (512, 256) and (256, 512), respectively. Both networks use ReLU activations. The critic network and the classifier (used for weak supervision and downstream drug response prediction) consist of two layers with (64, 32) hidden units, with ReLU activation after the first layer. The critic network uses a linear final layer, while the classifier uses a sigmoid output. The same architecture is used for all baseline methods for a fair comparison. Further details about training and hyperparameter tuning are provided in Appendix C.
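The layer sizes above leave some room for interpretation, so the NumPy forward pass below is only a shape-level sketch under explicit assumptions: the encoder maps the 1426 genes to a 256-dimensional latent (1426 → 512 → 256), the decoder maps back (256 → 512 → 1426), and the critic and classifier each take the latent as input with hidden layers of 64 and 32 units and a single output unit. ReLU is applied after every hidden layer here, which may differ from the exact activation placement intended in the text. Weights are random and no training (e.g. the critic objective) is shown; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_GENES = 1426  # expression features per profile (from the Data paragraph)


def dense(n_in, n_out):
    """Randomly initialised fully connected layer, returned as (weights, bias)."""
    return rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out)


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def forward(x, layers, final=lambda v: v):
    """Apply ReLU after every layer except the last; `final` is applied to the last output."""
    for W, b in layers[:-1]:
        x = relu(x @ W + b)
    W, b = layers[-1]
    return final(x @ W + b)


# Assumed layer sizes (see lead-in); each network has its own weights.
encoder = [dense(N_GENES, 512), dense(512, 256)]
decoder = [dense(256, 512), dense(512, N_GENES)]
critic = [dense(256, 64), dense(64, 32), dense(32, 1)]
classifier = [dense(256, 64), dense(64, 32), dense(32, 1)]

x = rng.normal(size=(8, N_GENES))              # batch of 8 expression profiles
z = forward(x, encoder)                         # latent representation
x_rec = forward(z, decoder)                     # reconstruction of the input
critic_out = forward(z, critic)                 # unbounded (linear) critic score
p_resp = forward(z, classifier, final=sigmoid)  # response probability in (0, 1)
print(z.shape, x_rec.shape, critic_out.shape, p_resp.shape)
# (8, 256) (8, 1426) (8, 1) (8, 1)
```

The only difference between the critic and the classifier in this sketch is the output transform, mirroring the text's distinction between a linear final layer and a sigmoid output.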

Baselines We compare our method with CODE-AE (He et al., 2022), VAEN (Jia et al., 2021) and DAE (Vincent et al., 2008). We further compare it with domain adaptation techniques such as Celligner (Warren et al., 2021), Velodrome (Sharifi-Noghabi et al., 2021), Deep CORAL (Sun & Saenko, 2016) and DSN (MMD and DANN variants) (Bousmalis et al., 2016). Recent methods such as ADAE (Dincer et al., 2020), COXEN + Random Forest (COXRF) and COXEN (Zhu et al., 2020) are also included. To compare with algorithms that do not use representation learning, we also include TCRP (Ma et al., 2021), MLP (Sakellaropoulos et al., 2019) and ElasticNet (ENET) (Kuenzi et al., 2020).

Metrics For comparison, we use the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) (He et al., 2022). The classifier used for drug response prediction was trained with 5-fold stratified cross-validation on cell line data and tested on patient data from TCGA.
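The text does not say which implementation of the two metrics was used; scikit-learn's `roc_auc_score` and `average_precision_score` are common choices. The dependency-free sketch below shows what each metric measures: AUROC as the probability that a randomly chosen positive is scored above a randomly chosen negative (the Mann-Whitney formulation), and AUPRC estimated as average precision. Average precision is one common AUPRC estimator; trapezoidal interpolation of the precision-recall curve gives slightly different values. The function names are hypothetical.

```python
import numpy as np


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a random negative.

    Equivalent to the Mann-Whitney U statistic; tied pairs count as 1/2.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    diff = scores[y_true == 1][:, None] - scores[y_true == 0][None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


def auprc(y_true, scores):
    """AUPRC estimated as average precision: mean precision at the rank of each positive.

    Assumes no tied scores (tied scores would have to share one threshold).
    """
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = y_true[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision_at_k[hits == 1].mean()


y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(round(float(auroc(y, s)), 4), round(float(auprc(y, s)), 4))  # 0.75 0.8333
```

Reporting mean/std as in Table 1 then amounts to computing these scores once per cross-validation fold and summarizing the five values.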

Figure 2: Ablation on weak supervision and sensitivity of model performance to subset size.

4.2 Results

4.2.1 Drug response prediction

Table 1 compares our method with the baselines. WISER achieves the best AUROC for Cisplatin, Temozolomide, Gemcitabine and Sorafenib, exceeding the strongest baseline by 15.7, 0.9, 5.2 and 8.8 percentage points, respectively, and the best AUPRC for 5-Fluorouracil, Temozolomide, Gemcitabine and Cisplatin, with margins of 0.1, 2.4, 0.1 and 10.6 percentage points, respectively. A comparison with other traditional methods is provided in Appendix E.
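The margins quoted above are absolute differences between WISER's mean score and the mean score of the strongest baseline for the same drug and metric. The short check below recomputes them from the Table 1 means; the best-baseline values were read off Table 1 by hand and are noted in the comments.

```python
# Mean scores transcribed from Table 1 as (WISER, best baseline).
auroc_means = {
    "Cisplatin":    (0.851, 0.694),  # best baseline: VAEN
    "Temozolomide": (0.760, 0.751),  # best baseline: CODE-AE
    "Gemcitabine":  (0.649, 0.597),  # best baseline: CORAL
    "Sorafenib":    (0.727, 0.639),  # best baseline: COXEN
}
auprc_means = {
    "5-Fluorouracil": (0.741, 0.740),  # best baseline: CODE-AE
    "Temozolomide":   (0.786, 0.762),  # best baseline: CODE-AE
    "Gemcitabine":    (0.752, 0.751),  # best baseline: CODE-AE
    "Cisplatin":      (0.861, 0.755),  # best baseline: ADAE
}


def margins(table):
    """Absolute improvement over the best baseline, in percentage points."""
    return {drug: round(100 * (ours - best), 1) for drug, (ours, best) in table.items()}


print(margins(auroc_means))
# {'Cisplatin': 15.7, 'Temozolomide': 0.9, 'Gemcitabine': 5.2, 'Sorafenib': 8.8}
print(margins(auprc_means))
# {'5-Fluorouracil': 0.1, 'Temozolomide': 2.4, 'Gemcitabine': 0.1, 'Cisplatin': 10.6}
```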

4.2.2 Medical Relevance of the Method

We next examine the medical relevance of the pseudo labels and of the subset selected for downstream prediction, generated using the best hyperparameters. We identify the genes most relevant to the generation of these pseudo labels using Extra-Trees-based feature selection (Alfian et al., 2022): an Extra-Trees Classifier is fitted on the patient genomic profiles in the selected subset and their pseudo labels, and the genes in the top 50% by feature importance are selected. We compare the selected genes against the GDISC database (Spainhour et al., 2017), which independently identifies genes associated with chemotherapy response in TCGA. GDISC provides information for 22 drugs; however, details for Sor were unavailable. The resulting sets of significant genes and the corresponding overlap are shown in Table 2. Drugs with relevant information were evaluated on two metrics: (1) Precision, the fraction of the genes selected by the Extra-Trees Classifier that GDISC marks as significant; and (2) Recall, the fraction of the genes that GDISC marks as significant for a given drug that are also selected by the Extra-Trees Classifier. Cisplatin, Temozolomide, Gemcitabine and 5-Fluorouracil achieve precisions of 0.860, 0.609, 0.499 and 0.419, and recalls of 0.503, 0.500, 0.464 and 0.459, respectively. These scores follow the same ordering as the AUPRC values in Table 1 (Cis > Tem > Gem > Fu): drugs with higher precision and recall tend to achieve better predictive performance, which suggests the faithfulness (Alvarez Melis & Jaakkola, 2018) of our explanations in terms of gene importances.
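The overlap metrics above reduce to set arithmetic once the gene sets are fixed. In practice the importances would come from a fitted model, for example the `feature_importances_` attribute of scikit-learn's `ExtraTreesClassifier`; the sketch below takes them as a plain list so that it runs without that dependency. It interprets "top 50% feature importance" as the top half of genes by importance rank, which is an assumption, and the gene names, scores and reference set are hypothetical stand-ins rather than GDISC data.

```python
import numpy as np


def top_half_genes(genes, importances):
    """Genes ranked in the top 50% by importance (ties broken arbitrarily)."""
    order = np.argsort(importances)[::-1]  # most important first
    return {genes[i] for i in order[: len(genes) // 2]}


def overlap_precision_recall(selected, significant):
    """Precision and recall of the selected genes against a reference gene set."""
    hits = len(selected & significant)
    return hits / len(selected), hits / len(significant)


# Hypothetical gene names and importance scores, for illustration only.
genes = ["G1", "G2", "G3", "G4"]
importances = [0.1, 0.4, 0.3, 0.2]
selected = top_half_genes(genes, importances)  # G2 and G3
significant = {"G2", "G4", "G5"}               # stand-in for a GDISC gene list
print(overlap_precision_recall(selected, significant))  # (0.5, 0.3333333333333333)
```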

4.2.3 Ablation Studies

Table 2: Experiment examining the medical relevance of weak supervision and subset selection. The set of genes from GDISC with a significant drug-gene interaction (P-val < 0.05) for the survival of patients with cancer is compared with the genes considered relevant by weak supervision and subset selection. The precision and recall between the two sets are reported.
Drug Fu Tem Sor Gem Cis
Genes (P-val < 0.05) 418 706 - 526 831
Genes (P-val ≥ 0.05) 521 456 - 473 143
Precision 0.419 0.609 - 0.499 0.860
Recall 0.459 0.500 - 0.464 0.503

We conducted an ablation test on the effect of weak supervision and subset selection by training the downstream drug response classifier directly on the representations of the labeled cell line samples, without pseudo-labeled patient data. The results (Figure 2) were compared using the best hyperparameter configuration for each drug and indicate that weak supervision and subset selection (WISER) improve AUROC by 4.58% and AUPRC by 3.4% on average. Further details on the experiments are provided in Appendix D.

4.2.4 Sensitivity Analysis

Since the ablation study indicates the importance of weak supervision and subset selection, we next examine the impact of the subset budget (b) on overall performance, varying b while keeping the remaining parameters at their optimal configuration. Figure 2 summarizes the results. For AUROC, subset selection gave better results than the complete non-abstained dataset (b = 100%) for 5-Fluorouracil (b = 40%), Cisplatin (b = 20%) and Gemcitabine (b = 20%), with improvements of 10.2%, 2.4% and 7.7%, respectively. For AUPRC, subset selection gave better results for all drugs other than Temozolomide, with b set to 20%, 10%, 10% and 80% for Cisplatin, Gemcitabine, Sorafenib and 5-Fluorouracil, respectively; improvements of 0.8%, 28.2%, 1.8% and 6.9% were observed for these drugs. These results indicate that subset selection can improve performance over training on the complete non-abstained dataset.

5 Conclusion

Recent cancer drug response prediction methods have largely followed the paradigm of unsupervised domain-invariant representation learning followed by a downstream drug response classification step. Although supervised training could improve performance, it has been limited by the heterogeneity of patient responses across drugs and the limited availability of labeled patient data. Our approach addresses these challenges by modeling genomic profiles as a combination of discrete drug representations, reflective of heterogeneous drug responses. We also use weak supervision and subset selection on unlabeled patient genomic profiles to improve the generalization of the classifier. WISER demonstrates improved drug response prediction for several clinically significant anti-cancer drugs. To the best of our knowledge, our method is the first to use domain-invariant representations for subset selection with weak supervision, and it can be applied to similar settings with large unlabeled datasets. However, the performance of our method is limited by the available labeled data and by the set of drugs considered for discrete representation learning. Future work can explore further improvements to our approach through other sources of distant supervision, e.g., knowledge graphs.

6 Impact Statement

This research seeks to enhance the effectiveness of personalized cancer treatment by integrating laboratory data and patient information, thereby bridging gaps between research and real-world outcomes. The study tackles the scarcity of labeled patient data through the use of weak supervision techniques, aiming to contribute to the improvement of reliable and accessible personalized cancer treatments.

References

  • Alfian et al. (2022) Alfian, G., Syafrudin, M., Fahrurrozi, I., Fitriyani, N. L., Atmaji, F. T. D., Widodo, T., Bahiyah, N., Benes, F., and Rhee, J. Predicting breast cancer from risk factors using svm and extra-trees-based feature selection method. Computers, 11(9):136, 2022.
  • Alsaggaf et al. (2024) Alsaggaf, I., Buchan, D., and Wan, C. Improving cell type identification with gaussian noise-augmented single-cell rna-seq contrastive learning. Briefings in Functional Genomics, pp.  elad059, 2024.
  • Alvarez Melis & Jaakkola (2018) Alvarez Melis, D. and Jaakkola, T. Towards robust interpretability with self-explaining neural networks. Advances in neural information processing systems, 31, 2018.
  • Arjovsky et al. (2017) Arjovsky, M., Chintala, S., and Bottou, L. Wasserstein generative adversarial networks. In International conference on machine learning, pp.  214–223. PMLR, 2017.
  • Bahdanau et al. (2014) Bahdanau, D., Cho, K., and Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
  • Barbano et al. (2022) Barbano, C. A., Dufumier, B., Tartaglione, E., Grangetto, M., and Gori, P. Unbiased supervised contrastive learning. arXiv preprint arXiv:2211.05568, 2022.
  • Bedard et al. (2013) Bedard, P. L., Hansen, A. R., Ratain, M. J., and Siu, L. L. Tumour heterogeneity in the clinic. Nature, 501(7467):355–364, 2013.
  • Bousmalis et al. (2016) Bousmalis, K., Trigeorgis, G., Silberman, N., Krishnan, D., and Erhan, D. Domain separation networks. Advances in neural information processing systems, 29, 2016.
  • Bruna et al. (2016) Bruna, A., Rueda, O. M., Greenwood, W., Batra, A. S., Callari, M., Batra, R. N., Pogrebniak, K., Sandoval, J., Cassidy, J. W., Tufegdzic-Vidakovic, A., et al. A biobank of breast cancer explants with preserved intra-tumor heterogeneity to screen anticancer compounds. Cell, 167(1):260–274, 2016.
  • Chen et al. (2020) Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. A simple framework for contrastive learning of visual representations. In International conference on machine learning, pp.  1597–1607. PMLR, 2020.
  • Dawid & Skene (1979) Dawid, A. P. and Skene, A. M. Maximum likelihood estimation of observer error-rates using the em algorithm. Journal of the Royal Statistical Society: Series C (Applied Statistics), 28(1):20–28, 1979.
  • Dincer et al. (2020) Dincer, A. B., Janizek, J. D., and Lee, S.-I. Adversarial deconfounding autoencoder for learning robust gene expression embeddings. Bioinformatics, 36(Supplement_2):i573–i582, 2020.
  • Fu et al. (2020) Fu, D., Chen, M., Sala, F., Hooper, S., Fatahalian, K., and Ré, C. Fast and three-rious: Speeding up weak supervision with triplet methods. In ICML, pp.  3280–3291. PMLR, 2020.
  • Ganin et al. (2016) Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., and Lempitsky, V. Domain-adversarial training of neural networks. The journal of machine learning research, 17(1):2096–2030, 2016.
  • Ghandi et al. (2019) Ghandi, M., Huang, F. W., Jané-Valbuena, J., Kryukov, G. V., Lo, C. C., McDonald III, E. R., Barretina, J., Gelfand, E. T., Bielski, C. M., Li, H., et al. Next-generation characterization of the cancer cell line encyclopedia. Nature, 569(7757):503–508, 2019.
  • Graf et al. (2021) Graf, F., Hofer, C., Niethammer, M., and Kwitt, R. Dissecting supervised contrastive learning. In International Conference on Machine Learning, pp.  3821–3830. PMLR, 2021.
  • Graves et al. (2014) Graves, A., Wayne, G., and Danihelka, I. Neural turing machines. arXiv preprint arXiv:1410.5401, 2014.
  • Gulrajani et al. (2017) Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C. Improved training of wasserstein gans. Advances in neural information processing systems, 30, 2017.
  • He et al. (2022) He, D., Liu, Q., Wu, Y., and Xie, L. A context-aware deconfounding autoencoder for robust prediction of personalized clinical drug response from cell-line compound screening. Nature Machine Intelligence, 4(10):879–892, 2022.
  • Hermans et al. (2017) Hermans, A., Beyer, L., and Leibe, B. In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737, 2017.
  • Hinton & Zemel (1993) Hinton, G. E. and Zemel, R. Autoencoders, minimum description length and helmholtz free energy. Advances in neural information processing systems, 6, 1993.
  • Hutter & Zenklusen (2018) Hutter, C. and Zenklusen, J. C. The cancer genome atlas: creating lasting value beyond its data. Cell, 173(2):283–285, 2018.
  • Jia et al. (2021) Jia, P., Hu, R., Pei, G., Dai, Y., Wang, Y.-Y., and Zhao, Z. Deep generative neural network for accurate drug response imputation. Nature communications, 12(1):1740, 2021.
  • Kenton & Toutanova (2019) Kenton, J. D. M.-W. C. and Toutanova, L. K. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of naacL-HLT, volume 1, pp.  2, 2019.
  • Khosla et al. (2020) Khosla, P., Teterwak, P., Wang, C., Sarna, A., Tian, Y., Isola, P., Maschinot, A., Liu, C., and Krishnan, D. Supervised contrastive learning. Advances in neural information processing systems, 33:18661–18673, 2020.
  • Kingma & Welling (2013) Kingma, D. P. and Welling, M. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
  • Kuenzi et al. (2020) Kuenzi, B. M., Park, J., Fong, S. H., Sanchez, K. S., Lee, J., Kreisberg, J. F., Ma, J., and Ideker, T. Predicting drug response and synergy using a deep learning model of human cancer cells. Cancer cell, 38(5):672–684, 2020.
  • Lang et al. (2022) Lang, H., Vijayaraghavan, A., and Sontag, D. Training subset selection for weak supervision. Advances in Neural Information Processing Systems, 35:16023–16036, 2022.
  • Lee et al. (2021) Lee, H. H., Tang, Y., Yang, Q., Yu, X., Bao, S., Landman, B. A., and Huo, Y. Attention-guided supervised contrastive learning for semantic segmentation. arXiv preprint arXiv:2106.01596, 2021.
  • Ma et al. (2021) Ma, J., Fong, S. H., Luo, Y., Bakkenist, C. J., Shen, J. P., Mourragui, S., Wessels, L. F., Hafner, M., Sharan, R., Peng, J., et al. Few-shot learning creates predictive models of drug response that translate from high-throughput screens to individual patients. Nature Cancer, 2(2):233–244, 2021.
  • Mazzetto et al. (2021) Mazzetto, A., Cousins, C., Sam, D., Bach, S. H., and Upfal, E. Adversarial multi class learning under weak supervision with performance guarantees. In International Conference on Machine Learning, pp.  7534–7543. PMLR, 2021.
  • Muhlenbach et al. (2004) Muhlenbach, F., Lallich, S., and Zighed, D. A. Identifying and handling mislabelled instances. Journal of Intelligent Information Systems, 22(1):89–109, 2004.
  • Pan & Yang (2009) Pan, S. J. and Yang, Q. A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10):1345–1359, 2009.
  • Ratner et al. (2017) Ratner, A., Bach, S. H., Ehrenberg, H., Fries, J., Wu, S., and Ré, C. Snorkel: Rapid training data creation with weak supervision. In Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases, volume 11, pp.  269. NIH Public Access, 2017.
  • Ratner et al. (2016) Ratner, A. J., De Sa, C. M., Wu, S., Selsam, D., and Ré, C. Data programming: Creating large training sets, quickly. NeurIPS, 29, 2016.
  • Sakellaropoulos et al. (2019) Sakellaropoulos, T., Vougas, K., Narang, S., Koinis, F., Kotsinas, A., Polyzos, A., Moss, T. J., Piha-Paul, S., Zhou, H., Kardala, E., et al. A deep learning framework for predicting response to therapy in cancer. Cell reports, 29(11):3367–3373, 2019.
  • Schroff et al. (2015) Schroff, F., Kalenichenko, D., and Philbin, J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.  815–823, 2015.
  • Seyhan (2019) Seyhan, A. A. Lost in translation: the valley of death across preclinical and clinical divide–identification of problems and overcoming obstacles. Translational Medicine Communications, 4(1):1–19, 2019.
  • Sharifi-Noghabi et al. (2020) Sharifi-Noghabi, H., Peng, S., Zolotareva, O., Collins, C. C., and Ester, M. Aitl: adversarial inductive transfer learning with input and output space adaptation for pharmacogenomics. Bioinformatics, 36(Supplement_1):i380–i388, 2020.
  • Sharifi-Noghabi et al. (2021) Sharifi-Noghabi, H., Harjandi, P. A., Zolotareva, O., Collins, C. C., and Ester, M. Out-of-distribution generalization from labelled and unlabelled gene expression data for drug response prediction. Nature Machine Intelligence, 3(11):962–972, 2021.
  • Shubham et al. (2023) Shubham, K., Sastry, P., and AP, P. Fusing conditional submodular GAN and programmatic weak supervision. arXiv preprint arXiv:2312.10366, 2023.
  • Spainhour et al. (2017) Spainhour, J. C. G., Lim, J., and Qiu, P. GDISC: a web portal for integrative analysis of gene–drug interaction for survival in cancer. Bioinformatics, 33(9):1426–1428, 2017.
  • Sun & Saenko (2016) Sun, B. and Saenko, K. Deep CORAL: Correlation alignment for deep domain adaptation. In Computer Vision–ECCV 2016 Workshops: Amsterdam, The Netherlands, October 8-10 and 15-16, 2016, Proceedings, Part III 14, pp. 443–450. Springer, 2016.
  • Van Den Oord et al. (2017) Van Den Oord, A., Vinyals, O., et al. Neural discrete representation learning. Advances in neural information processing systems, 30, 2017.
  • Vincent et al. (2008) Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.-A. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th international conference on Machine learning, pp.  1096–1103, 2008.
  • Wahida et al. (2023) Wahida, A., Buschhorn, L., Fröhling, S., Jost, P. J., Schneeweiss, A., Lichter, P., and Kurzrock, R. The coming decade in precision oncology: six riddles. Nature Reviews Cancer, 23(1):43–54, 2023.
  • Warren et al. (2021) Warren, A., Chen, Y., Jones, A., Shibue, T., Hahn, W. C., Boehm, J. S., Vazquez, F., Tsherniak, A., and McFarland, J. M. Global computational alignment of tumor and cell line transcriptional profiles. Nature Communications, 12(1):22, 2021.
  • WHO (2022) WHO. Cancer. https://www.who.int/news-room/fact-sheets/detail/cancer, February 2022.
  • Zbontar et al. (2021) Zbontar, J., Jing, L., Misra, I., LeCun, Y., and Deny, S. Barlow Twins: Self-supervised learning via redundancy reduction. In International conference on machine learning, pp. 12310–12320. PMLR, 2021.
  • Zhang et al. (2022) Zhang, J., Hsieh, C.-Y., Yu, Y., Zhang, C., and Ratner, A. A survey on programmatic weak supervision. arXiv preprint arXiv:2202.05433, 2022.
  • Zhu et al. (2020) Zhu, Y., Brettin, T., Evrard, Y. A., Xia, F., Partin, A., Shukla, M., Yoo, H., Doroshow, J. H., and Stevens, R. L. Enhanced co-expression extrapolation (COXEN) gene selection method for building anti-cancer drug response prediction models. Genes, 11(9):1070, 2020.

Appendix A Appendix

Algorithm 1 WISER: Weak supervISion and supErvised Representation learning to improve drug response prediction in cancer
Input: genomic profiles of cell lines ($\mathcal{X}_c$), genomic profiles of patients ($\mathcal{X}_t$), number of epochs for initial training ($\mathcal{P}_i$), number of epochs for domain-adaptation-based training ($\mathcal{P}_d$), number of critic training epochs per encoder update ($\mathcal{P}_c$), batch size ($\mathcal{B}$), weak supervision thresholds $t^{+}$ and $t^{-}$, number of chunks $\mathcal{O}$ of cell line data for training, subset selection budget $b$ and nearest neighbor size $K$.
1: ## Representation Learning
2: for epoch in $[0 \ldots \mathcal{P}_i]$ do
3:    Sample a batch of cell line and patient genomic data from the dataloader without replacement: $\{\mathcal{X}_c^{(i)}\}_{i=0}^{\mathcal{B}}$, $\{\mathcal{X}_t^{(j)}\}_{j=0}^{\mathcal{B}}$.
4:    Train the shared encoder ($\mathcal{C}_{\mathcal{S}}$), private encoders ($\mathcal{C}_{\mathcal{P}}^{c}$, $\mathcal{C}_{\mathcal{P}}^{t}$), discrete embedding ($\mathcal{R}$) and decoder ($\mathcal{D}$) on the sampled batch using the $\bm{l}_{pl}$ loss.
5: end for
6: $\mathcal{N} \leftarrow 0$
7: for epoch in $[0 \ldots \mathcal{P}_d]$ do
8:    Sample a batch of cell line and patient genomic data from the dataloader without replacement: $\{\mathcal{X}_c^{(i)}\}_{i=0}^{\mathcal{B}}$, $\{\mathcal{X}_t^{(j)}\}_{j=0}^{\mathcal{B}}$.
9:    Train the critic network ($\mathcal{F}$) with the $\bm{l}_{critic}$ loss.
10:    $\mathcal{N} \leftarrow \mathcal{N} + 1$
11:    if $\mathcal{N} \bmod \mathcal{P}_c = 0$ then
12:      Sample a batch of genomic data $\{\mathcal{X}_c^{(i)}\}_{i=0}^{\mathcal{B}}$, $\{\mathcal{X}_t^{(j)}\}_{j=0}^{\mathcal{B}}$.
13:      Train $\mathcal{C}_{\mathcal{S}}, \mathcal{C}_{\mathcal{P}}^{c}, \mathcal{C}_{\mathcal{P}}^{t}, \mathcal{R}, \mathcal{D}$ on the sampled batch using the $\bm{l}_{total}$ loss.
14:    end if
15: end for
16: Use the representation ($\mathcal{Z}$) generated by the shared encoder and the drug-based embeddings ($\mathcal{C}_{\mathcal{S}}$ and $\mathcal{R}$).
17: ## Weak Supervision
18: for i in $[1 \ldots \mathcal{O}]$ do
19:    Train a classifier $\mathcal{M}_i$ using $D^{i}_{cell}$, where $D^{i}_{cell} \subset G_{cell}$.
20:    Infer $\mathcal{P}_i(\bm{y} \mid \mathcal{X}_t^{j})$ using the trained classifier $\mathcal{M}_i$, where $\mathcal{X}_t^{j} \in \mathcal{G}_{patient}$.
21: end for
22: Label samples based on $t^{+}$ and $t^{-}$ (Eq. 6).
23: Assign the final pseudo label ($\bm{y}^{j}_{t}$) to the non-abstained samples by majority voting (Eq. 7).
24: ## Subset Selection and Drug Response Prediction
25: Calculate $\bm{z}_i$ for the non-abstained patient samples as in Eq. 8, sort by $\bm{z}_i$ and choose the top $b$ fraction of samples as the subset.
26: Use the patient genomic profiles in this subset, along with their pseudo labels, together with $\mathcal{X}_c$ to train a drug response prediction classifier.
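The weak supervision and subset selection steps (lines 17 to 26) can be illustrated with a small stdlib Python sketch. Because Eqs. 6 to 8 are not reproduced in this appendix, it makes several assumptions: each label function votes positive (1) when its predicted probability is at least $t^{+}$, negative (0) when it is at most $t^{-}$, and abstains otherwise; a patient is abstained when all label functions abstain or the vote is tied; and the scores $\bm{z}_i$ of Eq. 8 are supplied precomputed, with "top" meaning the largest score. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence

ABSTAIN = None  # marker for an abstaining label function or an abstained sample


def lf_vote(prob: float, t_pos: float, t_neg: float) -> Optional[int]:
    """Threshold one label function's P(y=1 | x) into a vote (assumed form of Eq. 6)."""
    if prob >= t_pos:
        return 1
    if prob <= t_neg:
        return 0
    return ABSTAIN


def majority_vote(votes: Sequence[Optional[int]]) -> Optional[int]:
    """Majority over non-abstaining votes; all-abstain or a tie -> abstain (assumption)."""
    counts = Counter(v for v in votes if v is not ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def pseudo_label(probs_per_lf: Sequence[Sequence[float]],
                 t_pos: float, t_neg: float) -> List[Optional[int]]:
    """probs_per_lf[i][j] is P_i(y=1 | patient j) from the classifier M_i of chunk i."""
    n_patients = len(probs_per_lf[0])
    return [majority_vote([lf_vote(probs[j], t_pos, t_neg) for probs in probs_per_lf])
            for j in range(n_patients)]


def select_subset(labels: Sequence[Optional[int]], scores: Sequence[float],
                  b: float) -> List[int]:
    """Indices of the top-b fraction of non-abstained samples, largest score first."""
    kept = [j for j, y in enumerate(labels) if y is not ABSTAIN]
    kept.sort(key=lambda j: scores[j], reverse=True)
    return kept[:round(b * len(kept))]


# Three hypothetical label functions scoring four patients.
probs = [[0.90, 0.20, 0.50, 0.80],
         [0.80, 0.10, 0.52, 0.30],
         [0.75, 0.40, 0.50, 0.90]]
labels = pseudo_label(probs, t_pos=0.7, t_neg=0.3)
print(labels)                                                # [1, 0, None, 1]
print(select_subset(labels, [0.2, 0.9, 0.5, 0.7], b=0.67))  # [1, 3]
```

Treating ties as abstentions is only one reasonable choice; it keeps ambiguous patients out of the pseudo-labeled training set rather than breaking ties arbitrarily.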

Appendix B Distinction between the two domains

Table 3 provides details of the two domains considered in our study. The cell line domain has abundant labeled responses to diverse drugs, whereas the patient data consist predominantly of unlabeled samples. For our experiments, we selected 20 drugs that were administered to patients and also tested on cell lines. To evaluate our approach on patients, we considered the 5 drugs with a documented response in at least 20 patients (Table 4).

Table 3: Details of the two domains in cancer drug response prediction.
| Domain | Unlabeled data | Labeled data | Drug response label | Number of drugs with response | Number of drugs selected in our experiments |
|---|---|---|---|---|---|
| Cell line | 1305 | 686 | Z-score computed on AUDRC scores: (1) a Z-score less than 0 is considered a positive respondent; (2) a Z-score greater than 0 is considered a negative respondent. | 449 | 20 |
| Patients | 9808 | 179 | Cancer relapse time post-chemotherapy: (1) values greater than the median are considered positive respondents; (2) values less than the median are considered negative respondents. | 78 | 5 |
Table 4: Distribution of testing dataset.
Drug 5-Fluorouracil Temozolomide Gemcitabine Cisplatin Sorafenib
TCGA samples 21 46 46 40 26
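The labeling rules in Table 3 amount to two simple binarizations. The stdlib Python sketch below assumes that the cell line Z-score is computed per drug across cell lines with the population standard deviation, and that values exactly at zero (cell lines) or exactly at the median (patients), which Table 3 does not assign, are left unlabeled (`None`). The function names are hypothetical.

```python
from statistics import mean, median, pstdev
from typing import List, Optional


def cell_line_labels(audrc: List[float]) -> List[Optional[int]]:
    """Binarize one drug's AUDRC scores across cell lines via Z-scores.

    Z < 0 -> 1 (positive respondent), Z > 0 -> 0 (negative respondent).
    Z == 0, or zero spread, -> None (unlabeled; an assumption).
    """
    mu, sigma = mean(audrc), pstdev(audrc)
    if sigma == 0:
        return [None] * len(audrc)
    labels: List[Optional[int]] = []
    for a in audrc:
        z = (a - mu) / sigma
        labels.append(1 if z < 0 else 0 if z > 0 else None)
    return labels


def patient_labels(relapse_time: List[float]) -> List[Optional[int]]:
    """Binarize relapse time after chemotherapy around the cohort median.

    Above the median -> 1 (positive), below -> 0 (negative), equal -> None (assumption).
    """
    m = median(relapse_time)
    return [1 if t > m else 0 if t < m else None for t in relapse_time]


print(cell_line_labels([0.25, 0.5, 0.75]))   # [1, None, 0]
print(patient_labels([3, 10, 5, 8, 1]))      # [0, 1, None, 1, 0]
```

A median split yields roughly balanced patient classes by construction, which is convenient given how few labeled patients each drug has (Table 4).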

Appendix C Training details and hyperparameters

The model is trained in four stages, as described in Algorithm 1. For representation learning, a grid search was performed over the initial training epochs ($\mathcal{P}_i$), the domain adaptation epochs ($\mathcal{P}_d$) and the inverse temperature ($\Delta$), with candidate values [50, 100, 300], [1000, 2000, 2500, 3000] and [0.001, 0.1, 1, 2, 2.5, 10, 100], respectively. A set of 20 drugs was used for representation learning ($\mathcal{R}$), namely 5-Fluorouracil, Gemcitabine, Temozolomide, Cisplatin, Sorafenib, Sunitinib, Doxorubicin, Tamoxifen, Paclitaxel, Carmustine, Cetuximab, Methotrexate, Topotecan, Erlotinib, Irinotecan, Bicalutamide, Temsirolimus, Oxaliplatin, Docetaxel and Etoposide. For weak supervision, 5 label functions were trained on 5 different chunks ($\mathcal{O} = 5$) of the labeled cell line dataset. The number of chunks was chosen based on previous work (He et al., 2022; Ratner et al., 2017), taking into account the limited amount of labeled cell line data and the good performance of majority voting with fewer than 10 label functions (LFs). The thresholds ($t^{+}$, $t^{-}$) were selected by grid search over the pairs [(0.7, 0.3), (0.55, 0.49), (0.51, 0.46)]. Similar experiments were also performed using the median of the predicted probabilities of the label functions. For subset selection, $K = 20$ was used, in line with previous work (Lang et al., 2022). The subset size ($b$) was selected by grid search over [0.2, 0.4, 0.5, 0.6, 0.8, 1]. All experiments were run on an NVIDIA A6000 graphics card, with 20 cores and 160 GB of memory. Our code is available at https://github.com/kyrs/WISER.
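The search space above can be written down as a configuration and enumerated. This stdlib Python sketch uses hypothetical key names and simply takes Cartesian products of the listed values, which assumes (the text does not say) that the representation-learning hyperparameters were searched jointly, and likewise the thresholds and subset size.

```python
from itertools import product

# Candidate values listed in Appendix C; the key names are hypothetical.
REPRESENTATION_GRID = {
    "initial_epochs": [50, 100, 300],                         # P_i
    "domain_adaptation_epochs": [1000, 2000, 2500, 3000],     # P_d
    "inverse_temperature": [0.001, 0.1, 1, 2, 2.5, 10, 100],  # Delta
}
THRESHOLDS = [(0.7, 0.3), (0.55, 0.49), (0.51, 0.46)]  # (t+, t-) pairs
SUBSET_FRACTIONS = [0.2, 0.4, 0.5, 0.6, 0.8, 1]         # b
NUM_CHUNKS = 5                                          # O: one label function per chunk
K_NEIGHBORS = 20                                        # K for subset selection

# Sanity check: every threshold pair leaves an abstain band between t- and t+.
assert all(t_pos > t_neg for t_pos, t_neg in THRESHOLDS)


def iter_configs(grid):
    """Yield one dict per point of the Cartesian product of the grid values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


rep_configs = list(iter_configs(REPRESENTATION_GRID))
ws_configs = list(product(THRESHOLDS, SUBSET_FRACTIONS))
print(len(rep_configs), len(ws_configs))  # 84 18
```

Under the joint-search assumption, the representation stage alone covers 3 × 4 × 7 = 84 configurations.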

Appendix D Experiment without weak supervision

Table 5: Performance comparison of predicted patient response (AUROC and AUPRC) between our method without weak supervision (WISER w/o WS) and other transfer learning-based approaches. Clinical relapse data are used for all evaluations. Results are reported as mean / std over five-fold cross-validation, and predictions that were not meaningful are denoted by '-'. Our method performs best on 3 of the 5 drugs for at least one metric. The best performer is highlighted in bold.
| Methods | 5-Fluorouracil AUROC | 5-Fluorouracil AUPRC | Temozolomide AUROC | Temozolomide AUPRC | Sorafenib AUROC | Sorafenib AUPRC | Gemcitabine AUROC | Gemcitabine AUPRC | Cisplatin AUROC | Cisplatin AUPRC |
|---|---|---|---|---|---|---|---|---|---|---|
| WISER w/o WS | 0.687/0.029 | 0.688/0.047 | 0.694/0.020 | 0.725/0.014 | 0.683/0.021 | 0.725/0.094 | 0.617/0.013 | 0.752/0.005 | 0.792/0.038 | 0.808/0.035 |
| CODE-AE | 0.868/0.030 | 0.740/0.006 | 0.751/0.017 | 0.762/0.001 | 0.631/0.020 | 0.705/0.062 | 0.594/0.016 | 0.751/0.006 | 0.652/0.071 | 0.743/0.011 |
| TCRP | 0.596/0.080 | 0.546/0.073 | 0.675/0.009 | 0.662/0.012 | 0.441/0.053 | 0.521/0.054 | 0.462/0.057 | 0.502/0.055 | 0.414/0.048 | 0.432/0.037 |
| CORAL | 0.578/0.015 | 0.651/0.135 | 0.675/0.020 | 0.654/0.020 | 0.491/0.023 | 0.616/0.048 | 0.597/0.030 | 0.544/0.037 | 0.617/0.072 | 0.617/0.124 |
| VELODROME | 0.598/0.054 | 0.403/0.000 | 0.701/0.028 | 0.668/0.000 | 0.505/0.029 | 0.749/0.000 | 0.547/0.030 | 0.434/0.000 | 0.583/0.029 | 0.442/0.000 |

To assess the impact of incorporating labeled drug response data into representation learning, we ran a separate experiment with our method excluding weak supervision and subset selection. The results, compared with other transfer learning-based approaches, are presented in Table 5. Our method (WISER w/o WS) performed best on three of the five drugs. Specifically, compared with CODE-AE it improved AUROC by approximately 14, 2.3 and 5.2 percentage points for Cisplatin, Gemcitabine and Sorafenib, respectively, and AUPRC by 6.5 and 0.1 percentage points for Cisplatin and Gemcitabine, respectively.
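The quoted gains match absolute differences (percentage points) between the WISER w/o WS and CODE-AE means in Table 5; expressed relative to CODE-AE's score they are larger (about 21% for Cisplatin AUROC). A short Python check using only the mean values copied from Table 5 (the dictionary and function names are illustrative):

```python
# Mean scores copied from Table 5 as (WISER w/o WS, CODE-AE).
table5 = {
    ("Cisplatin", "AUROC"): (0.792, 0.652),
    ("Gemcitabine", "AUROC"): (0.617, 0.594),
    ("Sorafenib", "AUROC"): (0.683, 0.631),
    ("Cisplatin", "AUPRC"): (0.808, 0.743),
    ("Gemcitabine", "AUPRC"): (0.752, 0.751),
}


def gains(table):
    """Return (absolute gain in percentage points, relative gain in %) per entry."""
    out = {}
    for key, (ours, baseline) in table.items():
        absolute_pp = round(100 * (ours - baseline), 1)
        relative_pct = round(100 * (ours - baseline) / baseline, 1)
        out[key] = (absolute_pp, relative_pct)
    return out


for (drug, metric), (pp, rel) in gains(table5).items():
    print(f"{drug:12s} {metric}: +{pp} pp (relative +{rel}%)")
```

Reporting the absolute difference is the more common convention for AUROC/AUPRC, since both metrics already live on a fixed [0, 1] scale.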

D.1 Sensitivity test on hyperparameters

To analyze the impact of different hyperparameters on representation learning, we performed a sensitivity analysis of the initial training epochs ($\mathcal{P}_i$), the adversarial training epochs ($\mathcal{P}_d$) and the inverse temperature ($\Delta$) on AUROC and AUPRC, keeping the other hyperparameters at their best configuration.

D.1.1 Initial training epoch

Tables 6 and 7 show the impact of the initial training epochs on model performance. Cisplatin, Gemcitabine, Temozolomide, Sorafenib and 5-Fluorouracil achieve their best AUROC at 50, 300, 300, 50 and 50 epochs, respectively, and their best AUPRC at 50, 100, 300, 100 and 100 epochs, respectively. In general, the drugs respond differently to this hyperparameter: fewer training epochs favour Cisplatin, while more epochs favour Temozolomide.

Table 6: Sensitivity analysis of the initial training epochs ($\mathcal{P}_i$) on AUROC scores.
Drug / Epoch 50 100 300
Fu 0.687/0.027 0.642/0.043 0.650/0.034
Sor 0.683/0.019 0.636/0.047 0.441/0.043
Tem 0.558/0.004 0.573/0.024 0.694/0.017
Gem 0.499/0.097 0.440/0.120 0.617/0.011
Cis 0.792/0.034 0.527/0.045 0.529/0.120
Table 7: Sensitivity analysis of the initial training epochs ($\mathcal{P}_i$) on AUPRC scores.
Drug / Epoch 50 100 300
Fu 0.626/0.069 0.688/0.042 0.575/0.022
Sor 0.431/0.100 0.725/0.085 0.566/0.065
Tem 0.547/0.029 0.575/0.024 0.725/0.014
Gem 0.750/0.000 0.752/0.005 0.750/0.000
Cis 0.808/0.031 0.575/0.057 0.608/0.081
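The best-epoch choices reported above can be read off Tables 6 and 7 mechanically by taking, for each drug, the setting with the highest mean score. A small Python helper (the name `best_setting` is hypothetical) applied to the mean values; ties go to the first, i.e. smallest, setting, which is an arbitrary choice made here.

```python
EPOCHS = [50, 100, 300]

# Mean scores (std dropped) copied from Table 6 (AUROC) and Table 7 (AUPRC).
TABLE6_AUROC = {
    "5-Fluorouracil": [0.687, 0.642, 0.650],
    "Sorafenib": [0.683, 0.636, 0.441],
    "Temozolomide": [0.558, 0.573, 0.694],
    "Gemcitabine": [0.499, 0.440, 0.617],
    "Cisplatin": [0.792, 0.527, 0.529],
}
TABLE7_AUPRC = {
    "5-Fluorouracil": [0.626, 0.688, 0.575],
    "Sorafenib": [0.431, 0.725, 0.566],
    "Temozolomide": [0.547, 0.575, 0.725],
    "Gemcitabine": [0.750, 0.752, 0.750],
    "Cisplatin": [0.808, 0.575, 0.608],
}


def best_setting(table, settings):
    """Map each drug to the setting with the highest mean score (first one wins ties)."""
    return {drug: settings[max(range(len(scores)), key=scores.__getitem__)]
            for drug, scores in table.items()}


print(best_setting(TABLE6_AUROC, EPOCHS))
print(best_setting(TABLE7_AUPRC, EPOCHS))
```

The same helper applies unchanged to Tables 8 to 11 by swapping in their column settings.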

D.1.2 Adversarial training epoch

Next, we analyze the impact of the adversarial training epochs on model performance; Tables 8 and 9 show the results. Cisplatin, Gemcitabine, Temozolomide, Sorafenib and 5-Fluorouracil achieve their best AUROC at 2500, 2000, 1000, 2000 and 2500 epochs, respectively, and their best AUPRC at 2500, 1000 (tied with 2500), 1000, 2000 and 2000 epochs, respectively. In general, domain adaptation affects model performance; for drugs such as Cisplatin, the best results are obtained with a larger number of training epochs.

Table 8: Sensitivity analysis of the domain adversarial training epochs ($\mathcal{P}_d$) on AUROC scores.
Drug / Epoch 1000 2000 2500
Fu 0.509/0.063 0.680/0.013 0.687/0.026
Sor 0.452/0.061 0.683/0.019 0.586/0.027
Tem 0.694/0.018 0.681/0.010 0.664/0.031
Gem 0.518/0.082 0.617/0.011 0.517/0.111
Cis 0.391/0.011 0.654/0.085 0.792/0.033
Table 9: Sensitivity analysis of the domain adversarial training epochs ($\mathcal{P}_d$) on AUPRC scores.
Drug / Epoch 1000 2000 2500
Fu 0.589/0.108 0.688/0.042 0.656/0.0348
Sor 0.523/0.113 0.725/0.085 0.565/0.101
Tem 0.725/0.013 0.683/0.035 0.659/0.048
Gem 0.750/0.000 - 0.750/0.000
Cis 0.525/0.016 0.710/0.058 0.808/0.031
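Lines 6 to 15 of Algorithm 1 interleave critic and encoder updates: the critic is trained in every one of the $\mathcal{P}_d$ epochs, while the encoders, discrete embedding and decoder are updated only when the counter $\mathcal{N}$ is a multiple of $\mathcal{P}_c$, so a larger $\mathcal{P}_d$ also means more encoder-side updates. The Python sketch below only counts updates (training calls are replaced by counters), assumes the loop runs exactly $\mathcal{P}_d$ times, and uses a hypothetical $\mathcal{P}_c = 5$, since the value used in the experiments is not given here.

```python
def adversarial_schedule(p_d: int, p_c: int):
    """Count critic and encoder updates for the loop in Algorithm 1 (lines 6-15).

    p_d: number of domain-adaptation epochs; p_c: critic epochs per encoder update.
    """
    n = 0
    critic_updates = encoder_updates = 0
    for _epoch in range(p_d):
        critic_updates += 1          # stands in for: train critic F with l_critic
        n += 1
        if n % p_c == 0:
            encoder_updates += 1     # stands in for: train C_S, C_P^c, C_P^t, R, D with l_total
    return critic_updates, encoder_updates


for p_d in (1000, 2000, 2500):       # settings from Tables 8 and 9
    print(p_d, adversarial_schedule(p_d, p_c=5))  # p_c = 5 is a hypothetical value
```

Updating the critic several times per encoder step is the usual pattern in critic-based adversarial training, where the critic should stay close to optimal before the encoders are moved against it.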

D.1.3 Inverse temperature

We further conducted a sensitivity test on the inverse temperature ($\Delta$); Tables 10 and 11 summarize the results. The inverse temperature influences the performance of all the drugs, as it controls the weights associated with the drug embeddings ($\mathcal{R}$). For 5-Fluorouracil and Temozolomide, the best AUROC and AUPRC scores are obtained with $\Delta = 10$, and Cisplatin performs best with $\Delta = 2.5$. For Gemcitabine and Sorafenib, the best AUPRC scores are obtained with $\Delta = 0.001$, and the best AUROC scores with $\Delta = 0.1$ and $\Delta = 2.5$ (tied with $\Delta = 2$), respectively.

Table 10: Sensitivity analysis of the inverse temperature ($\Delta$) on AUROC scores.
Drug / Inv temp 0.01 0.1 1 2 2.5 10
Fu 0.487/0.094 0.483/0.231 0.500/0.079 0.486/0.031 0.478/0.091 0.687/0.026
Tem 0.492/0.07 0.534/0.103 0.366/0.016 0.445/0.035 0.489/0.071 0.694/0.017
Gem 0.465/0.035 0.617/0.011 0.390/0.015 0.324/0.02 0.330/0.014 0.365/0.016
Sor 0.483/0.141 0.456/0.127 0.473/0.045 0.638/0.06 0.638/0.051 0.586/0.027
Cis 0.526/0.057 0.558/0.120 0.541/0.047 0.502/0.043 0.791/0.033 0.485/0.15
Table 11: Sensitivity analysis of the inverse temperature ($\Delta$) on AUPRC scores.
Drug / Inv temp 0.01 0.1 1 2 2.5 10
Fu 0.601/0.09 0.542/0.177 0.469/0.060 0.499/0.057 0.440/0.0491 0.688/0.042
Tem 0.561/0.079 0.533/0.098 0.416/0.017 0.490/0.019 0.486/0.066 0.725/0.013
Gem 0.670/0.086 0.481/0.072 0.572/0.078 0.424/0.027 0.402/0.115 0.431/0.01
Sor 0.724/0.084 0.609/0.162 0.479/0.024 0.553/0.083 0.526/0.042 0.554/0.021
Cis 0.603/0.054 0.598/0.120 0.664/0.034 0.626/0.047 0.808/0.031 0.501/0.021
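To illustrate what an inverse temperature does, here is a generic softmax-with-inverse-temperature sketch in stdlib Python. It assumes, for illustration only, that the weights over the drug embeddings $\mathcal{R}$ are a softmax of $\Delta$-scaled similarity scores; the exact weighting used by WISER is defined in the main text and may differ. A small $\Delta$ spreads the weights almost uniformly, while a large $\Delta$ concentrates them on the most similar embedding.

```python
import math
from typing import List


def softmax_weights(similarities: List[float], inv_temp: float) -> List[float]:
    """Softmax over similarity scores scaled by an inverse temperature (Delta)."""
    logits = [inv_temp * s for s in similarities]
    m = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical similarities between one latent vector and three drug embeddings.
sims = [0.9, 0.5, 0.1]
for delta in (0.001, 1, 10, 100):            # values taken from the grid in Appendix C
    print(delta, [round(w, 3) for w in softmax_weights(sims, delta)])
```

This behaviour is consistent with the wide spread of best $\Delta$ values across drugs: whether a near-uniform or a near one-hot weighting helps depends on how distinctive the relevant drug embedding is.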

Appendix E Analysis of different components

We further analyzed the importance of the different components of our method by comparing WISER with (1) WISER w/o WS: a variant of our method that uses only the supervised discrete representation learning module and no weak supervision; (2) CODE-AE: the closest baseline to our work, which uses neither weak supervision nor supervised discrete representation learning; (3) representation learning methods without a domain adaptation module, namely a variational autoencoder (VAE) (Kingma & Welling, 2013) and an autoencoder (AE) (Hinton & Zemel, 1993); and (4) random forest (RF), a standard model that does not use a neural network. WISER performs best on all the drugs for at least one metric.

Table 12: Performance comparison of predicted patient response (AUROC and AUPRC) of our method (WISER) and its components. Clinical relapse data are used for all evaluations. Results are reported as mean / std over five-fold cross-validation. Our method outperforms the other baselines on all drugs for at least one metric. The best performer is highlighted in bold.
| Methods | 5-Fluorouracil AUROC | 5-Fluorouracil AUPRC | Temozolomide AUROC | Temozolomide AUPRC | Sorafenib AUROC | Sorafenib AUPRC | Gemcitabine AUROC | Gemcitabine AUPRC | Cisplatin AUROC | Cisplatin AUPRC |
|---|---|---|---|---|---|---|---|---|---|---|
| WISER | 0.715/0.036 | 0.741/0.023 | 0.760/0.006 | 0.786/0.019 | 0.727/0.007 | 0.728/0.024 | 0.649/0.037 | 0.752/0.002 | 0.851/0.007 | 0.861/0.020 |
| WISER w/o WS | 0.687/0.029 | 0.688/0.047 | 0.694/0.020 | 0.725/0.014 | 0.683/0.021 | 0.725/0.094 | 0.617/0.013 | 0.752/0.005 | 0.792/0.038 | 0.808/0.035 |
| CODE-AE | 0.868/0.030 | 0.740/0.006 | 0.751/0.017 | 0.762/0.001 | 0.631/0.020 | 0.705/0.062 | 0.594/0.016 | 0.751/0.006 | 0.652/0.071 | 0.743/0.011 |
| VAE | 0.636/0.032 | 0.616/0.067 | 0.671/0.023 | 0.688/0.020 | 0.472/0.023 | 0.554/0.024 | 0.514/0.090 | 0.484/0.048 | 0.552/0.103 | 0.631/0.034 |
| AE | 0.636/0.019 | 0.576/0.046 | 0.659/0.048 | 0.610/0.030 | 0.528/0.061 | 0.597/0.101 | 0.553/0.029 | 0.553/0.050 | 0.623/0.042 | 0.607/0.067 |
| RF | 0.565/0.100 | 0.595/0.099 | 0.632/0.03 | 0.619/0.048 | 0.366/0.131 | 0.482/0.103 | 0.452/0.026 | 0.480/0.023 | 0.470/0.062 | 0.473/0.044 |

Appendix F Ablation for discrete representation

We extended our analysis with ablation studies to assess the significance of the proposed loss functions used to train the domain-invariant representation ($Z$). For this experiment, we successively removed the triplet loss $\bm{l}_{cns}$ and the discrete representation loss $\bm{l}_{embed}$. The experiments were performed without weak supervision and subset selection in the downstream drug response prediction task, so as to isolate the effect of the loss terms on $Z$ (indicated by 'WISER(Z)'). The results (mean/std over 5-fold cross-validation) are provided in Table 13.

Table 13: Ablation study of the impact of different loss functions on AUROC and AUPRC for the discrete embedding. Clinical relapse data are used for all evaluations. Results are reported as mean / std over five-fold cross-validation. The full WISER(Z) outperforms the ablated variants on all drugs for at least one metric. The best performer is highlighted in bold.
| Methods | 5-Fluorouracil AUROC | 5-Fluorouracil AUPRC | Temozolomide AUROC | Temozolomide AUPRC | Sorafenib AUROC | Sorafenib AUPRC | Gemcitabine AUROC | Gemcitabine AUPRC | Cisplatin AUROC | Cisplatin AUPRC |
|---|---|---|---|---|---|---|---|---|---|---|
| WISER(Z) | 0.687/0.029 | 0.688/0.047 | 0.694/0.020 | 0.725/0.014 | 0.683/0.021 | 0.725/0.094 | 0.617/0.013 | 0.752/0.005 | 0.792/0.038 | 0.808/0.035 |
| WISER(Z) w/o $\bm{l}_{cns}$ | 0.593/0.041 | 0.745/0.047 | 0.664/0.013 | 0.675/0.024 | 0.568/0.027 | 0.603/0.082 | 0.510/0.079 | 0.551/0.114 | 0.736/0.017 | 0.754/0.021 |
| WISER(Z) w/o {$\bm{l}_{cns}$, $\bm{l}_{embed}$} | 0.545/0.032 | 0.483/0.023 | 0.617/0.018 | 0.601/0.027 | 0.487/0.024 | 0.540/0.045 | 0.450/0.050 | 0.422/0.026 | 0.417/0.058 | 0.520/0.013 |
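Table 13 isolates the contribution of the triplet loss $\bm{l}_{cns}$. For reference, the standard triplet margin loss of FaceNet (Schroff et al., 2015) pulls an anchor towards a positive example and pushes it away from a negative one until their squared distances differ by at least a margin. The stdlib Python sketch below implements that generic form only; how WISER picks anchors, positives and negatives (for example, samples sharing a response label) and the margin it uses are assumptions here, and the paper's exact definition of $\bm{l}_{cns}$ is given in the main text.

```python
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin: float = 0.2) -> float:
    """FaceNet-style triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)


def batch_triplet_loss(triplets, margin: float = 0.2) -> float:
    """Average the per-triplet loss over (anchor, positive, negative) tuples."""
    losses = [triplet_loss(a, p, n, margin) for a, p, n in triplets]
    return sum(losses) / len(losses)


# Hypothetical 2-D embeddings: anchor and positive share a response label.
anchor, positive, negative = [0.0, 0.0], [0.1, 0.0], [1.0, 0.0]
print(triplet_loss(anchor, positive, negative))  # satisfied triplet -> 0.0
print(triplet_loss(anchor, negative, positive))  # violated triplet -> positive loss
```

Because satisfied triplets contribute zero loss, only pairs that violate the margin shape the representation, which is one plausible reason removing $\bm{l}_{cns}$ degrades most scores in Table 13.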

Appendix G Comparison with self-supervised learning methods

We further compared our method with self-supervised learning (SSL) approaches. Following prior literature (Alsaggaf et al., 2024), we perturb the genomic samples with Gaussian noise as data augmentation for learning domain-invariant representations. On the augmented data, we apply SSL methods such as SimCLR (Chen et al., 2020) and Barlow Twins (Zbontar et al., 2021) on top of CODE-AE. The results (mean/std over 5-fold cross-validation) are provided in Table 14.

Table 14: Performance comparison of predicted patient response (AUROC and AUPRC) between our method (WISER) and self-supervised learning approaches. Clinical relapse data are used for all evaluations. Results are reported as mean / std over five-fold cross-validation. Our method outperforms the other methods on four of the five drugs for at least one metric. The best performer is highlighted in bold.
| Methods | 5-Fluorouracil AUROC | 5-Fluorouracil AUPRC | Temozolomide AUROC | Temozolomide AUPRC | Sorafenib AUROC | Sorafenib AUPRC | Gemcitabine AUROC | Gemcitabine AUPRC | Cisplatin AUROC | Cisplatin AUPRC |
|---|---|---|---|---|---|---|---|---|---|---|
| WISER | 0.715/0.036 | 0.741/0.023 | 0.760/0.006 | 0.786/0.019 | 0.727/0.007 | 0.728/0.024 | 0.649/0.037 | 0.752/0.002 | 0.851/0.007 | 0.861/0.020 |
| CODE-AE | 0.868/0.030 | 0.740/0.006 | 0.751/0.017 | 0.762/0.001 | 0.631/0.020 | 0.705/0.062 | 0.594/0.016 | 0.751/0.006 | 0.652/0.071 | 0.743/0.011 |
| CODE-AE + SimCLR | 0.663/0.051 | 0.699/0.129 | 0.707/0.007 | 0.733/0.024 | 0.479/0.024 | 0.610/0.027 | 0.490/0.029 | 0.609/0.027 | 0.469/0.05 | 0.518/0.070 |
| CODE-AE + Barlow Twins | 0.747/0.029 | 0.767/0.048 | 0.680/0.015 | 0.681/0.036 | 0.569/0.029 | 0.576/0.033 | 0.621/0.052 | 0.621/0.036 | 0.670/0.116 | 0.702/0.028 |
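The Gaussian-noise augmentation used for the SSL baselines can be sketched as follows: each genomic profile is perturbed twice with independent zero-mean Gaussian noise, giving the two "views" that SimCLR or Barlow Twins then pull together. In this stdlib Python sketch the noise scale `sigma` and the function names are hypothetical; the value used in the experiments is not stated here.

```python
import random
from typing import List, Optional, Tuple


def gaussian_views(profile: List[float], sigma: float = 0.1,
                   rng: Optional[random.Random] = None) -> Tuple[List[float], List[float]]:
    """Return two independently noise-perturbed copies of one expression profile."""
    rng = rng or random.Random()
    view1 = [x + rng.gauss(0.0, sigma) for x in profile]
    view2 = [x + rng.gauss(0.0, sigma) for x in profile]
    return view1, view2


def augment_batch(batch: List[List[float]], sigma: float = 0.1,
                  seed: int = 0) -> List[Tuple[List[float], List[float]]]:
    """Build one positive pair per sample; in SimCLR the other pairs act as negatives."""
    rng = random.Random(seed)
    return [gaussian_views(p, sigma, rng) for p in batch]


batch = [[1.0, 2.0, 3.0], [0.5, 0.0, -1.0]]
pairs = augment_batch(batch, sigma=0.1, seed=0)
print(len(pairs), [len(v) for v in pairs[0]])
```

Additive noise is a natural choice for tabular expression data, where image-style augmentations such as crops or colour jitter have no meaningful analogue.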

Appendix H Generalization on other drugs and datasets

To establish the generalizability of the proposed method in two respects, (1) to unseen drugs (not used during representation learning) and (2) to a different dataset, we conducted similar experiments on the PDTC breast cancer dataset (Bruna et al., 2016) (32 samples per drug), using drugs not used in the TCGA experiments. The results (mean/std over 5-fold cross-validation) are shown in Table 15.

Table 15: Performance comparison of predicted response (AUROC and AUPRC) of our method (WISER) on the PDTC dataset. Results are reported as mean / std over five-fold cross-validation. Our method outperforms the other baselines on both metrics for two of the three drugs. The best performer is highlighted in bold.
| Methods | Az628 AUROC | Az628 AUPRC | Gefitinib AUROC | Gefitinib AUPRC | Axitinib AUROC | Axitinib AUPRC |
|---|---|---|---|---|---|---|
| WISER | 0.792/0.069 | 0.789/0.029 | 0.700/0.025 | 0.793/0.031 | 0.864/0.011 | 0.836/0.037 |
| CODE-AE | 0.754/0.097 | 0.679/0.148 | 0.613/0.037 | 0.778/0.009 | 0.840/0.049 | 0.762/0.033 |
| Velodrome | 0.513/0.015 | 0.625/0.064 | 0.446/0.091 | 0.495/0.052 | 0.786/0.041 | 0.841/0.014 |
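All results in these appendices are reported as mean / std over five-fold cross-validation. With 32 PDTC samples per drug, a plain 5-fold split gives folds of 7, 7, 6, 6 and 6 samples. The stdlib Python sketch below shows such a split and the mean / std aggregation; whether the experiments shuffle or stratify the folds, and whether they report the population or sample standard deviation, is not stated, so the choices here (contiguous folds, population std) and the function names are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def kfold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds; return (train, test) index pairs.

    The first n_samples % k folds get one extra sample, so fold sizes differ by at most 1.
    """
    base, extra = divmod(n_samples, k)
    splits, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits


def summarize(fold_scores: Sequence[float]) -> str:
    """Format per-fold scores as 'mean/std' with three decimals, as in the tables."""
    return f"{mean(fold_scores):.3f}/{pstdev(fold_scores):.3f}"


splits = kfold_indices(32, k=5)
print([len(test) for _, test in splits])           # [7, 7, 6, 6, 6]
print(summarize([0.70, 0.72, 0.68, 0.71, 0.69]))   # 0.700/0.014 (hypothetical fold scores)
```

With only 6 or 7 test samples per fold, a single misclassified patient shifts AUROC noticeably, which helps explain the sizeable standard deviations in Table 15.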