Search | arXiv e-print repository

Incorporating Prior Knowledge in Deep Learning Models via Pathway Activity Autoencoders

Authors: Pedro Henrique da Costa Avelar, Min Wu, Sophia Tsoka

Abstract: Motivation: Despite advances in the computational analysis of high-throughput molecular profiling assays (e.g. transcriptomics), a dichotomy exists between methods that are simple and interpretable and those that are complex but offer a lower degree of interpretability. Furthermore, very few methods attempt to translate interpretability into biologically relevant terms, such as known pathway cascades. Biological pathways reflecting signalling events or metabolic conversions are […] Small improvements or modifications of existing algorithms will generally not be suitable unless novel biological results have been predicted and verified. Determining which pathways are implicated in disease and incorporating such pathway data as prior knowledge may enhance predictive modelling and personalised strategies for the diagnosis, treatment and prevention of disease. Results: We propose a novel prior-knowledge-based deep auto-encoding framework, PAAE, together with its accompanying generative variant, PAVAE, for RNA-seq data in cancer. Through comprehensive comparisons among various learning models, we show that, despite having access to a smaller set of features, our PAAE and PAVAE models achieve better out-of-set reconstruction results than common methodologies. Furthermore, we compare our models with equivalent baselines on a classification task and show that they achieve better results than models that have access to the full input gene set.
Another result is that using vanilla variational frameworks might negatively impact both reconstruction outputs and classification performance. Finally, our work directly contributes comprehensive interpretability analyses of our models, in addition to improving prognostication for translational medicine.

Submitted 9 June, 2023; originally announced June 2023.
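The abstract above does not describe the PAAE architecture in detail. As a hedged illustration of one common way to inject pathway prior knowledge into an encoder, the sketch below uses a gene-by-pathway membership mask so that each latent "pathway activity" unit only receives input from its member genes. The functions `pathway_mask` and `encode`, the toy gene and pathway names, and the choice of a single masked `tanh` layer are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def pathway_mask(genes, pathways):
    """Build a binary gene-by-pathway membership matrix (hypothetical helper).

    genes: list of gene names, in the column order of the expression matrix.
    pathways: dict mapping a pathway name to the set of its member gene names.
    Returns the (n_genes, n_pathways) mask and the sorted pathway names.
    """
    names = sorted(pathways)
    mask = np.zeros((len(genes), len(names)))
    for j, name in enumerate(names):
        for i, gene in enumerate(genes):
            if gene in pathways[name]:
                mask[i, j] = 1.0
    return mask, names


def encode(x, weights, mask):
    """Masked linear encoder: weights outside a pathway's gene set are zeroed,
    so each pathway-activity unit only sees its own member genes."""
    return np.tanh(x @ (weights * mask))
```

For example, with hypothetical genes `["g1", "g2", "g3", "g4"]` and pathways `{"pw_a": {"g1", "g2"}, "pw_b": {"g3", "g4"}}`, perturbing `g3` changes the activity of `pw_b` but leaves `pw_a` untouched. This isolation is what makes each latent unit readable in pathway terms.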

arXiv:2206.10699 [pdf, other]

Multi-Omic Data Integration and Feature Selection for Survival-based Patient Stratification via Supervised Concrete Autoencoders

Authors: Pedro Henrique da Costa Avelar, Roman Laddach, Sophia Karagiannis, Min Wu, Sophia Tsoka

Abstract: Cancer is a complex disease with significant social and economic impact. Advancements in high-throughput molecular assays and the reduced cost of performing high-quality multi-omics measurements have fuelled insights through machine learning. Previous studies have shown promise in using multiple omic layers to predict survival and stratify cancer patients. In this paper, we develop a Supervised Autoencoder (SAE) model for survival-based multi-omic integration, which improves upon previous work, and report a Concrete Supervised Autoencoder model (CSAE), which uses feature selection to jointly reconstruct the input features and predict survival. Our experiments show that our models outperform or are on par with some of the most commonly used baselines, while either providing better survival separation (SAE) or being more interpretable (CSAE). We also perform a feature selection stability analysis on our models and observe a power-law relationship with features that are commonly associated with survival. The code for this project is available at: https://github.com/phcavelar/coxae

Submitted 27 June, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

Comments: Accepted for publication at LOD2022
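The CSAE abstract says the model "uses feature selection" but does not spell out the mechanism. Concrete autoencoders take their name from the Concrete (Gumbel-softmax) relaxation, which lets a network learn a discrete choice of input features by gradient descent. The sketch below shows only that selector step under stated assumptions. The names `concrete_select` and `selected_indices` are hypothetical, and temperature annealing, the decoder and the survival head are omitted. It is not the code in the linked repository.

```python
import numpy as np


def concrete_select(x, logits, temperature, rng):
    """Relaxed selection of k input features via Gumbel-softmax sampling.

    x: (n_samples, n_features) input matrix.
    logits: (k, n_features) learnable selection parameters, one row per slot.
    Returns the (n_samples, k) relaxed selected features and the (k, n_features)
    selection weights; each weight row sums to 1 and approaches one-hot as
    the temperature goes to 0.
    """
    # Gumbel(0, 1) noise from uniform samples; the low bound avoids log(0).
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    scores = (logits + gumbel) / temperature
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return x @ weights.T, weights


def selected_indices(logits):
    """After training, each selector slot keeps its arg-max feature."""
    return np.argmax(logits, axis=1)
```

In training, the temperature would typically start high, so the weights are spread across features and gradients flow to many logits, and then be lowered so that each slot commits to a single input feature.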

arXiv:2104.01133 [pdf, other]

Weekly sequential Bayesian updating improves prediction of deaths at an early epidemic stage

Authors: Pedro Henrique da Costa Avelar, Natalia Del Coco, Luis C. Lamb, Sophia Tsoka, Jonathan Cardoso-Silva

Abstract: Background: Following the outbreak of the coronavirus epidemic in early 2020, municipalities, regional governments and policymakers worldwide had to plan their Non-Pharmaceutical Interventions (NPIs) amid great uncertainty. At this early stage of an epidemic, when no vaccine or medical treatment is in sight, algorithmic prediction can become a powerful tool to inform local policymaking. However, when we replicated one prominent epidemiological model to inform health authorities in a region in the south of Brazil, we found that this model relied too heavily on manually predetermined covariates and was too reactive to changes in data trends. Methods: Our four proposed variations of the original method make use of daily reported infection data and account for the under-reporting of cases more explicitly. Two of the proposed versions also attempt to model the delay in test reporting. We simulated weekly forecasting of deaths for the period from 31/05/2020 to 31/01/2021. This workflow allowed us to run a lighter version of the model after the first calibration week. Google Mobility data, updated weekly, were used as covariates in each simulated run. Findings: The changes made the model significantly less reactive and quicker to adapt after a peak in deaths is observed. Assuming that reported cases were under-reported greatly benefited the model's stability, and modelling retroactively added data (due to the "hot" nature of the data used) had a negligible impact on performance.
Interpretation: Although not as reliable as death statistics, case statistics, when modelled in conjunction with an "overestimate" parameter, provide a good alternative for improving model forecasts, especially for long-range predictions and after the peak of an infection wave.

Submitted 16 June, 2022; v1 submitted 2 April, 2021; originally announced April 2021.
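The epidemiological model in this paper is far richer than anything shown here: it includes covariates, under-reporting and reporting delays. The sketch below only illustrates the general idea of weekly sequential Bayesian updating, in which each week's posterior becomes the next week's prior. To keep it small, it swaps in a conjugate Gamma-Poisson model of daily deaths. That model choice, and the names `GammaPosterior`, `weekly_update` and `forecast_next_week`, are assumptions and not the authors' method.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class GammaPosterior:
    """Gamma(shape, rate) belief over the daily death rate (hypothetical model)."""
    shape: float
    rate: float

    @property
    def mean(self) -> float:
        return self.shape / self.rate


def weekly_update(prior: GammaPosterior, daily_deaths: Sequence[int]) -> GammaPosterior:
    """Conjugate Gamma-Poisson update: add observed counts to the shape
    and the number of observed days to the rate."""
    return GammaPosterior(prior.shape + sum(daily_deaths), prior.rate + len(daily_deaths))


def forecast_next_week(post: GammaPosterior, days: int = 7) -> float:
    """Posterior predictive mean of total deaths over the next `days` days."""
    return post.mean * days
```

Starting from `GammaPosterior(1.0, 1.0)` and observing a week with 20 deaths gives `GammaPosterior(21.0, 8.0)` and a forecast of 18.375 deaths for the following week. In miniature, this shows why updating week by week can be cheaper than recalibrating from scratch: each update only needs the previous posterior and the new week's data.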

Showing 1–3 of 3 results for author: Avelar, P H d C