-
Tutorial on survival modeling with applications to omics data
Authors:
Zhi Zhao,
John Zobolas,
Manuela Zucknick,
Tero Aittokallio
Abstract:
Motivation: Identification of genomic, molecular and clinical markers prognostic of patient survival is important for developing personalized disease prevention, diagnostic and treatment approaches. Modern omics technologies have made it possible to investigate the prognostic impact of markers at multiple molecular levels, including genomics, epigenomics, transcriptomics, proteomics and metabolomics, and how these potential risk factors complement clinical characterization of patient outcomes for survival prognosis. However, the massive sizes of the omics data sets, along with their correlation structures, pose challenges for studying relationships between the molecular information and patients' survival outcomes. Results: We present a general workflow for survival analysis that is applicable to high-dimensional omics data as inputs when identifying survival-associated features and validating survival models. In particular, we focus on the commonly used Cox-type penalized regressions and hierarchical Bayesian models for feature selection in survival analysis, which are especially useful for high-dimensional data, but the framework is applicable more generally. Availability and implementation: A step-by-step R tutorial using The Cancer Genome Atlas survival and omics data for the execution and evaluation of survival models has been made available at https://ocbe-uio.github.io/survomics/survomics.html.
Submitted 4 March, 2024; v1 submitted 24 February, 2023;
originally announced February 2023.
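For orientation, the Cox-type penalized regression named in the abstract above is commonly written as a penalized partial log-likelihood. The block below gives the standard textbook lasso form, assuming no tied event times. The notation ($x_i$ for the feature vector, $\delta_i$ for the event indicator, $R(t_i)$ for the risk set at time $t_i$, $\lambda$ for the penalty strength) is an assumption made here and is not taken from the tutorial itself.

```latex
\hat{\beta}
  = \arg\max_{\beta}
    \left[
      \sum_{i:\,\delta_i = 1}
        \left(
          x_i^{\top}\beta
          - \log \sum_{j \in R(t_i)} \exp\!\left(x_j^{\top}\beta\right)
        \right)
      - \lambda \lVert \beta \rVert_1
    \right]
```

With $\lambda = 0$ this reduces to the ordinary Cox partial likelihood; larger $\lambda$ shrinks more coefficients to exactly zero, which is what makes the approach usable for feature selection when features outnumber samples.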
-
Evaluation of statistical approaches for association testing in noisy drug screening data
Authors:
Petr Smirnov,
Ian Smith,
Zhaleh Safikhani,
Wail Ba-alawi,
Farnoosh Khodakarami,
Eva Lin,
Yihong Yu,
Scott Martin,
Janosch Ortmann,
Tero Aittokallio,
Marc Hafner,
Benjamin Haibe-Kains
Abstract:
Identifying associations among biological variables is a major challenge in modern quantitative biological research, particularly given the systemic and statistical noise endemic to biological systems. Drug sensitivity data has proven to be a particularly challenging field for identifying associations to inform patient treatment. To address this, we introduce two semi-parametric variations on the commonly used concordance index: the robust concordance index and the kernelized concordance index (rCI, kCI), which incorporate measurements about the noise distribution from the data. We demonstrate that common statistical tests applied to the concordance index and its variations fail to control for false positives, and introduce efficient implementations to compute p-values using adaptive permutation testing. We then evaluate the statistical power of these coefficients under simulation and compare with Pearson and Spearman correlation coefficients. Finally, we evaluate the various statistics in matching drugs across pharmacogenomic datasets. We observe that the rCI and kCI are better powered than the concordance index in simulation and show some improvement on real data. Surprisingly, we observe that the Pearson correlation was the most robust to measurement noise among the different metrics.
Submitted 30 September, 2021; v1 submitted 28 April, 2021;
originally announced April 2021.
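As a point of reference for the abstract above, the plain concordance index can be sketched in a few lines of Python. This is a minimal illustration of the baseline statistic only, assuming that tied pairs are skipped. It does not implement the rCI or kCI variants or the adaptive permutation test, and the function name `concordance_index` is a hypothetical choice made here, not the authors' API.

```python
from itertools import combinations


def concordance_index(x, y):
    """Fraction of comparable pairs that x and y order the same way.

    Assumption: pairs tied in either variable are skipped rather than
    counted as half-concordant.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = 0
    comparable = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 or dy == 0:
            continue  # tied pair: not comparable under this convention
        comparable += 1
        if dx * dy > 0:
            concordant += 1  # both variables rank the pair the same way
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ordered identically, 0.0 means every pair is reversed, and values near 0.5 indicate no association.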
-
Characterizing the Quality of Insight by Interactions: A Case Study
Authors:
Chen He,
Luana Micallef,
Liye He,
Gopal Peddinti,
Tero Aittokallio,
Giulio Jacucci
Abstract:
Understanding the quality of insight has become increasingly important with the trend of allowing users to post comments during visual exploration, yet approaches for qualifying insight are rare. This paper presents a case study to investigate the possibility of characterizing the quality of insight via the interactions performed. To do this, we devised the interaction of a visualization tool, MediSyn, for insight generation. MediSyn supports five types of interactions: selecting, connecting, elaborating, exploring, and sharing. We evaluated MediSyn with 14 participants by allowing them to freely explore the data and generate insights. We then extracted seven interaction patterns from their interaction logs and correlated the patterns to four aspects of insight quality. The results show the possibility of qualifying insights via interactions. Among other findings, exploration actions can lead to unexpected insights; the drill-down pattern tends to increase the domain values of insights. A qualitative analysis shows that using domain knowledge to guide exploration can positively affect the domain value of derived insights. We discuss the study's implications, lessons learned, and future research opportunities.
Submitted 12 October, 2020;
originally announced October 2020.
-
Drug response prediction by inferring pathway-response associations with Kernelized Bayesian Matrix Factorization
Authors:
Muhammad Ammad-ud-din,
Suleiman A. Khan,
Disha Malani,
Astrid Murumägi,
Olli Kallioniemi,
Tero Aittokallio,
Samuel Kaski
Abstract:
A key goal of computational personalized medicine is to systematically utilize genomic and other molecular features of samples to predict drug responses for a previously unseen sample. Such predictions are valuable for developing hypotheses for selecting therapies tailored for individual patients. This is especially valuable in oncology, where molecular and genetic heterogeneity of the cells has a major impact on the response. However, the prediction task is extremely challenging, raising the need for methods that can effectively model and predict drug responses. In this study, we propose a novel formulation of multi-task matrix factorization that allows selective data integration for predicting drug responses. To solve the modeling task, we extend the state-of-the-art kernelized Bayesian matrix factorization (KBMF) method with component-wise multiple kernel learning. In addition, our approach exploits the known pathway information in a novel and biologically meaningful fashion to learn the drug response associations. Our method quantitatively outperforms the state of the art on predicting drug responses in two publicly available cancer data sets as well as on a synthetic data set. In addition, we validated our model predictions with lab experiments using an in-house cancer cell line panel. We finally show the practical applicability of the proposed method by utilizing prior knowledge to infer pathway-drug response associations, opening up the opportunity for elucidating drug action mechanisms. We demonstrate that pathway-response associations can be learned by the proposed model for the well-known EGFR and MEK inhibitors.
Submitted 11 June, 2016;
originally announced June 2016.
-
A two-step learning approach for solving full and almost full cold start problems in dyadic prediction
Authors:
Tapio Pahikkala,
Michiel Stock,
Antti Airola,
Tero Aittokallio,
Bernard De Baets,
Willem Waegeman
Abstract:
Dyadic prediction methods operate on pairs of objects (dyads), aiming to infer labels for out-of-sample dyads. We consider the full and almost full cold start problem in dyadic prediction, a setting that occurs when both objects in an out-of-sample dyad have not been observed during training, or if one of them has been observed, but very few times. A popular approach for addressing this problem is to train a model that makes predictions based on a pairwise feature representation of the dyads, or, in case of kernel methods, based on a tensor product pairwise kernel. As an alternative to such a kernel approach, we introduce a novel two-step learning algorithm that borrows ideas from the fields of pairwise learning and spectral filtering. We show theoretically that the two-step method is very closely related to the tensor product kernel approach, and experimentally that it yields a slightly better predictive performance. Moreover, unlike existing tensor product kernel methods, the two-step method allows closed-form solutions for training and parameter selection via cross-validation estimates both in the full and almost full cold start settings, making the approach much more efficient and straightforward to implement.
Submitted 17 May, 2014;
originally announced May 2014.
-
RPA: Probabilistic analysis of probe performance and robust summarization
Authors:
Leo Lahti,
Laura L. Elo,
Tero Aittokallio,
Samuel Kaski
Abstract:
Probe-level models have led to improved performance in microarray studies but the various sources of probe-level contamination are still poorly understood. Data-driven analysis of probe performance can be used to quantify the uncertainty in individual probes and to highlight the relative contribution of different noise sources. Improved understanding of the probe-level effects can lead to improved preprocessing techniques and microarray design.
We have implemented probabilistic tools for probe performance analysis and summarization on short oligonucleotide arrays. In contrast to standard preprocessing approaches, the methods provide quantitative estimates of probe-specific noise and affinity terms and tools to investigate these parameters. Tools to incorporate prior information of the probes in the analysis are provided as well. Comparisons to known probe-level error sources and spike-in data sets validate the approach.
Implementation is freely available in R/BioConductor: http://www.bioconductor.org/packages/release/bioc/html/RPA.html
Submitted 6 April, 2013; v1 submitted 22 September, 2011;
originally announced September 2011.
-
Improving the false nearest neighbors method with graphical analysis
Authors:
T. Aittokallio,
M. Gyllenberg,
J. Hietarinta,
T. Kuusela,
T. Multamaki
Abstract:
We introduce a graphical presentation for the false nearest neighbors (FNN) method. In the original method only the percentage of false neighbors is computed without regard to the distribution of neighboring points in the time-delay coordinates. With this new presentation it is much easier to distinguish deterministic chaos from noise. The graphical approach also serves as a tool to determine better conditions for detecting low-dimensional chaos, and to get a better understanding of the applicability of the FNN method.
Submitted 13 December, 1998;
originally announced December 1998.
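The percentage of false neighbors that the abstract above attributes to the original FNN method can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the commonly used distance-ratio criterion (a neighbor is false when adding one more delay coordinate stretches the pair by more than a factor `rtol`), and the helper names, the delay `tau` and the default `rtol=10.0` are assumptions chosen for the example. It does not reproduce the graphical presentation the paper introduces.

```python
import numpy as np


def delay_embed(x, dim, tau=1):
    """Return time-delay vectors [x[i], x[i+tau], ..., x[i+(dim-1)*tau]]."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[j * tau: j * tau + n] for j in range(dim)])


def fnn_percentage(x, dim, tau=1, rtol=10.0):
    """Percentage of nearest neighbors in dimension `dim` that are false."""
    x = np.asarray(x, dtype=float)
    # Only points that still have a (dim+1)-th delay coordinate can be tested.
    n = len(x) - dim * tau
    emb = delay_embed(x, dim, tau)[:n]
    tested = 0
    false_count = 0
    for i in range(n):
        dist = np.linalg.norm(emb - emb[i], axis=1)
        dist[i] = np.inf  # exclude the point itself
        j = int(np.argmin(dist))
        if dist[j] == 0.0:
            continue  # identical vectors: the ratio is undefined, skip
        # Separation gained along the extra coordinate in dimension dim+1.
        extra = abs(x[i + dim * tau] - x[j + dim * tau])
        tested += 1
        if extra / dist[j] > rtol:
            false_count += 1
    return 100.0 * false_count / tested if tested else 0.0
```

In typical use the function is evaluated for increasing `dim`, and the embedding dimension is taken where the percentage drops close to zero. The brute-force neighbor search is quadratic in the series length, which is acceptable for short series but would need a spatial index for long ones.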