-
Multimodal Data Integration for Precision Oncology: Challenges and Future Directions
Authors:
Huajun Zhou,
Fengtao Zhou,
Chenyu Zhao,
Yingxue Xu,
Luyang Luo,
Hao Chen
Abstract:
The essence of precision oncology lies in its commitment to tailoring targeted treatments and care measures to each patient based on the individual characteristics of the tumor. The inherent heterogeneity of tumors necessitates gathering information from diverse data sources to provide valuable insights from various perspectives, fostering a holistic comprehension of the tumor. Over the past decade, multimodal data integration technology for precision oncology has made significant strides, showcasing remarkable progress in understanding the intricate details within heterogeneous data modalities. These strides have exhibited tremendous potential for improving clinical decision-making and model interpretation, contributing to the advancement of cancer care and treatment. Given the rapid progress that has been achieved, we provide a comprehensive overview of about 300 papers detailing cutting-edge multimodal data integration techniques in precision oncology. In addition, we summarize the primary clinical applications that have reaped significant benefits, including early assessment, diagnosis, prognosis, and biomarker discovery. Finally, derived from the findings of this survey, we present an in-depth analysis that explores the pivotal challenges and reveals essential pathways for future research in the field of multimodal data integration for precision oncology.
Submitted 27 June, 2024;
originally announced June 2024.
-
Quantifying Heterogeneous Ecosystem Services With Multi-Label Soft Classification
Authors:
Zhihui Tian,
John Upchurch,
G. Austin Simon,
José Dubeux,
Alina Zare,
Chang Zhao,
Joel B. Harley
Abstract:
Understanding and quantifying ecosystem services are crucial for sustainable environmental management, conservation efforts, and policy-making. The advancement of remote sensing technology and machine learning techniques has greatly facilitated this process. Yet, ground truth labels, such as biodiversity, are very difficult and expensive to measure. In addition, more easily obtainable proxy labels, such as land use, often fail to capture the complex heterogeneity of the ecosystem. In this paper, we demonstrate how land use proxy labels can be implemented with a soft, multi-label classifier to predict ecosystem services with complex heterogeneity.
Submitted 24 June, 2024;
originally announced June 2024.
-
Emergence of cooperation under punishment: A reinforcement learning perspective
Authors:
Chenyang Zhao,
Guozhong Zheng,
Chun Zhang,
Jiqiang Zhang,
Li Chen
Abstract:
Punishment is a common tactic to sustain cooperation and has been studied extensively. While most previous game-theoretic work adopts imitation learning, in which players imitate the strategies of those who are better off, the learning logic in the real world is often much more complex. In this work, we turn to the reinforcement learning paradigm, where individuals make their decisions based upon their past experience and long-term returns. Specifically, we investigate the prisoner's dilemma game with the Q-learning algorithm, where cooperators probabilistically impose punishment on defectors in their neighborhood. Interestingly, we find that punishment could lead to either continuous or discontinuous cooperation phase transitions, and the nucleation process of cooperation clusters is reminiscent of the liquid-gas transition. The uncovered first-order phase transition indicates that great care needs to be taken when implementing the punishment compared to the continuous scenario.
Submitted 29 January, 2024;
originally announced January 2024.
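The Q-learning update behind the reinforcement learning paradigm described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the payoff values, the punishment `fine` and `cost`, the learning rate `alpha`, the discount factor `gamma`, and the use of a player's previous action as the state are all assumptions made for the example.

```python
# Illustrative sketch only: payoffs, fine, cost, alpha, and gamma are assumed values.
ACTIONS = ("C", "D")  # cooperate, defect


def payoff(me, other, punish=False, b=1.5, fine=0.5, cost=0.2):
    """Prisoner's dilemma payoff for `me` under a hypothetical parametrization.

    Mutual cooperation pays 1, defecting against a cooperator pays b, and
    every other outcome pays 0. When `punish` is True, a cooperator pays
    `cost` to impose `fine` on a defecting opponent.
    """
    base = {("C", "C"): 1.0, ("D", "C"): b}.get((me, other), 0.0)
    if punish and me == "C" and other == "D":
        base -= cost  # the punisher bears a cost
    if punish and me == "D" and other == "C":
        base -= fine  # the defector is fined
    return base


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update; returns the new Q-value."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


# One update for a defector who is punished by a cooperating neighbor.
q_table = {s: {a: 0.0 for a in ACTIONS} for s in ACTIONS}
r = payoff("D", "C", punish=True)  # 1.5 - 0.5
q_update(q_table, "C", "D", r, "D")
```

In a lattice simulation each player would hold its own Q-table and apply this update after every round; the probabilistic nature of punishment could be modeled by drawing `punish` at random.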
-
A Robust Deep Learning Method with Uncertainty Estimation for the Pathological Classification of Renal Cell Carcinoma based on CT Images
Authors:
Ni Yao,
Hang Hu,
Kaicong Chen,
Chen Zhao,
Yuan Guo,
Boya Li,
Jiaofen Nan,
Yanting Li,
Chuang Han,
Fubao Zhu,
Weihua Zhou,
Li Tian
Abstract:
Objectives: To develop and validate a deep learning-based diagnostic model incorporating uncertainty estimation so as to assist radiologists in the preoperative differentiation of the pathological subtypes of renal cell carcinoma (RCC) based on CT images. Methods: Data from 668 consecutive patients with pathologically proven RCC were retrospectively collected from Center 1. By using five-fold cross-validation, a deep learning model incorporating uncertainty estimation was developed to classify RCC subtypes into clear cell RCC (ccRCC), papillary RCC (pRCC), and chromophobe RCC (chRCC). An external validation set of 78 patients from Center 2 further evaluated the model's performance. Results: In the five-fold cross-validation, the model's area under the receiver operating characteristic curve (AUC) for the classification of ccRCC, pRCC, and chRCC was 0.868 (95% CI: 0.826-0.923), 0.846 (95% CI: 0.812-0.886), and 0.839 (95% CI: 0.802-0.88), respectively. In the external validation set, the AUCs were 0.856 (95% CI: 0.838-0.882), 0.787 (95% CI: 0.757-0.818), and 0.793 (95% CI: 0.758-0.831) for ccRCC, pRCC, and chRCC, respectively. Conclusions: The developed deep learning model demonstrated robust performance in predicting the pathological subtypes of RCC, while the incorporated uncertainty emphasized the importance of understanding model confidence, which is crucial for assisting clinical decision-making for patients with renal tumors. Clinical relevance statement: Our deep learning approach, integrated with uncertainty estimation, offers clinicians a dual advantage: accurate RCC subtype predictions complemented by diagnostic confidence references, promoting informed decision-making for patients with RCC.
Submitted 12 November, 2023; v1 submitted 1 November, 2023;
originally announced November 2023.
-
Multi-View Variational Autoencoder for Missing Value Imputation in Untargeted Metabolomics
Authors:
Chen Zhao,
Kuan-Jui Su,
Chong Wu,
Xuewei Cao,
Qiuying Sha,
Wu Li,
Zhe Luo,
Tian Qin,
Chuan Qiu,
Lan Juan Zhao,
Anqi Liu,
Lindong Jiang,
Xiao Zhang,
Hui Shen,
Weihua Zhou,
Hong-Wen Deng
Abstract:
Background: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. Method: In this study, we propose a novel method that leverages the information from WGS data and reference metabolites to impute unknown metabolites. Our approach utilizes a multi-view variational autoencoder to jointly model the burden score, polygenic risk score (PGS), and linkage disequilibrium (LD) pruned single nucleotide polymorphisms (SNPs) for feature extraction and missing metabolomics data imputation. By learning the latent representations of both omics data, our method can effectively impute missing metabolomics values based on genomic information. Results: We evaluate the performance of our method on empirical metabolomics datasets with missing values and demonstrate its superiority compared to conventional imputation techniques. Using 35 template metabolite-derived burden scores, PGS, and LD-pruned SNPs, the proposed method achieved R^2 scores > 0.01 for 71.55% of metabolites. Conclusion: The integration of WGS data in metabolomics imputation not only improves data completeness but also enhances downstream analyses, paving the way for more comprehensive and accurate investigations of metabolic pathways and disease associations. Our findings offer valuable insights into the potential benefits of utilizing WGS data for metabolomics data imputation and underscore the importance of leveraging multi-modal data integration in precision medicine research.
Submitted 12 March, 2024; v1 submitted 11 October, 2023;
originally announced October 2023.
-
CLCLSA: Cross-omics Linked embedding with Contrastive Learning and Self Attention for multi-omics integration with incomplete multi-omics data
Authors:
Chen Zhao,
Anqi Liu,
Xiao Zhang,
Xuewei Cao,
Zhengming Ding,
Qiuying Sha,
Hui Shen,
Hong-Wen Deng,
Weihua Zhou
Abstract:
Integration of heterogeneous and high-dimensional multi-omics data is becoming increasingly important in understanding genetic data. Each omics technique only provides a limited view of the underlying biological process and integrating heterogeneous omics layers simultaneously would lead to a more comprehensive and detailed understanding of diseases and phenotypes. However, one obstacle faced when performing multi-omics data integration is the existence of unpaired multi-omics data due to instrument sensitivity and cost. Studies may fail if certain aspects of the subjects are missing or incomplete. In this paper, we propose a deep learning method for multi-omics integration with incomplete data by Cross-omics Linked unified embedding with Contrastive Learning and Self Attention (CLCLSA). Utilizing complete multi-omics data as supervision, the model employs cross-omics autoencoders to learn the feature representation across different types of biological data. The multi-omics contrastive learning, which is used to maximize the mutual information between different types of omics, is employed before latent feature concatenation. In addition, the feature-level self-attention and omics-level self-attention are employed to dynamically identify the most informative features for multi-omics data integration. Extensive experiments were conducted on four public multi-omics datasets. The experimental results indicated that the proposed CLCLSA outperformed the state-of-the-art approaches for multi-omics data classification using incomplete multi-omics data.
Submitted 11 April, 2023;
originally announced April 2023.
-
Multi-view information fusion using multi-view variational autoencoders to predict proximal femoral strength
Authors:
Chen Zhao,
Joyce H Keyak,
Xuewei Cao,
Qiuying Sha,
Li Wu,
Zhe Luo,
Lanjuan Zhao,
Qing Tian,
Chuan Qiu,
Ray Su,
Hui Shen,
Hong-Wen Deng,
Weihua Zhou
Abstract:
The aim of this paper is to design a deep learning-based model to predict proximal femoral strength using multi-view information fusion. Method: We developed new models using a multi-view variational autoencoder (MVAE) for feature representation learning and a product of experts (PoE) model for multi-view information fusion. We applied the proposed models to an in-house Louisiana Osteoporosis Study (LOS) cohort with 931 male subjects, including 345 African Americans and 586 Caucasians. With an analytical solution of the product of Gaussian distributions, we adopted variational inference to train the designed MVAE-PoE model to perform common latent feature extraction. We performed genome-wide association studies (GWAS) to select 256 genetic variants with the lowest p-values for each proximal femoral strength and integrated whole genome sequence (WGS) features and DXA-derived imaging features to predict proximal femoral strength. Results: The best prediction model for fall fracture load was acquired by integrating WGS features and DXA-derived imaging features. The designed models achieved mean absolute percentage errors of 18.04%, 6.84% and 7.95% for predicting proximal femoral fracture loads using linear models of fall loading, nonlinear models of fall loading, and nonlinear models of stance loading, respectively. Compared to existing multi-view information fusion methods, the proposed MVAE-PoE achieved the best performance. Conclusion: The proposed models are capable of predicting proximal femoral strength using WGS features and DXA-derived imaging features. Though this tool is not a substitute for FEA using QCT images, it would make improved assessment of hip fracture risk more widely available while avoiding the increased radiation dosage and clinical costs from QCT.
Submitted 27 March, 2023; v1 submitted 2 October, 2022;
originally announced October 2022.
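The analytical product of Gaussians mentioned in the abstract above has a simple closed form in the one-dimensional case: precisions add, and the fused mean is the precision-weighted average of the expert means. The sketch below illustrates that identity only. It is not the authors' MVAE-PoE code, and the function name, the example values, and the inclusion of a standard normal prior as an extra expert are assumptions.

```python
def product_of_gaussians(means, variances):
    """Fuse independent 1-D Gaussian experts N(mean_i, variance_i).

    The (renormalized) product of Gaussian densities is Gaussian with
    precision equal to the sum of the expert precisions and mean equal
    to the precision-weighted average of the expert means.
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    fused_mean = sum(m * p for m, p in zip(means, precisions)) / total_precision
    return fused_mean, 1.0 / total_precision


# Hypothetical example: a standard normal prior expert plus two view-specific
# experts (for instance, one from genetic features and one from imaging features).
mu, var = product_of_gaussians(means=[0.0, 2.0, 4.0], variances=[1.0, 0.5, 0.5])
print(mu, var)
```

Because a confident expert (small variance) dominates the fused mean, a view that is missing can simply be left out of the product, which is one reason this fusion rule is popular in multi-view variational autoencoders.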
-
Active information, missing data and prevalence estimation
Authors:
Ola Hössjer,
Daniel Andrés Díaz-Pachón,
Chen Zhao,
J. Sunil Rao
Abstract:
The topic of this paper is prevalence estimation from the perspective of active information. Prevalence among tested individuals has an upward bias under the assumption that individuals' willingness to be tested for the disease increases with the strength of their symptoms. Active information due to testing bias quantifies the degree to which the willingness to be tested correlates with infection status. Interpreting incomplete testing as a missing data problem, the missingness mechanism impacts the degree to which the bias of the original prevalence estimate can be removed. The reduction in prevalence, when testing bias is adjusted for, translates into an active information due to bias correction, with opposite sign to active information due to testing bias. Prevalence and active information estimates are asymptotically normal, a behavior also illustrated through simulations.
Submitted 10 June, 2022;
originally announced June 2022.
-
Multi-View Broad Learning System for Primate Oculomotor Decision Decoding
Authors:
Zhenhua Shi,
Xiaomo Chen,
Changming Zhao,
He He,
Veit Stuphorn,
Dongrui Wu
Abstract:
Multi-view learning improves the learning performance by utilizing multi-view data: data collected from multiple sources, or feature sets extracted from the same data source. This approach is suitable for primate brain state decoding using cortical neural signals, because the complementary components of simultaneously recorded neural signals, local field potentials (LFPs) and action potentials (spikes), can be treated as two views. In this paper, we extended the broad learning system (BLS), a recently proposed wide neural network architecture, from single-view learning to multi-view learning, and validated its performance in decoding monkeys' oculomotor decisions from medial frontal LFPs and spikes. We demonstrated that medial frontal LFPs and spikes in non-human primates do contain complementary information about the oculomotor decision, and that the proposed multi-view BLS is a more effective approach for decoding the oculomotor decision than several classical and state-of-the-art single-view and multi-view learning approaches.
Submitted 2 July, 2020; v1 submitted 16 August, 2019;
originally announced August 2019.
-
Sequential Bayesian Detection of Spike Activities from Fluorescence Observations
Authors:
Zhuangkun Wei,
Bin Li,
Weisi Guo,
Wenxiu Hu,
Chenglin Zhao
Abstract:
Extracting and detecting spike activities from fluorescence observations is an important step in understanding how neuron systems work. The main challenge is that the combination of ambient noise and dynamic baseline fluctuation often contaminates the observations, thereby deteriorating the reliability of spike detection. This may be even worse in the face of the nonlinear biological process, the coupling interactions between spikes and baseline, and the unknown critical parameters of an underlying physiological model, in which erroneous estimations of parameters will affect the detection of spikes, causing further error propagation. In this paper, we propose a random finite set (RFS) based Bayesian approach. The dynamic behaviors of the spike sequence, fluctuating baseline, and unknown parameters are formulated as one RFS. This RFS state is capable of distinguishing the hidden active/silent states induced by spike and non-spike activities respectively, thereby negating the interaction role played by spikes and other factors. Then, premised on the RFS states, a Bayesian inference scheme is designed to simultaneously estimate the model parameters, baseline, and crucial spike activities. Our results demonstrate that the proposed scheme can gain an extra 12% detection accuracy in comparison with the state-of-the-art MLSpike method.
Submitted 31 January, 2019;
originally announced January 2019.
-
Probing single protein dynamics on liposome surfaces
Authors:
Dong-Fei Ma,
Chun-Hua Xu,
Wen-Qing Hou,
Chun-Yu Zhao,
Lu Ma,
Cong Liu,
Jiajie Diao,
Ying Lu,
Ming Li
Abstract:
It is crucial to measure position and conformational changes of a membrane-interacting protein relative to the membrane surface. This is, however, challenging because the thickness of a membrane is usually only about 4 nm. We developed a fluorescence method which makes use of the principle of FRET between a fluorophore and a cloud of quenchers encapsulated in a liposome, hence the name LipoFRET. LipoFRET can readily locate a fluorophore at different depths inside and at different heights above the membrane. We applied LipoFRET to study α-synuclein, a key player in the pathology of Parkinson's disease. Our approach yielded quantitative information about the dynamics of different regions of α-syn in lipid membranes, which has never been explored before.
Submitted 28 May, 2018;
originally announced May 2018.
-
Quantum transport senses community structure in networks
Authors:
Chenchao Zhao,
Jun S. Song
Abstract:
Quantum time evolution exhibits rich physics, attributable to the interplay between the density and phase of a wave function. However, unlike classical heat diffusion, the wave nature of quantum mechanics has not yet been extensively explored in modern data analysis. We propose that the Laplace transform of quantum transport (QT) can be used to construct an ensemble of maps from a given complex network to a circle $S^1$, such that closely-related nodes on the network are grouped into sharply concentrated clusters on $S^1$. The resulting QT clustering (QTC) algorithm is as powerful as the state-of-the-art spectral clustering in discerning complex geometric patterns and more robust when clusters show strong density variations or heterogeneity in size. The observed phenomenon of QTC can be interpreted as a collective behavior of the microscopic nodes that evolve as macroscopic cluster orbitals in an effective tight-binding model recapitulating the network. Python source code implementing the algorithm and examples are available at https://github.com/jssong-lab/QTC.
Submitted 12 January, 2018; v1 submitted 14 November, 2017;
originally announced November 2017.
-
Exact heat kernel on a hypersphere and its applications in kernel SVM
Authors:
Chenchao Zhao,
Jun S. Song
Abstract:
Many contemporary statistical learning methods assume a Euclidean feature space. This paper presents a method for defining similarity based on hyperspherical geometry and shows that it often improves the performance of support vector machine compared to other competing similarity measures. Specifically, the idea of using heat diffusion on a hypersphere to measure similarity has been previously proposed, demonstrating promising results based on a heuristic heat kernel obtained from the zeroth order parametrix expansion; however, how well this heuristic kernel agrees with the exact hyperspherical heat kernel remains unknown. This paper presents a higher order parametrix expansion of the heat kernel on a unit hypersphere and discusses several problems associated with this expansion method. We then compare the heuristic kernel with an exact form of the heat kernel expressed in terms of a uniformly and absolutely convergent series in high-dimensional angular momentum eigenmodes. Being a natural measure of similarity between sample points dwelling on a hypersphere, the exact kernel often shows superior performance in kernel SVM classifications applied to text mining, tumor somatic mutation imputation, and stock market analysis.
Submitted 19 November, 2017; v1 submitted 4 February, 2017;
originally announced February 2017.
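To make the idea of an exact heat kernel written as a convergent eigenmode series concrete, the sketch below evaluates the well-known special case of the ordinary 2-sphere, where the angular momentum eigenmodes reduce to Legendre polynomials: K(θ, t) = Σ_l (2l+1)/(4π) · exp(-l(l+1)t) · P_l(cos θ). This is not the paper's general high-dimensional kernel. The restriction to S^2, the convention ∂u/∂t = Δu, the truncation `l_max`, and the function names are all assumptions for illustration.

```python
import math


def legendre(l_max, x):
    """Return [P_0(x), ..., P_{l_max}(x)] using Bonnet's recurrence."""
    values = [1.0, x]
    for l in range(1, l_max):
        values.append(((2 * l + 1) * x * values[l] - l * values[l - 1]) / (l + 1))
    return values[: l_max + 1]


def heat_kernel_s2(theta, t, l_max=50):
    """Truncated eigenmode series for the heat kernel on the unit 2-sphere.

    theta is the angle between the two points and t > 0 is the diffusion
    time; terms decay like exp(-l(l+1)t), so the series converges quickly.
    """
    p = legendre(l_max, math.cos(theta))
    return sum(
        (2 * l + 1) / (4 * math.pi) * math.exp(-l * (l + 1) * t) * p[l]
        for l in range(l_max + 1)
    )


# Similarity decays with angular distance at a fixed diffusion time.
for angle in (0.0, math.pi / 2, math.pi):
    print(angle, heat_kernel_s2(angle, t=0.5))
```

Used as an SVM kernel, θ would be the angle between two unit-normalized feature vectors and t would act as a bandwidth-like hyperparameter.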
-
PDBCirclePlot: A Novel Visualization Method for Protein Structures
Authors:
Francis Bell,
Chunyu Zhao,
Ahmet Sacan
Abstract:
Interactive molecular graphics applications facilitate analysis of three-dimensional protein structures. Naturally, non-interactive 2-D snapshots of the protein structures do not convey the same level of geometric detail. Several 2-D visualization methods have been in use to summarize structural information, including contact maps and 2-D cartoon views. We present a new approach for 2-D visualization of protein structures where amino acid residues are displayed on a circle and spatially close residues are depicted by links. Furthermore, residue-specific properties, such as conservation, accessibility, and temperature factor, can be displayed as plots on the same circular view.
Submitted 20 February, 2014;
originally announced February 2014.
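The layout described in the abstract above, residues placed on a circle with links between spatially close residues, can be sketched in a few lines. This is only an illustration of the idea and not the PDBCirclePlot implementation: the 8 Å distance cutoff, the use of one representative coordinate per residue, the minimum sequence separation, and the toy coordinates are all assumptions.

```python
import math


def circle_layout(n):
    """Place n residues evenly on the unit circle, in sequence order."""
    return [
        (math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]


def contact_links(coords, cutoff=8.0, min_separation=2):
    """Return index pairs (i, j) of residues closer than `cutoff` in 3-D.

    Pairs closer than `min_separation` along the sequence are skipped,
    since sequence neighbors are always spatially close.
    """
    links = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                links.append((i, j))
    return links


# Toy coordinates (in angstroms) standing in for one atom per residue.
residues = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (20.0, 20.0, 20.0),
]
layout = circle_layout(len(residues))
links = contact_links(residues)
print(links)
```

Each link would then be drawn as a chord between `layout[i]` and `layout[j]`, and per-residue properties could be plotted as a ring around the same circle.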