Unsupervised ensemble-based phenotyping helps enhance the discoverability of genes related to heart morphology

Authors: Rodrigo Bonazzola, Enzo Ferrante, Nishant Ravikumar, Yan Xia, Bernard Keavney, Sven Plein, Tanveer Syeda-Mahmood, Alejandro F Frangi

Abstract: Recent genome-wide association studies (GWAS) have been successful in identifying associations between genetic variants and simple cardiac parameters derived from cardiac magnetic resonance (CMR) images. However, the emergence of big databases that include genetic data linked to CMR facilitates investigation of more nuanced patterns of shape variability. Here, we propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles (UPE). UPE builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner, using deep learning models trained with different hyperparameters. These phenotypes are then analyzed via GWAS, retaining only highly confident and stable associations across the ensemble. We apply our approach to the UK Biobank database to extract left-ventricular (LV) geometric features from image-derived three-dimensional meshes. We demonstrate that our approach greatly improves the discoverability of genes influencing LV shape, identifying 11 loci with study-wide significance and 8 with suggestive significance. We argue that our approach would enable more extensive discovery of gene associations with image-derived phenotypes for other organs or image modalities.

Submitted 7 January, 2023; originally announced January 2023.

Comments: 14 pages of main text, 22 pages of supplemental information

arXiv:2111.13987 [pdf, other]

Multi-modality fusion using canonical correlation analysis methods: Application in breast cancer survival prediction from histology and genomics

Authors: Vaishnavi Subramanian, Tanveer Syeda-Mahmood, Minh N. Do

Abstract: The availability of multi-modality datasets provides a unique opportunity to characterize the same object of interest more comprehensively using multiple viewpoints. In this work, we investigate the use of canonical correlation analysis (CCA) and penalized variants of CCA (pCCA) for the fusion of two modalities. We study a simple graphical model for the generation of two-modality data. We analytically show that, with known model parameters, posterior mean estimators that jointly use both modalities outperform arbitrary linear mixing of single-modality posterior estimators in latent variable prediction. Penalized extensions of CCA (pCCA) that incorporate domain knowledge can discover correlations with high-dimensional, low-sample data, whereas traditional CCA is inapplicable. To facilitate the generation of multi-dimensional embeddings with pCCA, we propose two matrix deflation schemes that enforce desirable properties exhibited by CCA. Combining all of the above, we propose a two-stage pipeline that uses pCCA embeddings generated with deflation for latent variable prediction. On simulated data, our proposed model drastically reduces the mean-squared error in latent variable prediction. When applied to publicly available histopathology data and RNA-sequencing data from The Cancer Genome Atlas (TCGA) breast cancer patients, our model can outperform principal components analysis (PCA) embeddings of the same dimension in survival prediction.

Submitted 27 November, 2021; originally announced November 2021.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2103.05432 [pdf, other]

Multimodal fusion using sparse CCA for breast cancer survival prediction

Authors: Vaishnavi Subramanian, Tanveer Syeda-Mahmood, Minh N. Do

Abstract: Effective understanding of a disease such as cancer requires fusing multiple sources of information captured across physical scales by multimodal data. In this work, we propose a novel feature embedding module that derives from canonical correlation analyses to account for intra-modality and inter-modality correlations. Experiments on simulated and real data demonstrate how our proposed module can learn well-correlated multi-dimensional embeddings. These embeddings perform competitively on one-year survival classification of TCGA-BRCA breast cancer patients, yielding average F1 scores up to 58.69% under 5-fold cross-validation.

Submitted 9 March, 2021; originally announced March 2021.

Comments: Accepted for poster presentation at International Symposium on Biomedical Imaging (ISBI) 2021. 4 pages, 1 figure, 4 tables
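
The abstract above reports an average F1 score under 5-fold cross-validation. As a rough illustration of how such a figure can be computed, the following is a minimal sketch, not the authors' evaluation code. It assumes a binary F1 on a single positive class and an unweighted mean over folds; the abstract does not state which F1 variant was used. The function names `f1_score` and `mean_f1_over_folds` and the toy labels are hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for one positive class (assumption: the paper may use another variant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1_over_folds(folds):
    """Unweighted mean of per-fold F1; `folds` is a list of (y_true, y_pred) pairs."""
    scores = [f1_score(y_true, y_pred) for y_true, y_pred in folds]
    return sum(scores) / len(scores)


# Toy labels for two held-out folds (illustrative only, not data from the paper).
toy_folds = [
    ([1, 1, 0, 0], [1, 0, 1, 0]),
    ([1, 0, 1, 0], [1, 0, 1, 0]),
]
mean_f1 = mean_f1_over_folds(toy_folds)
```

Averaging per-fold scores, as done here, can differ from pooling all held-out predictions and computing one F1, so the two should not be compared directly.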

arXiv:2004.09673 [pdf, other]

Neural Network Segmentation of Cell Ultrastructure Using Incomplete Annotation

Authors: John Paul Francis, Hongzhi Wang, Kate White, Tanveer Syeda-Mahmood, Raymond Stevens

Abstract: The pancreatic beta cell is an important target in diabetes research. For scalable modeling of beta cell ultrastructure, we investigate automatic segmentation of whole-cell imaging data acquired through soft X-ray tomography. During the course of the study, both complete and partial ultrastructure annotations were produced manually for different subsets of the data. To use existing annotations more effectively, we propose a method that enables the application of partially labeled data for full label segmentation. For experimental validation, we apply our method to train a convolutional neural network with a set of 12 fully annotated data and 12 partially annotated data and show promising improvement over standard training that uses fully annotated data alone.

Submitted 20 April, 2020; originally announced April 2020.

arXiv:2002.01982 [pdf, other]

Multimodal fusion of imaging and genomics for lung cancer recurrence prediction

Authors: Vaishnavi Subramanian, Minh N. Do, Tanveer Syeda-Mahmood

Abstract: Lung cancer has a high rate of recurrence in early-stage patients. Predicting post-surgical recurrence in lung cancer patients has traditionally been approached using single-modality information from genomics or radiology images. We investigate the potential of multimodal fusion for this task. By combining computed tomography (CT) images and genomics, we demonstrate improved prediction of recurrence using linear Cox proportional hazards models with elastic net regularization. We work on a recent non-small cell lung cancer (NSCLC) radiogenomics dataset of 130 patients and observe an increase in concordance-index values of up to 10%. Employing non-linear methods from the neural network literature, such as multi-layer perceptrons and visual question answering fusion modules, did not improve performance consistently. This indicates the need for larger multimodal datasets and fusion techniques better adapted to this biological setting.

Submitted 5 February, 2020; originally announced February 2020.

Comments: Accepted for presentation at International Symposium on Biomedical Imaging (ISBI) 2020 (Iowa City). 5 pages, last page references
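
The entry above evaluates recurrence prediction with the concordance index. For readers unfamiliar with that metric, here is a minimal sketch of Harrell's concordance index for right-censored data. It is a simplified illustration and not the authors' code: pairs with tied observed times are skipped, ties in predicted risk count as half, and the function name `concordance_index` and the toy values are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs whose predicted risks are correctly ordered.

    times  : observed follow-up times
    events : 1/True if the event (e.g. recurrence) was observed, 0/False if censored
    risks  : predicted risk scores (higher means the event is expected sooner)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            # Simplification: skip tied times; a censored earlier time is not comparable.
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example (illustrative values only): the third patient is censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
c_perfect = concordance_index(toy_times, toy_events, [0.9, 0.6, 0.4, 0.1])
c_reversed = concordance_index(toy_times, toy_events, [0.1, 0.4, 0.6, 0.9])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. The abstract does not say whether its reported increase of up to 10% is absolute or relative.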

Showing 1–5 of 5 results for author: Syeda-Mahmood, T