Missing Data Imputation and Acquisition with Deep Hierarchical Models and Hamiltonian Monte Carlo

Authors: Ignacio Peis, Chao Ma, José Miguel Hernández-Lobato

Abstract: Variational Autoencoders (VAEs) have recently been highly successful at imputing and acquiring heterogeneous missing data. However, within this specific application domain, existing VAE methods are restricted by using only one layer of latent variables and strictly Gaussian posterior approximations. To address these limitations, we present HH-VAEM, a Hierarchical VAE model for mixed-type incomplete data that uses Hamiltonian Monte Carlo with automatic hyper-parameter tuning for improved approximate inference. Our experiments show that HH-VAEM outperforms existing baselines in the tasks of missing data imputation and supervised learning with missing features. Finally, we also present a sampling-based approach for efficiently computing the information gain when missing features are to be acquired with HH-VAEM. Our experiments show that this sampling-based approach is superior to alternatives based on Gaussian approximations.

Submitted 22 December, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

Comments: Published at NeurIPS 2022

arXiv:2012.08234

Unsupervised Learning of Global Factors in Deep Generative Models

Authors: Ignacio Peis, Pablo M. Olmos, Antonio Artés-Rodríguez

Abstract: We present a novel deep generative model based on non-i.i.d. variational autoencoders that captures global dependencies among observations in a fully unsupervised fashion. In contrast to the recent semi-supervised alternatives for global modeling in deep generative models, our approach combines a mixture model in the local or data-dependent space and a global Gaussian latent variable, which leads us to obtain three particular insights. First, the induced latent global space captures interpretable disentangled representations with no user-defined regularization in the evidence lower bound (as in $β$-VAE and its generalizations). Second, we show that the model performs domain alignment to find correlations and interpolate between different databases. Finally, we study the ability of the global space to discriminate between groups of observations with non-trivial underlying structures, such as face images with shared attributes or defined sequences of digit images.

Submitted 16 December, 2020; v1 submitted 15 December, 2020; originally announced December 2020.

arXiv:1911.03522

DOI: 10.1109/JBHI.2019.2919270

Deep Sequential Models for Suicidal Ideation from Multiple Source Data

Authors: Ignacio Peis, Pablo M. Olmos, Constanza Vera-Varela, María Luisa Barrigón, Philippe Courtet, Enrique Baca-García, Antonio Artés-Rodríguez

Abstract: This article presents a novel method for predicting suicidal ideation from Electronic Health Records (EHR) and Ecological Momentary Assessment (EMA) data using deep sequential models. Both EHR longitudinal data and EMA question forms are defined by asynchronous, variable length, randomly-sampled data sequences. In our method, we model each of them with a Recurrent Neural Network (RNN), and both sequences are aligned by concatenating the hidden state of each of them using temporal marks. Furthermore, we incorporate attention schemes to improve performance in long sequences and time-independent pre-trained schemes to cope with very short sequences. Using a database of 1023 patients, our experimental results show that the addition of EMA records boosts the system recall to predict the suicidal ideation diagnosis from 48.13% obtained exclusively from EHR-based state-of-the-art methods to 67.78%. Additionally, our method provides interpretability through the t-SNE representation of the latent space. Further, the most relevant input features are identified and interpreted medically.

Submitted 6 November, 2019; originally announced November 2019.

Comments: Accepted for publication in IEEE Journal of Biomedical and Health Informatics (JBHI)

Journal ref: Journal of Biomedical and Health Informatics, vol. 23, no. 6, 2019

Showing 1–3 of 3 results for author: Peis, I