-
Missing Data Imputation and Acquisition with Deep Hierarchical Models and Hamiltonian Monte Carlo
Authors:
Ignacio Peis,
Chao Ma,
José Miguel Hernández-Lobato
Abstract:
Variational Autoencoders (VAEs) have recently been highly successful at imputing and acquiring heterogeneous missing data. However, within this specific application domain, existing VAE methods are restricted by using only one layer of latent variables and strictly Gaussian posterior approximations. To address these limitations, we present HH-VAEM, a Hierarchical VAE model for mixed-type incomplet…
▽ More
Variational Autoencoders (VAEs) have recently been highly successful at imputing and acquiring heterogeneous missing data. However, within this specific application domain, existing VAE methods are restricted by using only one layer of latent variables and strictly Gaussian posterior approximations. To address these limitations, we present HH-VAEM, a Hierarchical VAE model for mixed-type incomplete data that uses Hamiltonian Monte Carlo with automatic hyper-parameter tuning for improved approximate inference. Our experiments show that HH-VAEM outperforms existing baselines in the tasks of missing data imputation and supervised learning with missing features. Finally, we also present a sampling-based approach for efficiently computing the information gain when missing features are to be acquired with HH-VAEM. Our experiments show that this sampling-based approach is superior to alternatives based on Gaussian approximations.
△ Less
Submitted 22 December, 2022; v1 submitted 9 February, 2022;
originally announced February 2022.
-
Unsupervised Learning of Global Factors in Deep Generative Models
Authors:
Ignacio Peis,
Pablo M. Olmos,
Antonio Artés-Rodríguez
Abstract:
We present a novel deep generative model based on non i.i.d. variational autoencoders that captures global dependencies among observations in a fully unsupervised fashion. In contrast to the recent semi-supervised alternatives for global modeling in deep generative models, our approach combines a mixture model in the local or data-dependent space and a global Gaussian latent variable, which lead u…
▽ More
We present a novel deep generative model based on non i.i.d. variational autoencoders that captures global dependencies among observations in a fully unsupervised fashion. In contrast to the recent semi-supervised alternatives for global modeling in deep generative models, our approach combines a mixture model in the local or data-dependent space and a global Gaussian latent variable, which lead us to obtain three particular insights. First, the induced latent global space captures interpretable disentangled representations with no user-defined regularization in the evidence lower bound (as in $β$-VAE and its generalizations). Second, we show that the model performs domain alignment to find correlations and interpolate between different databases. Finally, we study the ability of the global space to discriminate between groups of observations with non-trivial underlying structures, such as face images with shared attributes or defined sequences of digits images.
△ Less
Submitted 16 December, 2020; v1 submitted 15 December, 2020;
originally announced December 2020.
-
Deep Sequential Models for Suicidal Ideation from Multiple Source Data
Authors:
Ignacio Peis,
Pablo M. Olmos,
Constanza Vera-Varela,
María Luisa Barrigón,
Philippe Courtet,
Enrique Baca-García,
Antonio Artés-Rodríguez
Abstract:
This article presents a novel method for predicting suicidal ideation from Electronic Health Records (EHR) and Ecological Momentary Assessment (EMA) data using deep sequential models. Both EHR longitudinal data and EMA question forms are defined by asynchronous, variable length, randomly-sampled data sequences. In our method, we model each of them with a Recurrent Neural Network (RNN), and both se…
▽ More
This article presents a novel method for predicting suicidal ideation from Electronic Health Records (EHR) and Ecological Momentary Assessment (EMA) data using deep sequential models. Both EHR longitudinal data and EMA question forms are defined by asynchronous, variable length, randomly-sampled data sequences. In our method, we model each of them with a Recurrent Neural Network (RNN), and both sequences are aligned by concatenating the hidden state of each of them using temporal marks. Furthermore, we incorporate attention schemes to improve performance in long sequences and time-independent pre-trained schemes to cope with very short sequences. Using a database of 1023 patients, our experimental results show that the addition of EMA records boosts the system recall to predict the suicidal ideation diagnosis from 48.13% obtained exclusively from EHR-based state-of-the-art methods to 67.78%. Additionally, our method provides interpretability through the t-SNE representation of the latent space. Further, the most relevant input features are identified and interpreted medically.
△ Less
Submitted 6 November, 2019;
originally announced November 2019.