Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks

Abstract: Deep Neural Networks (DNN) have shown great promise in many classification applications, yet are widely known to have poorly calibrated predictions when they are over-parametrized. Improving DNN calibration without compromising on model accuracy is of extreme importance and interest in safety-critical applications such as in the health-care sector. In this work, we show that decoupling the training of feature extraction layers and classification layers in over-parametrized DNN architectures such as Wide Residual Networks (WRN) and Visual Transformers (ViT) significantly improves model calibration whilst retaining accuracy, and at a low training cost. In addition, we show that placing a Gaussian prior on the last hidden layer outputs of a DNN, and training the model variationally in the classification training stage, even further improves calibration. We illustrate that these methods improve calibration across ViT and WRN architectures for several image classification benchmark datasets.

Submitted 6 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: Proceedings of the 41st International Conference on Machine Learning (ICML) 2024
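The calibration that the abstract above refers to is commonly quantified with the Expected Calibration Error (ECE). The following is a minimal sketch of that metric, not code from the paper: the function name, the equal-width binning, and the default of 10 bins are all assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct:     booleans, True where the prediction matched the label.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Bin index for a confidence in [0, 1]; conf == 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if ok else 0.0))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(a for _, a in members) / len(members)
        # Each bin contributes its calibration gap, weighted by its share of samples.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Four predictions at 90% confidence, three of them right: the gap is 0.90 - 0.75.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False]))
```

A well-calibrated model keeps this gap small in every bin, which is why a low ECE can coexist with unchanged accuracy.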

arXiv:2404.03549 [pdf, other]

Alzheimer's disease detection in PSG signals

Authors: Lorena Gallego-Viñarás, Juan Miguel Mira-Tomás, Anna Michela-Gaeta, Gerard Pinol-Ripoll, Ferrán Barbé, Pablo M. Olmos, Arrate Muñoz-Barrutia

Abstract: Alzheimer's disease (AD) and sleep disorders exhibit a close association, where disruptions in sleep patterns often precede the onset of Mild Cognitive Impairment (MCI) and early-stage AD. This study delves into the potential of utilizing sleep-related electroencephalography (EEG) signals acquired through polysomnography (PSG) for the early detection of AD. Our primary focus is on exploring semi-supervised Deep Learning techniques for the classification of EEG signals due to the clinical scenario characterized by the limited data availability. The methodology entails testing and comparing the performance of semi-supervised SMATE and TapNet models, benchmarked against the supervised XCM model, and unsupervised Hidden Markov Models (HMMs). The study highlights the significance of spatial and temporal analysis capabilities, conducting independent analyses of each sleep stage. Results demonstrate the effectiveness of SMATE in leveraging limited labeled data, achieving stable metrics across all sleep stages, and reaching 90% accuracy in its supervised form. Comparative analyses reveal SMATE's superior performance over TapNet and HMM, while XCM excels in supervised scenarios with an accuracy range of 92 - 94%. These findings underscore the potential of semi-supervised models in early AD detection, particularly in overcoming the challenges associated with the scarcity of labeled data. Ablation tests affirm the critical role of spatio-temporal feature extraction in semi-supervised predictive performance, and t-SNE visualizations validate the model's proficiency in distinguishing AD patterns. Overall, this research contributes to the advancement of AD detection through innovative Deep Learning approaches, highlighting the crucial role of semi-supervised learning in addressing data limitations.

Submitted 4 April, 2024; originally announced April 2024.

Comments: 12 pages, 14 figures. Submitted to IEEE Biomedical and Health Informatics for publication

MSC Class: 68T07 (Primary); 68T05; 92B20 (Secondary) ACM Class: I.2.1

arXiv:2402.16435 [pdf, other]

Training Implicit Generative Models via an Invariant Statistical Loss

Authors: José Manuel de Frutos, Pablo M. Olmos, Manuel A. Vázquez, Joaquín Míguez

Abstract: Implicit generative models have the capability to learn arbitrarily complex data distributions. On the downside, training requires telling apart real data from artificially-generated ones using adversarial discriminators, leading to unstable training and mode-dropping issues. As reported by Zahee et al. (2017), even in the one-dimensional (1D) case, training a generative adversarial network (GAN) is challenging and often suboptimal. In this work, we develop a discriminator-free method for training one-dimensional (1D) generative implicit models and subsequently expand this method to accommodate multivariate cases. Our loss function is a discrepancy measure between a suitably chosen transformation of the model samples and a uniform distribution; hence, it is invariant with respect to the true distribution of the data. We first formulate our method for 1D random variables, providing an effective solution for approximate reparameterization of arbitrarily complex distributions. Then, we consider the temporal setting (both univariate and multivariate), in which we model the conditional distribution of each sample given the history of the process. We demonstrate through numerical simulations that this new method yields promising results, successfully learning true distributions in a variety of scenarios and mitigating some of the well-known problems that state-of-the-art implicit methods present.

Submitted 26 February, 2024; originally announced February 2024.

Comments: Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024
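The abstract above describes a loss that compares a transformation of the model samples with a uniform distribution. One transformation with this property is the rank statistic: if an observation and K model samples are drawn independently from the same continuous distribution, the number of model samples below the observation is uniform on {0, ..., K}, whatever that distribution is. The sketch below illustrates only this property. Whether it matches the paper's exact construction is an assumption, and all function names, the choice K = 4, and the Gaussian test distributions are hypothetical.

```python
import random


def rank_statistic(y, model_samples):
    """Number of model samples strictly below the observation y."""
    return sum(1 for x in model_samples if x < y)


def rank_frequencies(draw_data, draw_model, k=4, trials=20000, seed=0):
    """Empirical distribution of the rank statistic over many trials."""
    rng = random.Random(seed)
    counts = [0] * (k + 1)
    for _ in range(trials):
        y = draw_data(rng)
        xs = [draw_model(rng) for _ in range(k)]
        counts[rank_statistic(y, xs)] += 1
    return [c / trials for c in counts]


def uniform_discrepancy(freqs):
    """Largest deviation of the rank frequencies from the uniform value."""
    target = 1.0 / len(freqs)
    return max(abs(f - target) for f in freqs)


# Model equal to the data distribution: ranks are close to uniform.
matched = rank_frequencies(lambda r: r.gauss(0, 1), lambda r: r.gauss(0, 1))
# Model shifted away from the data: ranks pile up at 0.
shifted = rank_frequencies(lambda r: r.gauss(0, 1), lambda r: r.gauss(3, 1))

print("matched discrepancy:", uniform_discrepancy(matched))
print("shifted discrepancy:", uniform_discrepancy(shifted))
```

Because the uniform target does not depend on the data distribution, a discrepancy of this kind can be driven towards zero without a discriminator. A trainable version would need a differentiable surrogate for the counting step, which this sketch does not attempt.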

arXiv:2401.11618 [pdf, other]

Efficient local linearity regularization to overcome catastrophic overfitting

Authors: Elias Abad Rocamora, Fanghui Liu, Grigorios G. Chrysos, Pablo M. Olmos, Volkan Cevher

Abstract: Catastrophic overfitting (CO) in single-step adversarial training (AT) results in abrupt drops in the adversarial test accuracy (even down to 0%). For models trained with multi-step AT, it has been observed that the loss function behaves locally linearly with respect to the input; however, this property is lost in single-step AT. To address CO in single-step AT, several methods have been proposed to enforce local linearity of the loss via regularization. However, these regularization terms considerably slow down training due to Double Backpropagation. Instead, in this work, we introduce a regularization term, called ELLE, to mitigate CO effectively and efficiently in classical AT evaluations, as well as some more difficult regimes, e.g., large adversarial perturbations and long training schedules. Our regularization term can be theoretically linked to curvature of the loss function and is computationally cheaper than previous methods by avoiding Double Backpropagation. Our thorough experimental validation demonstrates that our work does not suffer from CO, even in challenging settings where previous works suffer from it. We also notice that adapting our regularization parameter during training (ELLE-A) greatly improves the performance, especially in large $ε$ setups. Our implementation is available at https://github.com/LIONS-EPFL/ELLE .

Submitted 28 February, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

Comments: Accepted in ICLR 2024

arXiv:2310.11940 [pdf, other]

Interpretable Spectral Variational AutoEncoder (ISVAE) for time series clustering

Authors: Óscar Jiménez Rama, Fernando Moreno-Pino, David Ramírez, Pablo M. Olmos

Abstract: The best encoding is the one that is interpretable in nature. In this work, we introduce a novel model that incorporates an interpretable bottleneck, termed the Filter Bank (FB), at the outset of a Variational Autoencoder (VAE). This arrangement compels the VAE to attend to the most informative segments of the input signal, fostering the learning of a novel encoding ${f_0}$ which boasts enhanced interpretability and clusterability over traditional latent spaces. By deliberately constraining the VAE with this FB, we intentionally constrict its capacity to access broad input domain information, promoting the development of an encoding that is discernible, separable, and of reduced dimensionality. The evolutionary learning trajectory of ${f_0}$ further manifests as a dynamic hierarchical tree, offering profound insights into cluster similarities. Additionally, for handling intricate data configurations, we propose a tailored decoder structure that is symmetrically aligned with FB's architecture. Empirical evaluations highlight the superior efficacy of ISVAE, which compares favorably to state-of-the-art results in clustering metrics across real-world datasets.

Submitted 18 October, 2023; originally announced October 2023.

arXiv:2302.06223 [pdf, other]

Variational Mixture of HyperGenerators for Learning Distributions Over Functions

Authors: Batuhan Koyuncu, Pablo Sanchez-Martin, Ignacio Peis, Pablo M. Olmos, Isabel Valera

Abstract: Recent approaches build on implicit neural representations (INRs) to propose generative models over function spaces. However, they are computationally costly when dealing with inference tasks, such as missing data imputation, or directly cannot tackle them. In this work, we propose a novel deep generative model, named VAMoH. VAMoH combines the capabilities of modeling continuous functions using INRs and the inference capabilities of Variational Autoencoders (VAEs). In addition, VAMoH relies on a normalizing flow to define the prior, and a mixture of hypernetworks to parametrize the data log-likelihood. This gives VAMoH a high expressive capability and interpretability. Through experiments on a diverse range of data types, such as images, voxels, and climate data, we show that VAMoH can effectively learn rich distributions over continuous functions. Furthermore, it can perform inference-related tasks, such as conditional super-resolution generation and in-painting, as well or better than previous approaches, while being less computationally demanding.

Submitted 20 July, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: Accepted at ICML 2023. Camera ready version

arXiv:2301.10156 [pdf, other]

Sleep Activity Recognition and Characterization from Multi-Source Passively Sensed Data

Authors: María Martínez-García, Fernando Moreno-Pino, Pablo M. Olmos, Antonio Artés-Rodríguez

Abstract: Sleep constitutes a key indicator of human health, performance, and quality of life. Sleep deprivation has long been related to the onset, development, and worsening of several mental and metabolic disorders, constituting an essential marker for preventing, evaluating, and treating different health conditions. Sleep Activity Recognition methods can provide indicators to assess, monitor, and characterize subjects' sleep-wake cycles and detect behavioral changes. In this work, we propose a general method that continuously operates on passively sensed data from smartphones to characterize sleep and identify significant sleep episodes. Thanks to their ubiquity, these devices constitute an excellent alternative data source to profile subjects' biorhythms in a continuous, objective, and non-invasive manner, in contrast to traditional sleep assessment methods that usually rely on intrusive and subjective procedures. A Heterogeneous Hidden Markov Model is used to model a discrete latent variable process associated with the Sleep Activity Recognition task in a self-supervised way. We validate our results against sleep metrics reported by tested wearables, proving the effectiveness of the proposed approach and advocating its use to assess sleep without more reliable sources.

Submitted 17 January, 2023; originally announced January 2023.

Comments: v1.0

arXiv:2211.10371 [pdf, other]

Heterogeneous Hidden Markov Models for Sleep Activity Recognition from Multi-Source Passively Sensed Data

Authors: Fernando Moreno-Pino, María Martínez-García, Pablo M. Olmos, Antonio Artés-Rodríguez

Abstract: Psychiatric patients' passive activity monitoring is crucial to detect behavioural shifts in real-time, comprising a tool that helps clinicians supervise patients' evolution over time and enhance the associated treatments' outcomes. Frequently, sleep disturbances and mental health deterioration are closely related, as mental health condition worsening regularly entails shifts in the patients' circadian rhythms. Therefore, Sleep Activity Recognition constitutes a behavioural marker to portray patients' activity cycles and to detect behavioural changes among them. Moreover, mobile passively sensed data captured from smartphones, thanks to these devices' ubiquity, constitute an excellent alternative to profile patients' biorhythm. In this work, we aim to identify major sleep episodes based on passively sensed data. To do so, a Heterogeneous Hidden Markov Model is proposed to model a discrete latent variable process associated with the Sleep Activity Recognition task in a self-supervised way. We validate our results against sleep metrics reported by clinically tested wearables, proving the effectiveness of the proposed approach.

Submitted 8 November, 2022; originally announced November 2022.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 10 pages (6 pages + 4 pages of references and appendices)

arXiv:2211.09011 [pdf, other]

Detecting train driveshaft damages using accelerometer signals and Differential Convolutional Neural Networks

Authors: Antía López Galdo, Alejandro Guerrero-López, Pablo M. Olmos, María Jesús Gómez García

Abstract: Railway axle maintenance is critical to avoid catastrophic failures. Nowadays, condition monitoring techniques are becoming more prominent in the industry to prevent enormous costs and damage to human lives. This paper proposes the development of a railway axle condition monitoring system based on advanced 2D-Convolutional Neural Network (CNN) architectures applied to time-frequency representations of vibration signals. For this purpose, several preprocessing steps and different types of Deep Learning (DL) and Machine Learning (ML) architectures are discussed to design an accurate classification system. The resultant system converts the railway axle vibration signals into time-frequency domain representations, i.e., spectrograms, and then trains a two-dimensional CNN to classify them depending on their cracks. The results showed that the proposed approach outperforms several alternative methods tested. The CNN architecture has been tested in 3 different wheelset assemblies, achieving AUC scores of 0.93, 0.86, and 0.75, outperforming any other architecture and showing a high level of reliability when classifying 4 different levels of defects.

Submitted 15 November, 2022; originally announced November 2022.

arXiv:2207.09185 [pdf, other]

Multimodal hierarchical Variational AutoEncoders with Factor Analysis latent space

Authors: Alejandro Guerrero-López, Carlos Sevilla-Salcedo, Vanessa Gómez-Verdejo, Pablo M. Olmos

Abstract: Real-world databases are complex and usually require dealing with heterogeneous and mixed data types, making the exploitation of shared information between views a critical issue. For this purpose, recent studies based on deep generative models merge all views into a nonlinear complex latent space, which can share information among views. However, this solution limits the model's interpretability, flexibility, and modularity. We propose a novel method to overcome these limitations by combining multiple Variational AutoEncoders (VAE) with a Factor Analysis latent space (FA-VAE). We use VAEs to learn a private representation of each heterogeneous view in a continuous latent space. Then, we share the information between views by a low-dimensional latent space using a linear projection matrix. This way, we create a flexible and modular hierarchical dependency between private and shared information in which new views can be incorporated afterwards. Beyond that, we can condition pre-trained models, cross-generate data from different domains, and perform transfer learning between generative models.

Submitted 6 October, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

Comments: 21 pages main work, 2 pages supplementary, 14 figures

arXiv:2201.06968 [pdf, ps, other]

PyHHMM: A Python Library for Heterogeneous Hidden Markov Models

Authors: Fernando Moreno-Pino, Emese Sükei, Pablo M. Olmos, Antonio Artés-Rodríguez

Abstract: We introduce PyHHMM, an object-oriented open-source Python implementation of Heterogeneous-Hidden Markov Models (HHMMs). In addition to HMM's basic core functionalities, such as different initialization algorithms and classical observation models, i.e., continuous and multinoulli, PyHHMM distinctively emphasizes features not supported in similar available frameworks: a heterogeneous observation model, missing data inference, different model order selection criteria, and semi-supervised training. These characteristics result in a feature-rich implementation for researchers working with sequential data. PyHHMM relies on the numpy, scipy, scikit-learn, and seaborn Python packages, and is distributed under the Apache-2.0 License. PyHHMM's source code is publicly available on GitHub (https://github.com/fmorenopino/HeterogeneousHMM) to facilitate adoption and future contributions. Detailed documentation (https://pyhhmm.readthedocs.io/en/latest), which covers examples of use and the models' theoretical explanation, is available. The package can be installed through the Python Package Index (PyPI), via 'pip install pyhhmm'.

Submitted 12 January, 2022; originally announced January 2022.

arXiv:2201.05040 [pdf, other]

Multi-task longitudinal forecasting with missing values on Alzheimer's Disease

Authors: Carlos Sevilla-Salcedo, Vandad Imani, Pablo M. Olmos, Vanessa Gómez-Verdejo, Jussi Tohka

Abstract: Machine learning techniques typically applied to dementia forecasting lack the capability to jointly learn several tasks and to handle time-dependent heterogeneous data and missing values. In this paper, we propose a framework using the recently presented SSHIBA model for jointly learning different tasks on longitudinal data with missing values. The method uses Bayesian variational inference to impute missing values and combine information of several views. This way, we can combine different data-views from different time-points in a common latent space and learn the relations between each time-point while simultaneously modelling and predicting several output variables. We apply this model to jointly predict diagnosis, ventricle volume, and clinical scores in dementia. The results demonstrate that SSHIBA is capable of learning a good imputation of the missing values and outperforming the baselines while simultaneously predicting three different tasks.

Submitted 13 January, 2022; originally announced January 2022.

arXiv:2108.10764 [pdf, ps, other]

Regularizing Transformers With Deep Probabilistic Layers

Authors: Aurora Cobo Aguilera, Pablo Martínez Olmos, Antonio Artés-Rodríguez, Fernando Pérez-Cruz

Abstract: Language models (LM) have grown non-stop in the last decade, from sequence-to-sequence architectures to the state-of-the-art, fully attention-based Transformers. In this work, we demonstrate how the inclusion of deep generative models within BERT can bring more versatile models, able to impute missing/noisy words with richer text or even improve BLEU score. More precisely, we use a Gaussian Mixture Variational Autoencoder (GMVAE) as a regularizer layer and prove its effectiveness not only in Transformers but also in the most relevant encoder-decoder based LM, seq2seq with and without attention.

Submitted 23 August, 2021; originally announced August 2021.

arXiv:2107.05984 [pdf, ps, other]

Deep Autoregressive Models with Spectral Attention

Authors: Fernando Moreno-Pino, Pablo M. Olmos, Antonio Artés-Rodríguez

Abstract: Time series forecasting is an important problem across many domains, playing a crucial role in multiple real-world applications. In this paper, we propose a forecasting architecture that combines deep autoregressive models with a Spectral Attention (SA) module, which merges global and local frequency domain information in the model's embedded space. By characterizing in the spectral domain the embedding of the time series as occurrences of a random process, our method can identify global trends and seasonality patterns. Two spectral attention models, global and local to the time series, integrate this information within the forecast and perform spectral filtering to remove noise from the time series. The proposed architecture has a number of useful properties: it can be effectively incorporated into well-known forecast architectures, requiring a low number of parameters and producing interpretable results that improve forecasting accuracy. We test the Spectral Attention Autoregressive Model (SAAM) on several well-known forecast datasets, consistently demonstrating that our model compares favorably to state-of-the-art approaches.

Submitted 26 December, 2021; v1 submitted 13 July, 2021; originally announced July 2021.

Comments: Errors in Eq. 2 and Eq. 3 corrected

arXiv:2103.07206 [pdf, other]

doi 10.1109/JBHI.2021.3123839

Medical data wrangling with sequential variational autoencoders

Authors: Daniel Barrejón, Pablo M. Olmos, Antonio Artés-Rodríguez

Abstract: Medical data sets are usually corrupted by noise and missing data. These missing patterns are commonly assumed to be completely random, but in medical scenarios, the reality is that these patterns occur in bursts due to sensors that are off for some time or data collected in a misaligned uneven fashion, among other causes. This paper proposes to model medical data records with heterogeneous data types and bursty missing data using sequential variational autoencoders (VAEs). In particular, we propose a new methodology, the Shi-VAE, which extends the capabilities of VAEs to sequential streams of data with missing observations. We compare our model against state-of-the-art solutions in an intensive care unit database (ICU) and a dataset of passive human monitoring. Furthermore, we find that standard error metrics such as RMSE are not conclusive enough to assess temporal models and include in our analysis the cross-correlation between the ground truth and the imputed signal. We show that Shi-VAE achieves the best performance in terms of both metrics, with lower computational complexity than the GP-VAE model, which is the state-of-the-art method for medical records.

Submitted 8 November, 2021; v1 submitted 12 March, 2021; originally announced March 2021.

Comments: Accepted for publication in IEEE Journal of Biomedical and Health Informatics (JBHI)

arXiv:2012.08234 [pdf, other]

Unsupervised Learning of Global Factors in Deep Generative Models

Authors: Ignacio Peis, Pablo M. Olmos, Antonio Artés-Rodríguez

Abstract: We present a novel deep generative model based on non i.i.d. variational autoencoders that captures global dependencies among observations in a fully unsupervised fashion. In contrast to the recent semi-supervised alternatives for global modeling in deep generative models, our approach combines a mixture model in the local or data-dependent space and a global Gaussian latent variable, which leads us to obtain three particular insights. First, the induced latent global space captures interpretable disentangled representations with no user-defined regularization in the evidence lower bound (as in $β$-VAE and its generalizations). Second, we show that the model performs domain alignment to find correlations and interpolate between different databases. Finally, we study the ability of the global space to discriminate between groups of observations with non-trivial underlying structures, such as face images with shared attributes or defined sequences of digit images.

Submitted 16 December, 2020; v1 submitted 15 December, 2020; originally announced December 2020.

arXiv:2012.02544 [pdf, other]

doi 10.1109/ACCESS.2021.3082689

Boosting offline handwritten text recognition in historical documents with few labeled lines

Authors: José Carlos Aradillas, Juan José Murillo-Fuentes, Pablo M. Olmos

Abstract: In this paper, we face the problem of offline handwritten text recognition (HTR) in historical documents when few labeled samples are available and some of them contain errors in the train set. Three main contributions are developed. First, we analyze how to perform transfer learning (TL) from a massive database to a smaller historical database, analyzing which layers of the model need a fine-tuning process. Second, we analyze methods to efficiently combine TL and data augmentation (DA). Finally, an algorithm to mitigate the effects of incorrect labelings in the training set is proposed. The methods are analyzed over the ICFHR 2018 competition database, Washington and Parzival. Combining all these techniques, we demonstrate a remarkable reduction of CER (up to 6% in some cases) in the test set with little complexity overhead.

Submitted 4 December, 2020; originally announced December 2020.

arXiv:2006.02734 [pdf, ps, other]

Robust Sampling in Deep Learning

Authors: Aurora Cobo Aguilera, Antonio Artés-Rodríguez, Fernando Pérez-Cruz, Pablo Martínez Olmos

Abstract: Deep learning requires regularization mechanisms to reduce overfitting and improve generalization. We address this problem with a new regularization method based on distributionally robust optimization. The key idea is to modify the contribution from each sample to tighten the empirical risk bound. During the stochastic training, samples are selected according to their accuracy, in such a way that the worst-performing samples are the ones that contribute the most to the optimization. We study different scenarios and show the ones where it can make the convergence faster or increase the accuracy.

Submitted 5 June, 2020; v1 submitted 4 June, 2020; originally announced June 2020.

Comments: 8 pages, 3 figures

arXiv:2006.00968 [pdf, other]

Bayesian Sparse Factor Analysis with Kernelized Observations

Authors: Carlos Sevilla-Salcedo, Alejandro Guerrero-López, Pablo M. Olmos, Vanessa Gómez-Verdejo

Abstract: Multi-view problems can be faced with latent variable models since they are able to find low-dimensional projections that fairly capture the correlations among the multiple views that characterise each datum. On the other hand, high-dimensionality and non-linear issues are traditionally handled by kernel methods, inducing a (non)-linear function between the latent projection and the data itself. However, they usually come with scalability issues and exposure to overfitting. Here, we propose merging both approaches into a single model so that we can exploit the best features of multi-view latent models and kernel methods and, moreover, overcome their limitations. In particular, we combine probabilistic factor analysis with what we refer to as kernelized observations, in which the model focuses on reconstructing not the data itself, but its relationship with other data points measured by a kernel function. This model can combine several types of views (kernelized or not), and it can handle heterogeneous data and work in semi-supervised settings. Additionally, by including adequate priors, it can provide compact solutions for the kernelized observations -- based on an automatic selection of Bayesian Relevance Vectors (RVs) -- and can include feature selection capabilities. Using several public databases, we demonstrate the potential of our approach (and its extensions) w.r.t. common multi-view learning models such as kernel canonical correlation analysis or manifold relevance determination.

Submitted 27 January, 2021; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: Article submitted to Artificial Intelligence Journal

arXiv:2001.08975 [pdf, other]

Sparse Semi-supervised Heterogeneous Interbattery Bayesian Analysis

Authors: Carlos Sevilla-Salcedo, Vanessa Gómez-Verdejo, Pablo M. Olmos

Abstract: The Bayesian approach to feature extraction, known as factor analysis (FA), has been widely studied in machine learning to obtain a latent representation of the data. An adequate selection of the probabilities and priors of these Bayesian models allows the model to better adapt to the nature of the data (i.e. heterogeneity, sparsity), obtaining a more representative latent space. The objective of this article is to propose a general FA framework capable of modelling any problem. To do so, we start from the Bayesian Inter-Battery Factor Analysis (BIBFA) model, enhancing it with new functionalities to be able to work with heterogeneous data, include feature selection, and handle missing values as well as semi-supervised problems. The performance of the proposed model, Sparse Semi-supervised Heterogeneous Interbattery Bayesian Analysis (SSHIBA), has been tested on 4 different scenarios to evaluate each one of its novelties, showing not only a great versatility and an interpretability gain, but also outperforming most of the state-of-the-art algorithms.

Submitted 24 January, 2020; originally announced January 2020.

arXiv:1911.03522 [pdf, other]

doi 10.1109/JBHI.2019.2919270

Deep Sequential Models for Suicidal Ideation from Multiple Source Data

Authors: Ignacio Peis, Pablo M. Olmos, Constanza Vera-Varela, María Luisa Barrigón, Philippe Courtet, Enrique Baca-García, Antonio Artés-Rodríguez

Abstract: This article presents a novel method for predicting suicidal ideation from Electronic Health Records (EHR) and Ecological Momentary Assessment (EMA) data using deep sequential models. Both EHR longitudinal data and EMA question forms are defined by asynchronous, variable length, randomly-sampled data sequences. In our method, we model each of them with a Recurrent Neural Network (RNN), and both sequences are aligned by concatenating the hidden state of each of them using temporal marks. Furthermore, we incorporate attention schemes to improve performance in long sequences and time-independent pre-trained schemes to cope with very short sequences. Using a database of 1023 patients, our experimental results show that the addition of EMA records boosts the system recall to predict the suicidal ideation diagnosis from 48.13% obtained exclusively from EHR-based state-of-the-art methods to 67.78%. Additionally, our method provides interpretability through the t-SNE representation of the latent space. Further, the most relevant input features are identified and interpreted medically.

Submitted 6 November, 2019; originally announced November 2019.

Comments: Accepted for publication in IEEE Journal of Biomedical and Health Informatics (JBHI)

Journal ref: Journal of Biomedical and Health Informatics, vol. 23, no. 6, 2019

arXiv:1911.01425 [pdf, other]

Improved BiGAN training with marginal likelihood equalization

Authors: Pablo Sánchez-Martín, Pablo M. Olmos, Fernando Perez-Cruz

Abstract: We propose a novel training procedure for improving the performance of generative adversarial networks (GANs), especially bidirectional GANs. First, we enforce that the empirical distribution of the inverse inference network matches the prior distribution, which favors the generator network reproducibility on the seen samples. Second, we have found that the marginal log-likelihood of the samples shows a severe overrepresentation of a certain type of samples. To address this issue, we propose to train the bidirectional GAN using a non-uniform sampling for the mini-batch selection, resulting in improved quality and variety in generated samples measured quantitatively and by visual inspection. We illustrate our new procedure with the well-known CIFAR10, Fashion MNIST and CelebA datasets.

Submitted 23 May, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

arXiv:1910.14110 [pdf, other]

doi 10.1109/TIT.2021.3071743

Spatially Coupled Generalized LDPC Codes: Asymptotic Analysis and Finite Length Scaling

Authors: David G. M. Mitchell, Pablo M. Olmos, Michael Lentmaier, Daniel J. Costello

Abstract: Generalized low-density parity-check (GLDPC) codes are a class of LDPC codes in which the standard single parity check (SPC) constraints are replaced by constraints defined by a linear block code. These stronger constraints typically result in improved error floor performance, due to better minimum distance and trapping set properties, at a cost of some increased decoding complexity. In this paper, we study spatially coupled generalized low-density parity-check (SC-GLDPC) codes and present a comprehensive analysis of these codes, including: (1) an iterative decoding threshold analysis of SC-GLDPC code ensembles demonstrating capacity approaching thresholds via the threshold saturation effect; (2) an asymptotic analysis of the minimum distance and free distance properties of SC-GLDPC code ensembles, demonstrating that the ensembles are asymptotically good; and (3) an analysis of the finite-length scaling behavior of both GLDPC block codes and SC-GLDPC codes based on a peeling decoder (PD) operating on a binary erasure channel (BEC). Results are compared to GLDPC block codes, and the advantages and disadvantages of SC-GLDPC codes are discussed.

Submitted 5 April, 2021; v1 submitted 30 October, 2019; originally announced October 2019.

Comments: Revised version submitted to the IEEE Transactions on Information Theory

arXiv:1910.06574 [pdf, other]

doi 10.1109/ITW.2018.8613515

On Generalized LDPC Codes for 5G Ultra Reliable Communication

Authors: Yanfang Liu, Pablo M. Olmos, David G. M. Mitchell

Abstract: Generalized low-density parity-check (GLDPC) codes, where single parity-check (SPC) constraint nodes are replaced with generalized constraint (GC) nodes, are a promising class of codes for low latency communication. In this paper, a practical construction of quasi-cyclic (QC) GLDPC codes is proposed, where the proportion of generalized constraints is determined by an asymptotic analysis. We analyze the message passing process and complexity of a GLDPC code over the additive white Gaussian noise (AWGN) channel and present a constraint-to-variable update rule based on the specific codewords of the component code. The block error rate (BLER) performance of the GLDPC codes, combined with a complementary outer code, is shown to outperform a variety of state-of-the-art code and decoder designs with suitable lengths and rates for the 5G Ultra Reliable Communication (URC) regime over an AWGN channel with quadrature PSK (QPSK) modulation.

Submitted 15 October, 2019; originally announced October 2019.

Journal ref: 2018 IEEE Information Theory Workshop (ITW)

arXiv:1910.06569 [pdf, other]

doi 10.1109/LSP.2019.2944005

Probabilistic Time of Arrival Localization

Authors: Fernando Perez-Cruz, Pablo M. Olmos, Michael Minyi Zhang, Howard Huang

Abstract: In this paper, we take a new approach for time of arrival geo-localization. We show that the main sources of error in metropolitan areas are due to environmental imperfections that bias our solutions, and that we can rely on a probabilistic model to learn and compensate for them. The resulting localization error is validated using measurements from a live LTE cellular network to be less than 10 meters, representing an order-of-magnitude improvement.

Submitted 15 October, 2019; originally announced October 2019.

Comments: IEEE Signal Processing Letters, 2019

arXiv:1910.00853 [pdf, other]

doi 10.1109/TVT.2017.2786638

Probabilistic MIMO Symbol Detection with Expectation Consistency Approximate Inference

Authors: Javier Cépedes, Pablo M. Olmos, Matilde Sánchez-Fernández, Fernando Pérez-Cruz

Abstract: In this paper we explore low-complexity probabilistic algorithms for soft symbol detection in high-dimensional multiple-input multiple-output (MIMO) systems. We present a novel algorithm based on the Expectation Consistency (EC) framework, which describes the approximate inference problem as an optimization over a non-convex function. EC generalizes algorithms such as Belief Propagation and Expectation Propagation. For the MIMO symbol detection problem, we discuss feasible methods to find stationary points of the EC function and explore their tradeoffs between accuracy and speed of convergence. First, the accuracy is studied in terms of input-output mutual information, and we show that the proposed EC MIMO detector greatly improves on state-of-the-art methods, with a complexity order cubic in the number of transmitting antennas. Second, these gains are corroborated by combining the probabilistic output of the EC detector with a low-density parity-check (LDPC) channel code.

Submitted 2 October, 2019; originally announced October 2019.

Journal ref: IEEE Transactions on Vehicular Technology (Volume: 67, Issue: 4, April 2018)

arXiv:1901.09557 [pdf, other]

Out-of-Sample Testing for GANs

Authors: Pablo Sánchez-Martín, Pablo M. Olmos, Fernando Pérez-Cruz

Abstract: We propose a new method to evaluate GANs, namely EvalGAN. EvalGAN relies on a test set to directly measure the reconstruction quality in the original sample space (no auxiliary networks are necessary), and it also computes the (log)likelihood for the reconstructed samples in the test set. Further, EvalGAN is agnostic to the GAN algorithm and the dataset. We decided to test it on three state-of-the-art GANs over the well-known CIFAR-10 and CelebA datasets.

Submitted 28 January, 2019; originally announced January 2019.

arXiv:1807.03653 [pdf, other]

Handling Incomplete Heterogeneous Data using VAEs

Authors: Alfredo Nazabal, Pablo M. Olmos, Zoubin Ghahramani, Isabel Valera

Abstract: Variational autoencoders (VAEs), as well as other generative models, have been shown to be efficient and accurate for capturing the latent structure of vast amounts of complex high-dimensional data. However, existing VAEs still cannot directly handle data that are heterogeneous (mixed continuous and discrete) or incomplete (with missing data at random), which is indeed common in real-world applications. In this paper, we propose a general framework to design VAEs suitable for fitting incomplete heterogeneous data. The proposed HI-VAE includes likelihood models for real-valued, positive real valued, interval, categorical, ordinal and count data, and allows accurate estimation (and potentially imputation) of missing data. Furthermore, HI-VAE presents competitive predictive performance in supervised tasks, outperforming supervised models when trained on incomplete data.

Submitted 22 May, 2020; v1 submitted 10 July, 2018; originally announced July 2018.

arXiv:1804.01527 [pdf, other]

doi 10.1109/ICFHR-2018.2018.00081

Boosting Handwriting Text Recognition in Small Databases with Transfer Learning

Authors: José Carlos Aradillas, Juan José Murillo-Fuentes, Pablo M. Olmos

Abstract: In this paper we deal with the offline handwriting text recognition (HTR) problem with reduced training datasets. Recent HTR solutions based on artificial neural networks exhibit remarkable solutions in referenced databases. These deep learning neural networks are composed of both convolutional (CNN) and long short-term memory recurrent units (LSTM). In addition, connectionist temporal classification (CTC) is the key to avoid segmentation at character level, greatly facilitating the labeling task. One of the main drawbacks of the CNNLSTM-CTC (CLC) solutions is that they need a considerable part of the text to be transcribed for every type of calligraphy, typically in the order of a few thousand lines. Furthermore, in some scenarios the text to transcribe is not that long, e.g. in the Washington database. The CLC typically overfits for this reduced number of training samples. Our proposal is based on the transfer learning (TL) from the parameters learned with a bigger database. We first investigate, for a reduced and fixed number of training samples, 350 lines, how the learning from a large database, the IAM, can be transferred to the learning of the CLC of a reduced database, Washington. We focus on which layers of the network need not be re-trained. We conclude that the best solution is to re-train the whole CLC parameters initialized to the values obtained after the training of the CLC from the larger database. We also investigate results when the training size is further reduced. The differences in the CER are more remarkable when training with just 350 lines: a CER of 3.3% is achieved with TL, while we have a CER of 18.2% when training from scratch. As a byproduct, the learning times are quite reduced. Similar good results are obtained from the Parzival database when trained with this reduced number of lines and this new approach.

Submitted 4 April, 2018; originally announced April 2018.

Comments: ICFHR 2018 Conference

arXiv:1711.08188 [pdf, ps, other]

Turbo EP-based Equalization: a Filter-Type Implementation

Authors: Irene Santos, Juan José Murillo-Fuentes, Eva Arias-de-Reyna, Pablo M. Olmos

Abstract: This manuscript was submitted to Transactions on Communications on September 7, 2017; revised on January 10, 2018 and March 27, 2018; and accepted on April 25, 2018. We propose a novel filter-type equalizer to improve the solution of the linear minimum-mean squared-error (LMMSE) turbo equalizer, with computational complexity constrained to be quadratic in the filter length. When high-order modulations and/or large memory channels are used, the optimal BCJR equalizer is unavailable due to its computational complexity. In this scenario, the filter-type LMMSE turbo equalization exhibits a good performance compared to other approximations. In this paper, we show that this solution can be significantly improved by using expectation propagation (EP) in the estimation of the a posteriori probabilities. First, it yields a more accurate estimation of the extrinsic distribution to be sent to the channel decoder. Second, compared to other solutions based on EP, the computational complexity of the proposed solution is constrained to be quadratic in the length of the finite impulse response (FIR). In addition, we review previous EP-based turbo equalization implementations. Instead of considering default uniform priors, we exploit the outputs of the decoder. Some simulation results are included to show that this new EP-based filter remarkably outperforms the turbo approach of previous versions of the EP algorithm and also improves the LMMSE solution, with and without turbo equalization.

Submitted 21 December, 2018; v1 submitted 22 November, 2017; originally announced November 2017.

arXiv:1709.00873 [pdf, other]

doi 10.1109/TIT.2019.2909917

A Probabilistic Peeling Decoder to Efficiently Analyze Generalized LDPC Codes Over the BEC

Authors: Yanfang Liu, Pablo M. Olmos, Tobias Koch

Abstract: In this paper, we analyze the tradeoff between coding rate and asymptotic performance of a class of generalized low-density parity-check (GLDPC) codes constructed by including a certain fraction of generalized constraint (GC) nodes in the graph. The rate of the GLDPC ensemble is bounded using classical results on linear block codes, namely Hamming bound and Varshamov bound. We also study the impact of the decoding method used at GC nodes. To incorporate both bounded-distance (BD) and Maximum Likelihood (ML) decoding at GC nodes into our analysis without resorting to multi-edge type of degree distributions (DDs), we propose the probabilistic peeling decoding (P-PD) algorithm, which models the decoding step at every GC node as an instance of a Bernoulli random variable with a successful decoding probability that depends on both the GC block code as well as its decoding algorithm. The P-PD asymptotic performance over the BEC can be efficiently predicted using standard techniques for LDPC codes such as density evolution (DE) or the differential equation method. Furthermore, for a class of GLDPC ensembles, we demonstrate that the simulated P-PD performance accurately predicts the actual performance of the GLDPC code under ML decoding at GC nodes. We illustrate our analysis for GLDPC code ensembles with regular and irregular DDs. In all cases, we show that a large fraction of GC nodes is required to reduce the original gap to capacity, but the optimal fraction is strictly smaller than one. We then consider techniques to further reduce the gap to capacity by means of random puncturing, and the inclusion of a certain fraction of generalized variable nodes in the graph.

Submitted 12 September, 2018; v1 submitted 4 September, 2017; originally announced September 2017.

Comments: Submitted to IEEE Transactions on Information Theory, August 2017

Journal ref: IEEE Transactions on Information Theory, August 2019

arXiv:1606.02087 [pdf, ps, other]

doi 10.1109/TCOMM.2017.2737018

Continuous Transmission of Spatially-Coupled LDPC Code Chains

Authors: Pablo M. Olmos, David G. M. Mitchell, Dmitri Truhachev, Daniel J. Costello Jr

Abstract: We propose a novel encoding/transmission scheme called continuous chain (CC) transmission that is able to improve the finite-length performance of a system using spatially-coupled low-density parity-check (SC-LDPC) codes. In CC transmission, instead of transmitting a sequence of independent codewords from a terminated SC-LDPC code chain, we connect multiple chains in a layered format, where encoding, transmission, and decoding are now performed in a continuous fashion. The connections between chains are created at specific points, chosen to improve the finite-length performance of the code structure under iterative decoding. We describe the design of CC schemes for different SC-LDPC code ensembles constructed from protographs: a (J,K)-regular SC-LDPC code chain, a spatially-coupled repeat-accumulate (SC-RA) code, and a spatially-coupled accumulate-repeat-jagged-accumulate (SC-ARJA) code. In all cases, significant performance improvements are reported and, in addition, it is shown that using CC transmission only requires a small increase in decoding complexity and decoding delay with respect to a system employing a single SC-LDPC code chain for transmission.

Submitted 2 October, 2019; v1 submitted 7 June, 2016; originally announced June 2016.

Comments: arXiv admin note: text overlap with arXiv:1402.7170

Journal ref: IEEE Transactions on Communications, December 2017. Pages 5097 - 5109

arXiv:1604.05111 [pdf, other]

Finite-length scaling based on belief propagation for spatially coupled LDPC codes

Authors: Markus Stinner, Luca Barletta, Pablo M. Olmos

Abstract: The equivalence of peeling decoding (PD) and Belief Propagation (BP) for low-density parity-check (LDPC) codes over the binary erasure channel is analyzed. Modifying the scheduling for PD, it is shown that exactly the same variable nodes (VNs) are resolved in every iteration as with BP. The decrease of erased VNs during the decoding process is analyzed instead of resolvable equations. This quantity can also be derived with density evolution, resulting in a drastic decrease in complexity. Finally, a scaling law using this quantity is established for spatially coupled LDPC codes.

Submitted 18 April, 2016; originally announced April 2016.

arXiv:1504.04137 [pdf, ps, other]

On Distributed Storage Allocations for Memory-Limited Systems

Authors: Iryna Andriyanova, Pablo M. Olmos

Abstract: In this paper we consider distributed allocation problems with memory constraint limits. Firstly, we propose a tractable relaxation to the problem of optimal symmetric allocations from [1]. The approximated problem is based on the Q-error function, and its solution approaches the solution of the initial problem, as the number of storage nodes in the network grows. Secondly, exploiting this relaxation, we are able to formulate and to solve the problem for storage allocations for memory-limited DSS storing and arbitrary memory profiles. Finally, we discuss the extension to the case of multiple data objects, stored in the DSS.

Submitted 16 April, 2015; originally announced April 2015.

Comments: Submitted to IEEE GLOBECOM'15

arXiv:1404.5719 [pdf, other]

doi 10.1109/TIT.2015.2422816

A Scaling Law to Predict the Finite-Length Performance of Spatially-Coupled LDPC Codes

Authors: Pablo M. Olmos, Rüdiger Urbanke

Abstract: Spatially-coupled LDPC codes are known to have excellent asymptotic properties. Much less is known regarding their finite-length performance. We propose a scaling law to predict the error probability of finite-length spatially-coupled ensembles when transmission takes place over the binary erasure channel. We discuss how the parameters of the scaling law are connected to fundamental quantities appearing in the asymptotic analysis of these ensembles and we verify that the predictions of the scaling law fit well to the data derived from simulations over a wide range of parameters. The ultimate goal of this line of research is to develop analytic tools for the design of spatially-coupled LDPC codes under practical constraints.

Submitted 25 May, 2015; v1 submitted 23 April, 2014; originally announced April 2014.

Journal ref: IEEE Transactions on Information Theory, Volume 61, Issue 6, June 2015, Pages 3164 - 3184

arXiv:1402.7170 [pdf, other]

Improving the Finite-Length Performance of Spatially Coupled LDPC Codes by Connecting Multiple Code Chains

Authors: Pablo M. Olmos, David G. M. Mitchell, Dmitri Truhachev, Daniel J. Costello Jr

Abstract: In this paper, we analyze the finite-length performance of codes on graphs constructed by connecting spatially coupled low-density parity-check (SC-LDPC) code chains. Successive (peeling) decoding is considered for the binary erasure channel (BEC). The evolution of the undecoded portion of the bipartite graph remaining after each iteration is analyzed as a dynamical system. When connecting short SC-LDPC chains, we show that, in addition to superior iterative decoding thresholds, connected chain ensembles have better finite-length performance than single chain ensembles of the same rate and length. In addition, we present a novel encoding/transmission scheme to improve the performance of a system using long SC-LDPC chains, where, instead of transmitting codewords corresponding to a single SC-LDPC chain independently, we connect consecutive chains in a multi-layer format to form a connected chain ensemble. We refer to such a transmission scheme as continuous chain (CC) transmission of SC-LDPC codes. We show that CC transmission can be implemented with no significant increase in encoding/decoding complexity or decoding delay with respect to a system using a single SC-LDPC code chain for encoding.

Submitted 28 February, 2014; originally announced February 2014.

Comments: Submitted to IEEE Transactions on Information Theory, February 2014

arXiv:1401.8090 [pdf, other]

Analyzing Finite-length Protograph-based Spatially Coupled LDPC Codes

Authors: Markus Stinner, Pablo M. Olmos

Abstract: The peeling decoding for spatially coupled low-density parity-check (SC-LDPC) codes is analyzed for a binary erasure channel. An analytical calculation of the mean evolution of degree-one check nodes of protograph-based SC-LDPC codes is given and an estimate for the covariance evolution of degree-one check nodes is proposed in the stable decoding phase where the decoding wave propagates along the chain of coupled codes. Both results are verified numerically. Protograph-based SC-LDPC codes turn out to have a more robust behavior than unstructured random SC-LDPC codes. Using the analytically calculated parameters, the finite-length scaling laws for these constructions are given and verified by numerical simulations.

Submitted 31 January, 2014; originally announced January 2014.

Comments: 5 pages, 6 figures, submitted to ISIT 2014

arXiv:1201.0715 [pdf, other]

doi 10.1109/TIT.2013.2245494

Tree-Structure Expectation Propagation for LDPC Decoding over the BEC

Authors: Pablo M. Olmos, Juan José Murillo-Fuentes, Fernando Pérez-Cruz

Abstract: We present the tree-structure expectation propagation (Tree-EP) algorithm to decode low-density parity-check (LDPC) codes over discrete memoryless channels (DMCs). EP generalizes belief propagation (BP) in two ways. First, it can be used with any exponential family distribution over the cliques in the graph. Second, it can impose additional constraints on the marginal distributions. We use this second property to impose pair-wise marginal constraints over pairs of variables connected to a check node of the LDPC code's Tanner graph. Thanks to these additional constraints, the Tree-EP marginal estimates for each variable in the graph are more accurate than those provided by BP. We also reformulate the Tree-EP algorithm for the binary erasure channel (BEC) as a peeling-type algorithm (TEP) and we show that the algorithm has the same computational complexity as BP and it decodes a higher fraction of errors. We describe the TEP decoding process by a set of differential equations that represents the expected residual graph evolution as a function of the code parameters. The solution of these equations is used to predict the TEP decoder performance in both the asymptotic regime and the finite-length regime over the BEC. While the asymptotic threshold of the TEP decoder is the same as the BP decoder for regular and optimized codes, we propose a scaling law (SL) for finite-length LDPC codes, which accurately approximates the TEP improved performance and facilitates its optimization.

Submitted 13 August, 2012; v1 submitted 3 January, 2012; originally announced January 2012.

Journal ref: IEEE Transactions on Information Theory 2013

arXiv:1107.2229 [pdf, other]

Scaling Behavior of Convolutional LDPC Ensembles over the BEC

Authors: Pablo M. Olmos, Rüdiger Urbanke

Abstract: We study the scaling behavior of coupled sparse graph codes over the binary erasure channel. In particular, let 2L + 1 be the length of the coupled chain, let M be the number of variables in each of the 2L + 1 local copies, let l be the number of iterations, let Pb denote the bit error probability, and let ε denote the channel parameter. We are interested in how these quantities scale when we let the blocklength (2L + 1)M tend to infinity. Based on empirical evidence we show that the threshold saturation phenomenon is rather stable with respect to the scaling of the various parameters and we formulate some general rules of thumb which can serve as a guide for the design of coding systems based on coupled graphs.

Submitted 12 July, 2011; originally announced July 2011.

arXiv:1009.4287

Tree-Structure Expectation Propagation for LDPC Decoding in Erasure Channels

Authors: Pablo M. Olmos, Juan José Murillo-Fuentes, Fernando Pérez-Cruz

Abstract: In this paper we present a new algorithm, denoted as TEP, to decode low-density parity-check (LDPC) codes over the Binary Erasure Channel (BEC). The TEP decoder is derived by applying the expectation propagation (EP) algorithm with a tree-structured approximation. Expectation Propagation (EP) is a generalization of Belief Propagation (BP) in two ways. First, it can be used with any exponential family distribution over the cliques in the graph. Second, it can impose additional constraints on the marginal distributions. We use this second property to impose pair-wise marginal constraints in some check nodes of the LDPC code's Tanner graph. The algorithm has the same computational complexity as BP, but it can decode a higher fraction of errors when applied over the BEC. In this paper, we focus on the asymptotic performance of the TEP decoder, as the block size tends to infinity. We describe the TEP decoder by a set of differential equations that represents the residual graph evolution during the decoding process. The solution of these equations yields the capacity of this decoder for a given LDPC ensemble over the BEC. We show that the achieved capacity with the TEP is higher than the BP capacity, at the same computational complexity.

Submitted 4 January, 2012; v1 submitted 22 September, 2010; originally announced September 2010.

Comments: This paper has been withdrawn to be replaced by a corrected version under a different title: "Tree-Structure Expectation Propagation for LDPC Decoding over the BEC"

arXiv:1006.1535 [pdf, other]

doi 10.1109/ISIT.2010.5513636

Tree-structure Expectation Propagation for Decoding LDPC codes over Binary Erasure Channels

Authors: Pablo M. Olmos, Juan José Murillo-Fuentes

Abstract: Expectation Propagation is a generalization of Belief Propagation (BP) in two ways. First, it can be used with any exponential family distribution over the cliques in the graph. Second, it can impose additional constraints on the marginal distributions. We use this second property to impose pair-wise marginal distribution constraints in some check nodes of the LDPC Tanner graph. These additional constraints allow decoding the received codeword when the BP decoder gets stuck. In this paper, we first present the new decoding algorithm, whose complexity is identical to the BP decoder, and we then prove that it is able to decode codewords with a larger fraction of erasures, as the block size tends to infinity. The proposed algorithm can also be understood as a simplification of the Maxwell decoder, but without its computational complexity. We also illustrate that the new algorithm outperforms the BP decoder for finite block-size.

Submitted 8 June, 2010; originally announced June 2010.

Showing 1–41 of 41 results for author: Olmos, P M