-
Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks
Authors:
Mikkel Jordahn,
Pablo M. Olmos
Abstract:
Deep Neural Networks (DNN) have shown great promise in many classification applications, yet are widely known to have poorly calibrated predictions when they are over-parametrized. Improving DNN calibration without compromising on model accuracy is of extreme importance and interest in safety-critical applications such as in the health-care sector. In this work, we show that decoupling the training of feature extraction layers and classification layers in over-parametrized DNN architectures such as Wide Residual Networks (WRN) and Visual Transformers (ViT) significantly improves model calibration whilst retaining accuracy, and at a low training cost. In addition, we show that placing a Gaussian prior on the last hidden layer outputs of a DNN, and training the model variationally in the classification training stage, even further improves calibration. We illustrate that these methods improve calibration across ViT and WRN architectures for several image classification benchmark datasets.
Submitted 6 May, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
Alzheimer's disease detection in PSG signals
Authors:
Lorena Gallego-Viñarás,
Juan Miguel Mira-Tomás,
Anna Michela-Gaeta,
Gerard Pinol-Ripoll,
Ferrán Barbé,
Pablo M. Olmos,
Arrate Muñoz-Barrutia
Abstract:
Alzheimer's disease (AD) and sleep disorders exhibit a close association, where disruptions in sleep patterns often precede the onset of Mild Cognitive Impairment (MCI) and early-stage AD. This study delves into the potential of utilizing sleep-related electroencephalography (EEG) signals acquired through polysomnography (PSG) for the early detection of AD. Our primary focus is on exploring semi-supervised Deep Learning techniques for the classification of EEG signals due to the clinical scenario characterized by the limited data availability. The methodology entails testing and comparing the performance of semi-supervised SMATE and TapNet models, benchmarked against the supervised XCM model, and unsupervised Hidden Markov Models (HMMs). The study highlights the significance of spatial and temporal analysis capabilities, conducting independent analyses of each sleep stage. Results demonstrate the effectiveness of SMATE in leveraging limited labeled data, achieving stable metrics across all sleep stages, and reaching 90% accuracy in its supervised form. Comparative analyses reveal SMATE's superior performance over TapNet and HMM, while XCM excels in supervised scenarios with an accuracy range of 92 - 94%. These findings underscore the potential of semi-supervised models in early AD detection, particularly in overcoming the challenges associated with the scarcity of labeled data. Ablation tests affirm the critical role of spatio-temporal feature extraction in semi-supervised predictive performance, and t-SNE visualizations validate the model's proficiency in distinguishing AD patterns. Overall, this research contributes to the advancement of AD detection through innovative Deep Learning approaches, highlighting the crucial role of semi-supervised learning in addressing data limitations.
Submitted 4 April, 2024;
originally announced April 2024.
-
Training Implicit Generative Models via an Invariant Statistical Loss
Authors:
José Manuel de Frutos,
Pablo M. Olmos,
Manuel A. Vázquez,
Joaquín Míguez
Abstract:
Implicit generative models have the capability to learn arbitrarily complex data distributions. On the downside, training requires telling apart real data from artificially-generated ones using adversarial discriminators, leading to unstable training and mode-dropping issues. As reported by Zahee et al. (2017), even in the one-dimensional (1D) case, training a generative adversarial network (GAN) is challenging and often suboptimal. In this work, we develop a discriminator-free method for training one-dimensional (1D) generative implicit models and subsequently expand this method to accommodate multivariate cases. Our loss function is a discrepancy measure between a suitably chosen transformation of the model samples and a uniform distribution; hence, it is invariant with respect to the true distribution of the data. We first formulate our method for 1D random variables, providing an effective solution for approximate reparameterization of arbitrarily complex distributions. Then, we consider the temporal setting (both univariate and multivariate), in which we model the conditional distribution of each sample given the history of the process. We demonstrate through numerical simulations that this new method yields promising results, successfully learning true distributions in a variety of scenarios and mitigating some of the well-known problems that state-of-the-art implicit methods present.
Submitted 26 February, 2024;
originally announced February 2024.
-
Efficient local linearity regularization to overcome catastrophic overfitting
Authors:
Elias Abad Rocamora,
Fanghui Liu,
Grigorios G. Chrysos,
Pablo M. Olmos,
Volkan Cevher
Abstract:
Catastrophic overfitting (CO) in single-step adversarial training (AT) results in abrupt drops in the adversarial test accuracy (even down to 0%). For models trained with multi-step AT, it has been observed that the loss function behaves locally linearly with respect to the input; however, this property is lost in single-step AT. To address CO in single-step AT, several methods have been proposed to enforce local linearity of the loss via regularization. However, these regularization terms considerably slow down training due to Double Backpropagation. Instead, in this work, we introduce a regularization term, called ELLE, to mitigate CO effectively and efficiently in classical AT evaluations, as well as some more difficult regimes, e.g., large adversarial perturbations and long training schedules. Our regularization term can be theoretically linked to curvature of the loss function and is computationally cheaper than previous methods by avoiding Double Backpropagation. Our thorough experimental validation demonstrates that our work does not suffer from CO, even in challenging settings where previous works suffer from it. We also notice that adapting our regularization parameter during training (ELLE-A) greatly improves the performance, especially in large $ε$ setups. Our implementation is available at https://github.com/LIONS-EPFL/ELLE .
Submitted 28 February, 2024; v1 submitted 21 January, 2024;
originally announced January 2024.
-
Interpretable Spectral Variational AutoEncoder (ISVAE) for time series clustering
Authors:
Óscar Jiménez Rama,
Fernando Moreno-Pino,
David Ramírez,
Pablo M. Olmos
Abstract:
The best encoding is the one that is interpretable in nature. In this work, we introduce a novel model that incorporates an interpretable bottleneck, termed the Filter Bank (FB), at the outset of a Variational Autoencoder (VAE). This arrangement compels the VAE to attend to the most informative segments of the input signal, fostering the learning of a novel encoding ${f_0}$ which boasts enhanced interpretability and clusterability over traditional latent spaces. By deliberately constraining the VAE with this FB, we intentionally constrict its capacity to access broad input domain information, promoting the development of an encoding that is discernible, separable, and of reduced dimensionality. The evolutionary learning trajectory of ${f_0}$ further manifests as a dynamic hierarchical tree, offering profound insights into cluster similarities. Additionally, for handling intricate data configurations, we propose a tailored decoder structure that is symmetrically aligned with FB's architecture. Empirical evaluations highlight the superior efficacy of ISVAE, which compares favorably to state-of-the-art results in clustering metrics across real-world datasets.
Submitted 18 October, 2023;
originally announced October 2023.
-
Variational Mixture of HyperGenerators for Learning Distributions Over Functions
Authors:
Batuhan Koyuncu,
Pablo Sanchez-Martin,
Ignacio Peis,
Pablo M. Olmos,
Isabel Valera
Abstract:
Recent approaches build on implicit neural representations (INRs) to propose generative models over function spaces. However, they are computationally costly when dealing with inference tasks, such as missing data imputation, or cannot tackle them at all. In this work, we propose a novel deep generative model, named VAMoH. VAMoH combines the capabilities of modeling continuous functions using INRs and the inference capabilities of Variational Autoencoders (VAEs). In addition, VAMoH relies on a normalizing flow to define the prior, and a mixture of hypernetworks to parametrize the data log-likelihood. This gives VAMoH a high expressive capability and interpretability. Through experiments on a diverse range of data types, such as images, voxels, and climate data, we show that VAMoH can effectively learn rich distributions over continuous functions. Furthermore, it can perform inference-related tasks, such as conditional super-resolution generation and in-painting, as well as or better than previous approaches, while being less computationally demanding.
Submitted 20 July, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
-
Sleep Activity Recognition and Characterization from Multi-Source Passively Sensed Data
Authors:
María Martínez-García,
Fernando Moreno-Pino,
Pablo M. Olmos,
Antonio Artés-Rodríguez
Abstract:
Sleep constitutes a key indicator of human health, performance, and quality of life. Sleep deprivation has long been related to the onset, development, and worsening of several mental and metabolic disorders, constituting an essential marker for preventing, evaluating, and treating different health conditions. Sleep Activity Recognition methods can provide indicators to assess, monitor, and characterize subjects' sleep-wake cycles and detect behavioral changes. In this work, we propose a general method that continuously operates on passively sensed data from smartphones to characterize sleep and identify significant sleep episodes. Thanks to their ubiquity, these devices constitute an excellent alternative data source to profile subjects' biorhythms in a continuous, objective, and non-invasive manner, in contrast to traditional sleep assessment methods that usually rely on intrusive and subjective procedures. A Heterogeneous Hidden Markov Model is used to model a discrete latent variable process associated with the Sleep Activity Recognition task in a self-supervised way. We validate our results against sleep metrics reported by tested wearables, proving the effectiveness of the proposed approach and advocating its use to assess sleep without more reliable sources.
Submitted 17 January, 2023;
originally announced January 2023.
-
Heterogeneous Hidden Markov Models for Sleep Activity Recognition from Multi-Source Passively Sensed Data
Authors:
Fernando Moreno-Pino,
María Martínez-García,
Pablo M. Olmos,
Antonio Artés-Rodríguez
Abstract:
Psychiatric patients' passive activity monitoring is crucial to detect behavioural shifts in real-time, comprising a tool that helps clinicians supervise patients' evolution over time and enhance the associated treatments' outcomes. Frequently, sleep disturbances and mental health deterioration are closely related, as mental health condition worsening regularly entails shifts in the patients' circadian rhythms. Therefore, Sleep Activity Recognition constitutes a behavioural marker to portray patients' activity cycles and to detect behavioural changes among them. Moreover, mobile passively sensed data captured from smartphones, thanks to these devices' ubiquity, constitute an excellent alternative to profile patients' biorhythm.
In this work, we aim to identify major sleep episodes based on passively sensed data. To do so, a Heterogeneous Hidden Markov Model is proposed to model a discrete latent variable process associated with the Sleep Activity Recognition task in a self-supervised way. We validate our results against sleep metrics reported by clinically tested wearables, proving the effectiveness of the proposed approach.
Submitted 8 November, 2022;
originally announced November 2022.
-
Detecting train driveshaft damages using accelerometer signals and Differential Convolutional Neural Networks
Authors:
Antía López Galdo,
Alejandro Guerrero-López,
Pablo M. Olmos,
María Jesús Gómez García
Abstract:
Railway axle maintenance is critical to avoid catastrophic failures. Nowadays, condition monitoring techniques are becoming more prominent in the industry to prevent enormous costs and damage to human lives. This paper proposes the development of a railway axle condition monitoring system based on advanced 2D-Convolutional Neural Network (CNN) architectures applied to time-frequency representations of vibration signals. For this purpose, several preprocessing steps and different types of Deep Learning (DL) and Machine Learning (ML) architectures are discussed to design an accurate classification system. The resultant system converts the railway axle vibration signals into time-frequency domain representations, i.e., spectrograms, and, thus, trains a two-dimensional CNN to classify them depending on their cracks. The results showed that the proposed approach outperforms several alternative methods tested. The CNN architecture has been tested in 3 different wheelset assemblies, achieving AUC scores of 0.93, 0.86, and 0.75 outperforming any other architecture and showing a high level of reliability when classifying 4 different levels of defects.
Submitted 15 November, 2022;
originally announced November 2022.
-
Multimodal hierarchical Variational AutoEncoders with Factor Analysis latent space
Authors:
Alejandro Guerrero-López,
Carlos Sevilla-Salcedo,
Vanessa Gómez-Verdejo,
Pablo M. Olmos
Abstract:
Real-world databases are complex and usually require dealing with heterogeneous and mixed data types making the exploitation of shared information between views a critical issue. For this purpose, recent studies based on deep generative models merge all views into a nonlinear complex latent space, which can share information among views. However, this solution limits the model's interpretability, flexibility, and modularity. We propose a novel method to overcome these limitations by combining multiple Variational AutoEncoders (VAE) with a Factor Analysis latent space (FA-VAE). We use VAEs to learn a private representation of each heterogeneous view in a continuous latent space. Then, we share the information between views by a low-dimensional latent space using a linear projection matrix. This way, we create a flexible and modular hierarchical dependency between private and shared information in which new views can be incorporated afterwards. Beyond that, we can condition pre-trained models, cross-generate data from different domains, and perform transfer learning between generative models.
Submitted 6 October, 2022; v1 submitted 19 July, 2022;
originally announced July 2022.
-
PyHHMM: A Python Library for Heterogeneous Hidden Markov Models
Authors:
Fernando Moreno-Pino,
Emese Sükei,
Pablo M. Olmos,
Antonio Artés-Rodríguez
Abstract:
We introduce PyHHMM, an object-oriented open-source Python implementation of Heterogeneous-Hidden Markov Models (HHMMs). In addition to HMM's basic core functionalities, such as different initialization algorithms and classical observation models, i.e., continuous and multinoulli, PyHHMM distinctively emphasizes features not supported in similar available frameworks: a heterogeneous observation model, missing data inference, different model order selection criteria, and semi-supervised training. These characteristics result in a feature-rich implementation for researchers working with sequential data. PyHHMM relies on the numpy, scipy, scikit-learn, and seaborn Python packages, and is distributed under the Apache-2.0 License. PyHHMM's source code is publicly available on GitHub (https://github.com/fmorenopino/HeterogeneousHMM) to facilitate adoption and future contributions. Detailed documentation (https://pyhhmm.readthedocs.io/en/latest), which covers examples of use and the models' theoretical explanation, is available. The package can be installed through the Python Package Index (PyPI), via 'pip install pyhhmm'.
Submitted 12 January, 2022;
originally announced January 2022.
-
Multi-task longitudinal forecasting with missing values on Alzheimer's Disease
Authors:
Carlos Sevilla-Salcedo,
Vandad Imani,
Pablo M. Olmos,
Vanessa Gómez-Verdejo,
Jussi Tohka
Abstract:
Machine learning techniques typically applied to dementia forecasting lack in their capabilities to jointly learn several tasks, handle time dependent heterogeneous data and missing values. In this paper, we propose a framework using the recently presented SSHIBA model for jointly learning different tasks on longitudinal data with missing values. The method uses Bayesian variational inference to impute missing values and combine information of several views. This way, we can combine different data-views from different time-points in a common latent space and learn the relations between each time-point while simultaneously modelling and predicting several output variables. We apply this model to predict together diagnosis, ventricle volume, and clinical scores in dementia. The results demonstrate that SSHIBA is capable of learning a good imputation of the missing values and outperforming the baselines while simultaneously predicting three different tasks.
Submitted 13 January, 2022;
originally announced January 2022.
-
Regularizing Transformers With Deep Probabilistic Layers
Authors:
Aurora Cobo Aguilera,
Pablo Martínez Olmos,
Antonio Artés-Rodríguez,
Fernando Pérez-Cruz
Abstract:
Language models (LM) have grown non-stop in the last decade, from sequence-to-sequence architectures to the state-of-the-art, fully attention-based Transformers. In this work, we demonstrate how the inclusion of deep generative models within BERT can bring more versatile models, able to impute missing/noisy words with richer text or even improve BLEU score. More precisely, we use a Gaussian Mixture Variational Autoencoder (GMVAE) as a regularizer layer and prove its effectiveness not only in Transformers but also in the most relevant encoder-decoder based LM, seq2seq with and without attention.
Submitted 23 August, 2021;
originally announced August 2021.
-
Deep Autoregressive Models with Spectral Attention
Authors:
Fernando Moreno-Pino,
Pablo M. Olmos,
Antonio Artés-Rodríguez
Abstract:
Time series forecasting is an important problem across many domains, playing a crucial role in multiple real-world applications. In this paper, we propose a forecasting architecture that combines deep autoregressive models with a Spectral Attention (SA) module, which merges global and local frequency domain information in the model's embedded space. By characterizing in the spectral domain the embedding of the time series as occurrences of a random process, our method can identify global trends and seasonality patterns. Two spectral attention models, global and local to the time series, integrate this information within the forecast and perform spectral filtering to remove noise from the time series. The proposed architecture has a number of useful properties: it can be effectively incorporated into well-known forecast architectures, requiring a low number of parameters and producing interpretable results that improve forecasting accuracy. We test the Spectral Attention Autoregressive Model (SAAM) on several well-known forecast datasets, consistently demonstrating that our model compares favorably to state-of-the-art approaches.
Submitted 26 December, 2021; v1 submitted 13 July, 2021;
originally announced July 2021.
-
Medical data wrangling with sequential variational autoencoders
Authors:
Daniel Barrejón,
Pablo M. Olmos,
Antonio Artés-Rodríguez
Abstract:
Medical data sets are usually corrupted by noise and missing data. These missing patterns are commonly assumed to be completely random, but in medical scenarios, the reality is that these patterns occur in bursts due to sensors that are off for some time or data collected in a misaligned uneven fashion, among other causes. This paper proposes to model medical data records with heterogeneous data types and bursty missing data using sequential variational autoencoders (VAEs). In particular, we propose a new methodology, the Shi-VAE, which extends the capabilities of VAEs to sequential streams of data with missing observations. We compare our model against state-of-the-art solutions in an intensive care unit database (ICU) and a dataset of passive human monitoring. Furthermore, we find that standard error metrics such as RMSE are not conclusive enough to assess temporal models and include in our analysis the cross-correlation between the ground truth and the imputed signal. We show that Shi-VAE achieves the best performance in terms of both metrics, with lower computational complexity than the GP-VAE model, which is the state-of-the-art method for medical records.
Submitted 8 November, 2021; v1 submitted 12 March, 2021;
originally announced March 2021.
-
Unsupervised Learning of Global Factors in Deep Generative Models
Authors:
Ignacio Peis,
Pablo M. Olmos,
Antonio Artés-Rodríguez
Abstract:
We present a novel deep generative model based on non i.i.d. variational autoencoders that captures global dependencies among observations in a fully unsupervised fashion. In contrast to the recent semi-supervised alternatives for global modeling in deep generative models, our approach combines a mixture model in the local or data-dependent space and a global Gaussian latent variable, which leads us to obtain three particular insights. First, the induced latent global space captures interpretable disentangled representations with no user-defined regularization in the evidence lower bound (as in $β$-VAE and its generalizations). Second, we show that the model performs domain alignment to find correlations and interpolate between different databases. Finally, we study the ability of the global space to discriminate between groups of observations with non-trivial underlying structures, such as face images with shared attributes or defined sequences of digit images.
Submitted 16 December, 2020; v1 submitted 15 December, 2020;
originally announced December 2020.
-
Boosting offline handwritten text recognition in historical documents with few labeled lines
Authors:
José Carlos Aradillas,
Juan José Murillo-Fuentes,
Pablo M. Olmos
Abstract:
In this paper, we face the problem of offline handwritten text recognition (HTR) in historical documents when few labeled samples are available and some of them contain errors in the train set. Three main contributions are developed. First we analyze how to perform transfer learning (TL) from a massive database to a smaller historical database, analyzing which layers of the model need a fine-tuning process. Second, we analyze methods to efficiently combine TL and data augmentation (DA). Finally, an algorithm to mitigate the effects of incorrect labelings in the training set is proposed. The methods are analyzed over the ICFHR 2018 competition database, Washington and Parzival. Combining all these techniques, we demonstrate a remarkable reduction of CER (up to 6% in some cases) in the test set with little complexity overhead.
Submitted 4 December, 2020;
originally announced December 2020.
-
Robust Sampling in Deep Learning
Authors:
Aurora Cobo Aguilera,
Antonio Artés-Rodríguez,
Fernando Pérez-Cruz,
Pablo Martínez Olmos
Abstract:
Deep learning requires regularization mechanisms to reduce overfitting and improve generalization. We address this problem by a new regularization method based on distributionally robust optimization. The key idea is to modify the contribution from each sample for tightening the empirical risk bound. During the stochastic training, the selection of samples is done according to their accuracy in such a way that the worst-performing samples are the ones that contribute the most in the optimization. We study different scenarios and show the ones where it can make the convergence faster or increase the accuracy.
Submitted 5 June, 2020; v1 submitted 4 June, 2020;
originally announced June 2020.
-
Bayesian Sparse Factor Analysis with Kernelized Observations
Authors:
Carlos Sevilla-Salcedo,
Alejandro Guerrero-López,
Pablo M. Olmos,
Vanessa Gómez-Verdejo
Abstract:
Multi-view problems can be addressed with latent variable models, since they are able to find low-dimensional projections that fairly capture the correlations among the multiple views that characterise each datum. On the other hand, high-dimensionality and non-linear issues are traditionally handled by kernel methods, inducing a (non)-linear function between the latent projection and the data itself. However, they usually come with scalability issues and exposure to overfitting. Here, we propose merging both approaches into a single model so that we can exploit the best features of multi-view latent models and kernel methods and, moreover, overcome their limitations.
In particular, we combine probabilistic factor analysis with what we refer to as kernelized observations, in which the model focuses on reconstructing not the data itself, but its relationship with other data points measured by a kernel function. This model can combine several types of views (kernelized or not), and it can handle heterogeneous data and work in semi-supervised settings. Additionally, by including adequate priors, it can provide compact solutions for the kernelized observations -- based on an automatic selection of Bayesian Relevance Vectors (RVs) -- and can include feature selection capabilities. Using several public databases, we demonstrate the potential of our approach (and its extensions) w.r.t. common multi-view learning models such as kernel canonical correlation analysis or manifold relevance determination.
Submitted 27 January, 2021; v1 submitted 1 June, 2020;
originally announced June 2020.
-
Sparse Semi-supervised Heterogeneous Interbattery Bayesian Analysis
Authors:
Carlos Sevilla-Salcedo,
Vanessa Gómez-Verdejo,
Pablo M. Olmos
Abstract:
The Bayesian approach to feature extraction, known as factor analysis (FA), has been widely studied in machine learning to obtain a latent representation of the data. An adequate selection of the probabilities and priors of these Bayesian models allows the model to better adapt to the nature of the data (i.e. heterogeneity, sparsity), obtaining a more representative latent space.
The objective of this article is to propose a general FA framework capable of modelling any problem. To do so, we start from the Bayesian Inter-Battery Factor Analysis (BIBFA) model, enhancing it with new functionalities to be able to work with heterogeneous data, include feature selection, and handle missing values as well as semi-supervised problems.
The performance of the proposed model, Sparse Semi-supervised Heterogeneous Interbattery Bayesian Analysis (SSHIBA), has been tested on 4 different scenarios to evaluate each one of its novelties, showing not only great versatility and an interpretability gain, but also outperforming most of the state-of-the-art algorithms.
Submitted 24 January, 2020;
originally announced January 2020.
-
Deep Sequential Models for Suicidal Ideation from Multiple Source Data
Authors:
Ignacio Peis,
Pablo M. Olmos,
Constanza Vera-Varela,
María Luisa Barrigón,
Philippe Courtet,
Enrique Baca-García,
Antonio Artés-Rodríguez
Abstract:
This article presents a novel method for predicting suicidal ideation from Electronic Health Records (EHR) and Ecological Momentary Assessment (EMA) data using deep sequential models. Both EHR longitudinal data and EMA question forms are defined by asynchronous, variable length, randomly-sampled data sequences. In our method, we model each of them with a Recurrent Neural Network (RNN), and both sequences are aligned by concatenating the hidden state of each of them using temporal marks. Furthermore, we incorporate attention schemes to improve performance in long sequences and time-independent pre-trained schemes to cope with very short sequences. Using a database of 1023 patients, our experimental results show that the addition of EMA records boosts the system recall to predict the suicidal ideation diagnosis from 48.13% obtained exclusively from EHR-based state-of-the-art methods to 67.78%. Additionally, our method provides interpretability through the t-SNE representation of the latent space. Further, the most relevant input features are identified and interpreted medically.
Submitted 6 November, 2019;
originally announced November 2019.
-
Improved BiGAN training with marginal likelihood equalization
Authors:
Pablo Sánchez-Martín,
Pablo M. Olmos,
Fernando Perez-Cruz
Abstract:
We propose a novel training procedure for improving the performance of generative adversarial networks (GANs), especially bidirectional GANs. First, we enforce that the empirical distribution of the inverse inference network matches the prior distribution, which favors the generator network reproducibility on the seen samples. Second, we have found that the marginal log-likelihood of the samples shows a severe overrepresentation of a certain type of samples. To address this issue, we propose to train the bidirectional GAN using a non-uniform sampling for the mini-batch selection, resulting in improved quality and variety in generated samples measured quantitatively and by visual inspection. We illustrate our new procedure with the well-known CIFAR10, Fashion MNIST and CelebA datasets.
Submitted 23 May, 2020; v1 submitted 4 November, 2019;
originally announced November 2019.
-
Spatially Coupled Generalized LDPC Codes: Asymptotic Analysis and Finite Length Scaling
Authors:
David G. M. Mitchell,
Pablo M. Olmos,
Michael Lentmaier,
Daniel J. Costello
Abstract:
Generalized low-density parity-check (GLDPC) codes are a class of LDPC codes in which the standard single parity check (SPC) constraints are replaced by constraints defined by a linear block code. These stronger constraints typically result in improved error floor performance, due to better minimum distance and trapping set properties, at a cost of some increased decoding complexity. In this paper, we study spatially coupled generalized low-density parity-check (SC-GLDPC) codes and present a comprehensive analysis of these codes, including: (1) an iterative decoding threshold analysis of SC-GLDPC code ensembles demonstrating capacity approaching thresholds via the threshold saturation effect; (2) an asymptotic analysis of the minimum distance and free distance properties of SC-GLDPC code ensembles, demonstrating that the ensembles are asymptotically good; and (3) an analysis of the finite-length scaling behavior of both GLDPC block codes and SC-GLDPC codes based on a peeling decoder (PD) operating on a binary erasure channel (BEC). Results are compared to GLDPC block codes, and the advantages and disadvantages of SC-GLDPC codes are discussed.
Submitted 5 April, 2021; v1 submitted 30 October, 2019;
originally announced October 2019.
-
On Generalized LDPC Codes for 5G Ultra Reliable Communication
Authors:
Yanfang Liu,
Pablo M. Olmos,
David G. M. Mitchell
Abstract:
Generalized low-density parity-check (GLDPC) codes, where single parity-check (SPC) constraint nodes are replaced with generalized constraint (GC) nodes, are a promising class of codes for low latency communication. In this paper, a practical construction of quasi-cyclic (QC) GLDPC codes is proposed, where the proportion of generalized constraints is determined by an asymptotic analysis. We analyze the message passing process and complexity of a GLDPC code over the additive white Gaussian noise (AWGN) channel and present a constraint-to-variable update rule based on the specific codewords of the component code. The block error rate (BLER) performance of the GLDPC codes, combined with a complementary outer code, is shown to outperform a variety of state-of-the-art code and decoder designs with suitable lengths and rates for the 5G Ultra Reliable Communication (URC) regime over an AWGN channel with quadrature PSK (QPSK) modulation.
Submitted 15 October, 2019;
originally announced October 2019.
-
Probabilistic Time of Arrival Localization
Authors:
Fernando Perez-Cruz,
Pablo M. Olmos,
Michael Minyi Zhang,
Howard Huang
Abstract:
In this paper, we take a new approach for time of arrival geo-localization. We show that the main sources of error in metropolitan areas are due to environmental imperfections that bias our solutions, and that we can rely on a probabilistic model to learn and compensate for them. The resulting localization error is validated using measurements from a live LTE cellular network to be less than 10 meters, representing an order-of-magnitude improvement.
Submitted 15 October, 2019;
originally announced October 2019.
-
Probabilistic MIMO Symbol Detection with Expectation Consistency Approximate Inference
Authors:
Javier Céspedes,
Pablo M. Olmos,
Matilde Sánchez-Fernández,
Fernando Pérez-Cruz
Abstract:
In this paper we explore low-complexity probabilistic algorithms for soft symbol detection in high-dimensional multiple-input multiple-output (MIMO) systems. We present a novel algorithm based on the Expectation Consistency (EC) framework, which describes the approximate inference problem as an optimization over a non-convex function. EC generalizes algorithms such as Belief Propagation and Expectation Propagation. For the MIMO symbol detection problem, we discuss feasible methods to find stationary points of the EC function and explore their tradeoffs between accuracy and speed of convergence. The accuracy is studied first in terms of input-output mutual information, where we show that the proposed EC MIMO detector greatly improves on state-of-the-art methods, with a complexity order cubic in the number of transmitting antennas. Second, these gains are corroborated by combining the probabilistic output of the EC detector with a low-density parity-check (LDPC) channel code.
Submitted 2 October, 2019;
originally announced October 2019.
-
Out-of-Sample Testing for GANs
Authors:
Pablo Sánchez-Martín,
Pablo M. Olmos,
Fernando Pérez-Cruz
Abstract:
We propose a new method to evaluate GANs, namely EvalGAN. EvalGAN relies on a test set to directly measure the reconstruction quality in the original sample space (no auxiliary networks are necessary), and it also computes the (log)likelihood for the reconstructed samples in the test set. Further, EvalGAN is agnostic to the GAN algorithm and the dataset. We decided to test it on three state-of-the-art GANs over the well-known CIFAR-10 and CelebA datasets.
Submitted 28 January, 2019;
originally announced January 2019.
-
Handling Incomplete Heterogeneous Data using VAEs
Authors:
Alfredo Nazabal,
Pablo M. Olmos,
Zoubin Ghahramani,
Isabel Valera
Abstract:
Variational autoencoders (VAEs), as well as other generative models, have been shown to be efficient and accurate for capturing the latent structure of vast amounts of complex high-dimensional data. However, existing VAEs still cannot directly handle data that are heterogeneous (mixed continuous and discrete) or incomplete (with missing data at random), which is indeed common in real-world applications. In this paper, we propose a general framework to design VAEs suitable for fitting incomplete heterogeneous data. The proposed HI-VAE includes likelihood models for real-valued, positive real-valued, interval, categorical, ordinal and count data, and allows accurate estimation (and potentially imputation) of missing data. Furthermore, HI-VAE presents competitive predictive performance in supervised tasks, outperforming supervised models when trained on incomplete data.
Submitted 22 May, 2020; v1 submitted 10 July, 2018;
originally announced July 2018.
-
Boosting Handwriting Text Recognition in Small Databases with Transfer Learning
Authors:
José Carlos Aradillas,
Juan José Murillo-Fuentes,
Pablo M. Olmos
Abstract:
In this paper we deal with the offline handwriting text recognition (HTR) problem with reduced training datasets. Recent HTR solutions based on artificial neural networks achieve remarkable results on reference databases. These deep learning neural networks are composed of both convolutional (CNN) and long short-term memory recurrent units (LSTM). In addition, connectionist temporal classification (CTC) is the key to avoid segmentation at character level, greatly facilitating the labeling task. One of the main drawbacks of the CNNLSTM-CTC (CLC) solutions is that they need a considerable part of the text to be transcribed for every type of calligraphy, typically on the order of a few thousand lines. Furthermore, in some scenarios the text to transcribe is not that long, e.g. in the Washington database. The CLC typically overfits for this reduced number of training samples. Our proposal is based on transfer learning (TL) from the parameters learned with a bigger database. We first investigate, for a reduced and fixed number of training samples, 350 lines, how the learning from a large database, the IAM, can be transferred to the learning of the CLC of a reduced database, Washington. We focus on which layers of the network need not be re-trained. We conclude that the best solution is to re-train all the CLC parameters, initialized to the values obtained after training the CLC on the larger database. We also investigate results when the training size is further reduced. The differences in the CER are more remarkable when training with just 350 lines: a CER of 3.3% is achieved with TL, versus a CER of 18.2% when training from scratch. As a byproduct, the learning times are considerably reduced. Similarly good results are obtained on the Parzival database when training with this reduced number of lines and this new approach.
Submitted 4 April, 2018;
originally announced April 2018.
-
Turbo EP-based Equalization: a Filter-Type Implementation
Authors:
Irene Santos,
Juan José Murillo-Fuentes,
Eva Arias-de-Reyna,
Pablo M. Olmos
Abstract:
This manuscript has been submitted to Transactions on Communications on September 7, 2017; revised on January 10, 2018 and March 27, 2018; and accepted on April 25, 2018
We propose a novel filter-type equalizer to improve the solution of the linear minimum-mean squared-error (LMMSE) turbo equalizer, with computational complexity constrained to be quadratic in the filter length. When high-order modulations and/or large memory channels are used the optimal BCJR equalizer is unavailable, due to its computational complexity. In this scenario, the filter-type LMMSE turbo equalization exhibits a good performance compared to other approximations. In this paper, we show that this solution can be significantly improved by using expectation propagation (EP) in the estimation of the a posteriori probabilities. First, it yields a more accurate estimation of the extrinsic distribution to be sent to the channel decoder. Second, compared to other solutions based on EP the computational complexity of the proposed solution is constrained to be quadratic in the length of the finite impulse response (FIR). In addition, we review previous EP-based turbo equalization implementations. Instead of considering default uniform priors we exploit the outputs of the decoder. Some simulation results are included to show that this new EP-based filter remarkably outperforms the turbo approach of previous versions of the EP algorithm and also improves the LMMSE solution, with and without turbo equalization.
Submitted 21 December, 2018; v1 submitted 22 November, 2017;
originally announced November 2017.
-
A Probabilistic Peeling Decoder to Efficiently Analyze Generalized LDPC Codes Over the BEC
Authors:
Yanfang Liu,
Pablo M. Olmos,
Tobias Koch
Abstract:
In this paper, we analyze the tradeoff between coding rate and asymptotic performance of a class of generalized low-density parity-check (GLDPC) codes constructed by including a certain fraction of generalized constraint (GC) nodes in the graph. The rate of the GLDPC ensemble is bounded using classical results on linear block codes, namely the Hamming bound and the Varshamov bound. We also study the impact of the decoding method used at GC nodes. To incorporate both bounded-distance (BD) and Maximum Likelihood (ML) decoding at GC nodes into our analysis without resorting to multi-edge type degree distributions (DDs), we propose the probabilistic peeling decoding (P-PD) algorithm, which models the decoding step at every GC node as an instance of a Bernoulli random variable with a successful decoding probability that depends on both the GC block code and its decoding algorithm. The P-PD asymptotic performance over the BEC can be efficiently predicted using standard techniques for LDPC codes such as density evolution (DE) or the differential equation method. Furthermore, for a class of GLDPC ensembles, we demonstrate that the simulated P-PD performance accurately predicts the actual performance of the GLDPC code under ML decoding at GC nodes. We illustrate our analysis for GLDPC code ensembles with regular and irregular DDs. In all cases, we show that a large fraction of GC nodes is required to reduce the original gap to capacity, but the optimal fraction is strictly smaller than one. We then consider techniques to further reduce the gap to capacity by means of random puncturing, and the inclusion of a certain fraction of generalized variable nodes in the graph.
Submitted 12 September, 2018; v1 submitted 4 September, 2017;
originally announced September 2017.
-
Continuous Transmission of Spatially-Coupled LDPC Code Chains
Authors:
Pablo M. Olmos,
David G. M. Mitchell,
Dmitri Truhachev,
Daniel J. Costello Jr
Abstract:
We propose a novel encoding/transmission scheme called continuous chain (CC) transmission that is able to improve the finite-length performance of a system using spatially-coupled low-density parity-check (SC-LDPC) codes. In CC transmission, instead of transmitting a sequence of independent codewords from a terminated SC-LDPC code chain, we connect multiple chains in a layered format, where encoding, transmission, and decoding are now performed in a continuous fashion. The connections between chains are created at specific points, chosen to improve the finite-length performance of the code structure under iterative decoding. We describe the design of CC schemes for different SC-LDPC code ensembles constructed from protographs: a (J,K)-regular SC-LDPC code chain, a spatially-coupled repeat-accumulate (SC-RA) code, and a spatially-coupled accumulate-repeat-jagged-accumulate (SC-ARJA) code. In all cases, significant performance improvements are reported and, in addition, it is shown that using CC transmission only requires a small increase in decoding complexity and decoding delay with respect to a system employing a single SC-LDPC code chain for transmission.
Submitted 2 October, 2019; v1 submitted 7 June, 2016;
originally announced June 2016.
-
Finite-length scaling based on belief propagation for spatially coupled LDPC codes
Authors:
Markus Stinner,
Luca Barletta,
Pablo M. Olmos
Abstract:
The equivalence of peeling decoding (PD) and Belief Propagation (BP) for low-density parity-check (LDPC) codes over the binary erasure channel is analyzed. Modifying the scheduling for PD, it is shown that exactly the same variable nodes (VNs) are resolved in every iteration as with BP. The decrease of erased VNs during the decoding process is analyzed instead of resolvable equations. This quantity can also be derived with density evolution, resulting in a drastic decrease in complexity. Finally, a scaling law using this quantity is established for spatially coupled LDPC codes.
Submitted 18 April, 2016;
originally announced April 2016.
-
On Distributed Storage Allocations for Memory-Limited Systems
Authors:
Iryna Andriyanova,
Pablo M. Olmos
Abstract:
In this paper we consider distributed allocation problems with memory constraint limits. Firstly, we propose a tractable relaxation to the problem of optimal symmetric allocations from [1]. The approximated problem is based on the Q-error function, and its solution approaches the solution of the initial problem, as the number of storage nodes in the network grows. Secondly, exploiting this relaxation, we are able to formulate and to solve the problem for storage allocations for memory-limited DSS storing and arbitrary memory profiles. Finally, we discuss the extension to the case of multiple data objects, stored in the DSS.
Submitted 16 April, 2015;
originally announced April 2015.
-
A Scaling Law to Predict the Finite-Length Performance of Spatially-Coupled LDPC Codes
Authors:
Pablo M. Olmos,
Rüdiger Urbanke
Abstract:
Spatially-coupled LDPC codes are known to have excellent asymptotic properties. Much less is known regarding their finite-length performance. We propose a scaling law to predict the error probability of finite-length spatially-coupled ensembles when transmission takes place over the binary erasure channel. We discuss how the parameters of the scaling law are connected to fundamental quantities appearing in the asymptotic analysis of these ensembles and we verify that the predictions of the scaling law fit well to the data derived from simulations over a wide range of parameters. The ultimate goal of this line of research is to develop analytic tools for the design of spatially-coupled LDPC codes under practical constraints.
Submitted 25 May, 2015; v1 submitted 23 April, 2014;
originally announced April 2014.
-
Improving the Finite-Length Performance of Spatially Coupled LDPC Codes by Connecting Multiple Code Chains
Authors:
Pablo M. Olmos,
David G. M. Mitchell,
Dmitri Truhachev,
Daniel J. Costello Jr
Abstract:
In this paper, we analyze the finite-length performance of codes on graphs constructed by connecting spatially coupled low-density parity-check (SC-LDPC) code chains. Successive (peeling) decoding is considered for the binary erasure channel (BEC). The evolution of the undecoded portion of the bipartite graph remaining after each iteration is analyzed as a dynamical system. When connecting short SC-LDPC chains, we show that, in addition to superior iterative decoding thresholds, connected chain ensembles have better finite-length performance than single chain ensembles of the same rate and length. In addition, we present a novel encoding/transmission scheme to improve the performance of a system using long SC-LDPC chains, where, instead of transmitting codewords corresponding to a single SC-LDPC chain independently, we connect consecutive chains in a multi-layer format to form a connected chain ensemble. We refer to such a transmission scheme as continuous chain (CC) transmission of SC-LDPC codes. We show that CC transmission can be implemented with no significant increase in encoding/decoding complexity or decoding delay with respect to a system using a single SC-LDPC code chain for encoding.
Submitted 28 February, 2014;
originally announced February 2014.
-
Analyzing Finite-length Protograph-based Spatially Coupled LDPC Codes
Authors:
Markus Stinner,
Pablo M. Olmos
Abstract:
The peeling decoding for spatially coupled low-density parity-check (SC-LDPC) codes is analyzed for a binary erasure channel. An analytical calculation of the mean evolution of degree-one check nodes of protograph-based SC-LDPC codes is given and an estimate for the covariance evolution of degree-one check nodes is proposed in the stable decoding phase where the decoding wave propagates along the chain of coupled codes. Both results are verified numerically. Protograph-based SC-LDPC codes turn out to have a more robust behavior than unstructured random SC-LDPC codes. Using the analytically calculated parameters, the finite-length scaling laws for these constructions are given and verified by numerical simulations.
Submitted 31 January, 2014;
originally announced January 2014.
-
Tree-Structure Expectation Propagation for LDPC Decoding over the BEC
Authors:
Pablo M. Olmos,
Juan José Murillo-Fuentes,
Fernando Pérez-Cruz
Abstract:
We present the tree-structure expectation propagation (Tree-EP) algorithm to decode low-density parity-check (LDPC) codes over discrete memoryless channels (DMCs). EP generalizes belief propagation (BP) in two ways. First, it can be used with any exponential family distribution over the cliques in the graph. Second, it can impose additional constraints on the marginal distributions. We use this second property to impose pair-wise marginal constraints over pairs of variables connected to a check node of the LDPC code's Tanner graph. Thanks to these additional constraints, the Tree-EP marginal estimates for each variable in the graph are more accurate than those provided by BP. We also reformulate the Tree-EP algorithm for the binary erasure channel (BEC) as a peeling-type algorithm (TEP) and we show that the algorithm has the same computational complexity as BP and it decodes a higher fraction of errors. We describe the TEP decoding process by a set of differential equations that represents the expected residual graph evolution as a function of the code parameters. The solution of these equations is used to predict the TEP decoder performance in both the asymptotic regime and the finite-length regime over the BEC. While the asymptotic threshold of the TEP decoder is the same as the BP decoder for regular and optimized codes, we propose a scaling law (SL) for finite-length LDPC codes, which accurately approximates the TEP improved performance and facilitates its optimization.
Submitted 13 August, 2012; v1 submitted 3 January, 2012;
originally announced January 2012.
-
Scaling Behavior of Convolutional LDPC Ensembles over the BEC
Authors:
Pablo M. Olmos,
Rüdiger Urbanke
Abstract:
We study the scaling behavior of coupled sparse graph codes over the binary erasure channel. In particular, let 2L+1 be the length of the coupled chain, let M be the number of variables in each of the 2L + 1 local copies, let l be the number of iterations, let Pb denote the bit error probability, and let ε denote the channel parameter. We are interested in how these quantities scale when we let the blocklength (2L + 1)M tend to infinity. Based on empirical evidence we show that the threshold saturation phenomenon is rather stable with respect to the scaling of the various parameters and we formulate some general rules of thumb which can serve as a guide for the design of coding systems based on coupled graphs.
Submitted 12 July, 2011;
originally announced July 2011.
-
Tree-Structure Expectation Propagation for LDPC Decoding in Erasure Channels
Authors:
Pablo M. Olmos,
Juan José Murillo-Fuentes,
Fernando Pérez-Cruz
Abstract:
In this paper we present a new algorithm, denoted as TEP, to decode low-density parity-check (LDPC) codes over the Binary Erasure Channel (BEC). The TEP decoder is derived by applying the expectation propagation (EP) algorithm with a tree-structured approximation. Expectation Propagation (EP) is a generalization of Belief Propagation (BP) in two ways. First, it can be used with any exponential family distribution over the cliques in the graph. Second, it can impose additional constraints on the marginal distributions. We use this second property to impose pair-wise marginal constraints in some check nodes of the LDPC code's Tanner graph. The algorithm has the same computational complexity as BP, but it can decode a higher fraction of errors when applied over the BEC. In this paper, we focus on the asymptotic performance of the TEP decoder, as the block size tends to infinity. We describe the TEP decoder by a set of differential equations that represents the residual graph evolution during the decoding process. The solution of these equations yields the capacity of this decoder for a given LDPC ensemble over the BEC. We show that the achieved capacity with the TEP is higher than the BP capacity, at the same computational complexity.
Submitted 4 January, 2012; v1 submitted 22 September, 2010;
originally announced September 2010.
-
Tree-structure Expectation Propagation for Decoding LDPC codes over Binary Erasure Channels
Authors:
Pablo M. Olmos,
Juan José Murillo-Fuentes
Abstract:
Expectation Propagation is a generalization of Belief Propagation (BP) in two ways. First, it can be used with any exponential family distribution over the cliques in the graph. Second, it can impose additional constraints on the marginal distributions. We use this second property to impose pair-wise marginal distribution constraints in some check nodes of the LDPC Tanner graph. These additional constraints allow decoding the received codeword when the BP decoder gets stuck. In this paper, we first present the new decoding algorithm, whose complexity is identical to the BP decoder, and we then prove that it is able to decode codewords with a larger fraction of erasures, as the block size tends to infinity. The proposed algorithm can also be understood as a simplification of the Maxwell decoder, but without its computational complexity. We also illustrate that the new algorithm outperforms the BP decoder for finite block sizes.
Submitted 8 June, 2010;
originally announced June 2010.