Search | arXiv e-print repository

Target Layer Regularization for Continual Learning Using Cramer-Wold Generator

Authors: Marcin Mazur, Łukasz Pustelnik, Szymon Knop, Patryk Pagacz, Przemysław Spurek

Abstract: We propose an effective regularization strategy (CW-TaLaR) for solving continual learning problems. It uses a penalizing term expressed by the Cramer-Wold distance between two probability distributions defined on a target layer of an underlying neural network that is shared by all tasks, and the simple architecture of the Cramer-Wold generator for modeling output data representation. Our strategy… ▽ More We propose an effective regularization strategy (CW-TaLaR) for solving continual learning problems. It uses a penalizing term expressed by the Cramer-Wold distance between two probability distributions defined on a target layer of an underlying neural network that is shared by all tasks, and the simple architecture of the Cramer-Wold generator for modeling output data representation. Our strategy preserves target layer distribution while learning a new task but does not require remembering previous tasks' datasets. We perform experiments involving several common supervised frameworks, which prove the competitiveness of the CW-TaLaR method in comparison to a few existing state-of-the-art continual learning models. △ Less

Submitted 15 November, 2021; originally announced November 2021.

Comments: The paper is under consideration at Computer Vision and Image Understanding

arXiv:2009.07327 [pdf, other]

Generative models with kernel distance in data space

Authors: Szymon Knop, Marcin Mazur, Przemysław Spurek, Jacek Tabor, Igor Podolak

Abstract: Generative models dealing with modeling a~joint data distribution are generally either autoencoder or GAN based. Both have their pros and cons, generating blurry images or being unstable in training or prone to mode collapse phenomenon, respectively. The objective of this paper is to construct a~model situated between above architectures, one that does not inherit their main weaknesses. The propos… ▽ More Generative models dealing with modeling a~joint data distribution are generally either autoencoder or GAN based. Both have their pros and cons, generating blurry images or being unstable in training or prone to mode collapse phenomenon, respectively. The objective of this paper is to construct a~model situated between above architectures, one that does not inherit their main weaknesses. The proposed LCW generator (Latent Cramer-Wold generator) resembles a classical GAN in transforming Gaussian noise into data space. What is of utmost importance, instead of a~discriminator, LCW generator uses kernel distance. No adversarial training is utilized, hence the name generator. It is trained in two phases. First, an autoencoder based architecture, using kernel measures, is built to model a manifold of data. We propose a Latent Trick map** a Gaussian to latent in order to get the final model. This results in very competitive FID values. △ Less

Submitted 15 September, 2020; originally announced September 2020.

arXiv:2002.07897 [pdf, other]

LocoGAN -- Locally Convolutional GAN

Authors: Łukasz Struski, Szymon Knop, Jacek Tabor, Wiktor Daniec, Przemysław Spurek

Abstract: In the paper we construct a fully convolutional GAN model: LocoGAN, which latent space is given by noise-like images of possibly different resolutions. The learning is local, i.e. we process not the whole noise-like image, but the sub-images of a fixed size. As a consequence LocoGAN can produce images of arbitrary dimensions e.g. LSUN bedroom data set. Another advantage of our approach comes from… ▽ More In the paper we construct a fully convolutional GAN model: LocoGAN, which latent space is given by noise-like images of possibly different resolutions. The learning is local, i.e. we process not the whole noise-like image, but the sub-images of a fixed size. As a consequence LocoGAN can produce images of arbitrary dimensions e.g. LSUN bedroom data set. Another advantage of our approach comes from the fact that we use the position channels, which allows the generation of fully periodic (e.g. cylindrical panoramic images) or almost periodic ,,infinitely long" images (e.g. wall-papers). △ Less

Submitted 2 November, 2023; v1 submitted 18 February, 2020; originally announced February 2020.

arXiv:1905.12947 [pdf, other]

One-element Batch Training by Moving Window

Authors: Przemysław Spurek, Szymon Knop, Jacek Tabor, Igor Podolak, Bartosz Wójcik

Abstract: Several deep models, esp. the generative, compare the samples from two distributions (e.g. WAE like AutoEncoder models, set-processing deep networks, etc) in their cost functions. Using all these methods one cannot train the model directly taking small size (in extreme -- one element) batches, due to the fact that samples are to be compared. We propose a generic approach to training such models… ▽ More Several deep models, esp. the generative, compare the samples from two distributions (e.g. WAE like AutoEncoder models, set-processing deep networks, etc) in their cost functions. Using all these methods one cannot train the model directly taking small size (in extreme -- one element) batches, due to the fact that samples are to be compared. We propose a generic approach to training such models using one-element mini-batches. The idea is based on splitting the batch in latent into parts: previous, i.e. historical, elements used for latent space distribution matching and the current ones, used both for latent distribution computation and the minimization process. Due to the smaller memory requirements, this allows to train networks on higher resolution images then in the classical approach. △ Less

Submitted 31 May, 2019; v1 submitted 30 May, 2019; originally announced May 2019.

arXiv:1901.10417 [pdf, other]

Sliced generative models

Authors: Szymon Knop, Marcin Mazur, Jacek Tabor, Igor Podolak, Przemysław Spurek

Abstract: In this paper we discuss a class of AutoEncoder based generative models based on one dimensional sliced approach. The idea is based on the reduction of the discrimination between samples to one-dimensional case. Our experiments show that methods can be divided into two groups. First consists of methods which are a modification of standard normality tests, while the second is based on classical dis… ▽ More In this paper we discuss a class of AutoEncoder based generative models based on one dimensional sliced approach. The idea is based on the reduction of the discrimination between samples to one-dimensional case. Our experiments show that methods can be divided into two groups. First consists of methods which are a modification of standard normality tests, while the second is based on classical distances between samples. It turns out that both groups are correct generative models, but the second one gives a slightly faster decrease rate of Fréchet Inception Distance (FID). △ Less

Submitted 29 January, 2019; originally announced January 2019.

Comments: 11 pages, 4 figures, conference

arXiv:1805.09235 [pdf, other]

Cramer-Wold AutoEncoder

Authors: Szymon Knop, Jacek Tabor, Przemysław Spurek, Igor Podolak, Marcin Mazur, Stanisław Jastrzębski

Abstract: We propose a new generative model, Cramer-Wold Autoencoder (CWAE). Following WAE, we directly encourage normality of the latent space. Our paper uses also the recent idea from Sliced WAE (SWAE) model, which uses one-dimensional projections as a method of verifying closeness of two distributions. The crucial new ingredient is the introduction of a new (Cramer-Wold) metric in the space of densities,… ▽ More We propose a new generative model, Cramer-Wold Autoencoder (CWAE). Following WAE, we directly encourage normality of the latent space. Our paper uses also the recent idea from Sliced WAE (SWAE) model, which uses one-dimensional projections as a method of verifying closeness of two distributions. The crucial new ingredient is the introduction of a new (Cramer-Wold) metric in the space of densities, which replaces the Wasserstein metric used in SWAE. We show that the Cramer-Wold metric between Gaussian mixtures is given by a simple analytic formula, which results in the removal of sampling necessary to estimate the cost function in WAE and SWAE models. As a consequence, while drastically simplifying the optimization procedure, CWAE produces samples of a matching perceptual quality to other SOTA models. △ Less

Submitted 2 July, 2019; v1 submitted 23 May, 2018; originally announced May 2018.

Journal ref: Journal of Machine Learning Research, 21, 164, 1-28 2020

Showing 1–6 of 6 results for author: Knop, S