Search | arXiv e-print repository

arXiv:1911.10885 [pdf, other]

Improving VAE generations of multimodal data through data-dependent conditional priors

Authors: Frantzeska Lavda, Magda Gregorová, Alexandros Kalousis

Abstract: One of the major shortcomings of variational autoencoders is the inability to produce generations from the individual modalities of data originating from mixture distributions. This is primarily due to the use of a simple isotropic Gaussian as the prior for the latent code in the ancestral sampling procedure for the data generations. We propose a novel formulation of variational autoencoders, cond… ▽ More One of the major shortcomings of variational autoencoders is the inability to produce generations from the individual modalities of data originating from mixture distributions. This is primarily due to the use of a simple isotropic Gaussian as the prior for the latent code in the ancestral sampling procedure for the data generations. We propose a novel formulation of variational autoencoders, conditional prior VAE (CP-VAE), which learns to differentiate between the individual mixture components and therefore allows for generations from the distributional data clusters. We assume a two-level generative process with a continuous (Gaussian) latent variable sampled conditionally on a discrete (categorical) latent component. The new variational objective naturally couples the learning of the posterior and prior conditionals, and the learning of the latent categories encoding the multimodality of the original data in an unsupervised manner. The data-dependent conditional priors are then used to sample the continuous latent code when generating new samples from the individual mixture components corresponding to the multimodal structure of the original data. Our experimental results illustrate the generative performance of our new model comparing to multiple baselines. △ Less

Submitted 25 November, 2019; originally announced November 2019.

arXiv:1903.10978 [pdf]

Sparse Learning for Variable Selection with Structures and Nonlinearities

Authors: Magda Gregorova

Abstract: In this thesis we discuss machine learning methods performing automated variable selection for learning sparse predictive models. There are multiple reasons for promoting sparsity in the predictive models. By relying on a limited set of input variables the models naturally counteract the overfitting problem ubiquitous in learning from finite sets of training points. Sparse models are cheaper to us… ▽ More In this thesis we discuss machine learning methods performing automated variable selection for learning sparse predictive models. There are multiple reasons for promoting sparsity in the predictive models. By relying on a limited set of input variables the models naturally counteract the overfitting problem ubiquitous in learning from finite sets of training points. Sparse models are cheaper to use for predictions, they usually require lower computational resources and by relying on smaller sets of inputs can possibly reduce costs for data collection and storage. Sparse models can also contribute to better understanding of the investigated phenomenons as they are easier to interpret than full models. △ Less

Submitted 26 March, 2019; originally announced March 2019.

Comments: PhD thesis

arXiv:1810.10612 [pdf, other]

Continual Classification Learning Using Generative Models

Authors: Frantzeska Lavda, Jason Ramapuram, Magda Gregorova, Alexandros Kalousis

Abstract: Continual learning is the ability to sequentially learn over time by accommodating knowledge while retaining previously learned experiences. Neural networks can learn multiple tasks when trained on them jointly, but cannot maintain performance on previously learned tasks when tasks are presented one at a time. This problem is called catastrophic forgetting. In this work, we propose a classificatio… ▽ More Continual learning is the ability to sequentially learn over time by accommodating knowledge while retaining previously learned experiences. Neural networks can learn multiple tasks when trained on them jointly, but cannot maintain performance on previously learned tasks when tasks are presented one at a time. This problem is called catastrophic forgetting. In this work, we propose a classification model that learns continuously from sequentially observed tasks, while preventing catastrophic forgetting. We build on the lifelong generative capabilities of [10] and extend it to the classification setting by deriving a new variational bound on the joint log likelihood, $\log p(x; y)$. △ Less

Submitted 24 October, 2018; originally announced October 2018.

Comments: 5 pages, 4 figures, under review in Continual learning Workshop NIPS 2018

arXiv:1805.06258 [pdf, other]

Structured nonlinear variable selection

Authors: Magda Gregorová, Alexandros Kalousis, Stéphane Marchand-Maillet

Abstract: We investigate structured sparsity methods for variable selection in regression problems where the target depends nonlinearly on the inputs. We focus on general nonlinear functions not limiting a priori the function space to additive models. We propose two new regularizers based on partial derivatives as nonlinear equivalents of group lasso and elastic net. We formulate the problem within the fram… ▽ More We investigate structured sparsity methods for variable selection in regression problems where the target depends nonlinearly on the inputs. We focus on general nonlinear functions not limiting a priori the function space to additive models. We propose two new regularizers based on partial derivatives as nonlinear equivalents of group lasso and elastic net. We formulate the problem within the framework of learning in reproducing kernel Hilbert spaces and show how the variational problem can be reformulated into a more practical finite dimensional equivalent. We develop a new algorithm derived from the ADMM principles that relies solely on closed forms of the proximal operators. We explore the empirical properties of our new algorithm for Nonlinear Variable Selection based on Derivatives (NVSD) on a set of experiments and confirm favourable properties of our structured-sparsity models and the algorithm in terms of both prediction and variable selection accuracy. △ Less

Submitted 16 May, 2018; originally announced May 2018.

Comments: Accepted to UAI2018

arXiv:1804.07169 [pdf, ps, other]

Large-scale Nonlinear Variable Selection via Kernel Random Features

Authors: Magda Gregorová, Jason Ramapuram, Alexandros Kalousis, Stéphane Marchand-Maillet

Abstract: We propose a new method for input variable selection in nonlinear regression. The method is embedded into a kernel regression machine that can model general nonlinear functions, not being a priori limited to additive models. This is the first kernel-based variable selection method applicable to large datasets. It sidesteps the typical poor scaling properties of kernel methods by map** the inputs… ▽ More We propose a new method for input variable selection in nonlinear regression. The method is embedded into a kernel regression machine that can model general nonlinear functions, not being a priori limited to additive models. This is the first kernel-based variable selection method applicable to large datasets. It sidesteps the typical poor scaling properties of kernel methods by map** the inputs into a relatively low-dimensional space of random features. The algorithm discovers the variables relevant for the regression task together with learning the prediction model through learning the appropriate nonlinear random feature maps. We demonstrate the outstanding performance of our method on a set of large-scale synthetic and real datasets. △ Less

Submitted 1 September, 2018; v1 submitted 19 April, 2018; originally announced April 2018.

Comments: Final version for proceedings of ECML/PKDD 2018

arXiv:1710.00569 [pdf, ps, other]

Learning Predictive Leading Indicators for Forecasting Time Series Systems with Unknown Clusters of Forecast Tasks

Authors: Magda Gregorova, Alexandros Kalousis, Stephane Marchand-Maillet

Abstract: We present a new method for forecasting systems of multiple interrelated time series. The method learns the forecast models together with discovering leading indicators from within the system that serve as good predictors improving the forecast accuracy and a cluster structure of the predictive tasks around these. The method is based on the classical linear vector autoregressive model (VAR) and li… ▽ More We present a new method for forecasting systems of multiple interrelated time series. The method learns the forecast models together with discovering leading indicators from within the system that serve as good predictors improving the forecast accuracy and a cluster structure of the predictive tasks around these. The method is based on the classical linear vector autoregressive model (VAR) and links the discovery of the leading indicators to inferring sparse graphs of Granger causality. We formulate a new constrained optimisation problem to promote the desired sparse structures across the models and the sharing of information amongst the learning tasks in a multi-task manner. We propose an algorithm for solving the problem and document on a battery of synthetic and real-data experiments the advantages of our new method over baseline VAR models as well as the state-of-the-art sparse VAR learning methods. △ Less

Submitted 2 October, 2017; originally announced October 2017.

Comments: Accepted for ACML2017; includes appendices

arXiv:1706.08811 [pdf, ps, other]

Forecasting and Granger Modelling with Non-linear Dynamical Dependencies

Authors: Magda Gregorová, Alexandros Kalousis, Stéphane Marchand-Maillet

Abstract: Traditional linear methods for forecasting multivariate time series are not able to satisfactorily model the non-linear dependencies that may exist in non-Gaussian series. We build on the theory of learning vector-valued functions in the reproducing kernel Hilbert space and develop a method for learning prediction functions that accommodate such non-linearities. The method not only learns the pred… ▽ More Traditional linear methods for forecasting multivariate time series are not able to satisfactorily model the non-linear dependencies that may exist in non-Gaussian series. We build on the theory of learning vector-valued functions in the reproducing kernel Hilbert space and develop a method for learning prediction functions that accommodate such non-linearities. The method not only learns the predictive function but also the matrix-valued kernel underlying the function search space directly from the data. Our approach is based on learning multiple matrix-valued kernels, each of those composed of a set of input kernels and a set of output kernels learned in the cone of positive semi-definite matrices. In addition to superior predictive performance in the presence of strong non-linearities, our method also recovers the hidden dynamic relationships between the series and thus is a new alternative to existing graphical Granger techniques. △ Less

Submitted 27 June, 2017; originally announced June 2017.

Comments: Accepted for ECML-PKDD 2017

arXiv:1705.09847 [pdf, other]

doi 10.1016/j.neucom.2020.02.115

Lifelong Generative Modeling

Authors: Jason Ramapuram, Magda Gregorova, Alexandros Kalousis

Abstract: Lifelong learning is the problem of learning multiple consecutive tasks in a sequential manner, where knowledge gained from previous tasks is retained and used to aid future learning over the lifetime of the learner. It is essential towards the development of intelligent machines that can adapt to their surroundings. In this work we focus on a lifelong learning approach to unsupervised generative… ▽ More Lifelong learning is the problem of learning multiple consecutive tasks in a sequential manner, where knowledge gained from previous tasks is retained and used to aid future learning over the lifetime of the learner. It is essential towards the development of intelligent machines that can adapt to their surroundings. In this work we focus on a lifelong learning approach to unsupervised generative modeling, where we continuously incorporate newly observed distributions into a learned model. We do so through a student-teacher Variational Autoencoder architecture which allows us to learn and preserve all the distributions seen so far, without the need to retain the past data nor the past models. Through the introduction of a novel cross-model regularizer, inspired by a Bayesian update rule, the student model leverages the information learned by the teacher, which acts as a probabilistic knowledge store. The regularizer reduces the effect of catastrophic interference that appears when we learn over sequences of distributions. We validate our model's performance on sequential variants of MNIST, FashionMNIST, PermutedMNIST, SVHN and Celeb-A and demonstrate that our model mitigates the effects of catastrophic interference faced by neural networks in sequential learning scenarios. △ Less

Submitted 8 September, 2020; v1 submitted 27 May, 2017; originally announced May 2017.

Comments: 32 pages

Journal ref: Neurocomputing 2020, Volume 404, Pages 381-400

arXiv:1507.01978 [pdf, other]

Learning Leading Indicators for Time Series Predictions

Authors: Magda Gregorova, Alexandros Kalousis, Stéphane Marchand-Maillet

Abstract: We consider the problem of learning models for forecasting multiple time-series systems together with discovering the leading indicators that serve as good predictors for the system. We model the systems by linear vector autoregressive models (VAR) and link the discovery of leading indicators to inferring sparse graphs of Granger-causality. We propose new problem formulations and develop two new m… ▽ More We consider the problem of learning models for forecasting multiple time-series systems together with discovering the leading indicators that serve as good predictors for the system. We model the systems by linear vector autoregressive models (VAR) and link the discovery of leading indicators to inferring sparse graphs of Granger-causality. We propose new problem formulations and develop two new methods to learn such models, gradually increasing the complexity of assumptions and approaches. While the first method assumes common structures across the whole system, our second method uncovers model clusters based on the Granger-causality and leading indicators together with learning the model parameters. We study the performance of our methods on a comprehensive set of experiments and confirm their efficacy and their advantages over state-of-the-art sparse VAR and graphical Granger learning methods. △ Less

Submitted 2 November, 2016; v1 submitted 7 July, 2015; originally announced July 2015.

Comments: Changed title plus minor updates in the text

Showing 1–9 of 9 results for author: Gregorova, M