-
Deep Variational Sufficient Dimensionality Reduction
Authors:
Ershad Banijamali,
Amir-Hossein Karimi,
Ali Ghodsi
Abstract:
We consider the problem of sufficient dimensionality reduction (SDR), where the high-dimensional observation is transformed to a low-dimensional subspace in which the information of the observations regarding the label variable is preserved. We propose DVSDR, a deep variational approach for sufficient dimensionality reduction. The deep structure in our model has a bottleneck that represents the low-dimensional embedding of the data. We explain the SDR problem using graphical models and use the framework of variational autoencoders to maximize the lower bound of the log-likelihood of the joint distribution of the observation and label. We show that such a maximization problem can be interpreted as solving the SDR problem. DVSDR can be easily adapted to the semi-supervised learning setting. In our experiments, we show that DVSDR performs competitively on classification tasks while being able to generate novel data samples.
Submitted 18 December, 2018;
originally announced December 2018.
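For reference, a generic evidence lower bound of the kind the DVSDR abstract describes can be written as below. The symbols are assumptions and do not come from the abstract: $x$ is the observation, $y$ the label, $z$ the low-dimensional bottleneck variable, $q(z \mid x)$ an encoder distribution, and $p(z)$ a prior. The factorization actually used in the paper may differ.

```latex
\log p(x, y) \;\ge\; \mathbb{E}_{q(z \mid x)}\big[\log p(x, y \mid z)\big] \;-\; \mathrm{KL}\big(q(z \mid x) \,\|\, p(z)\big)
```

The bound follows from Jensen's inequality and holds for any choice of $q$; maximizing the right-hand side is the standard variational autoencoder objective applied to the joint distribution of observation and label.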
-
Optimizing over a Restricted Policy Class in Markov Decision Processes
Authors:
Ershad Banijamali,
Yasin Abbasi-Yadkori,
Mohammad Ghavamzadeh,
Nikos Vlassis
Abstract:
We address the problem of finding an optimal policy in a Markov decision process under a restricted policy class defined by the convex hull of a set of base policies. This problem is of great interest in applications in which a number of reasonably good (or safe) policies are already known and we are only interested in optimizing in their convex hull. We show that this problem is NP-hard to solve exactly as well as to approximate to arbitrary accuracy. However, under a condition that is akin to the occupancy measures of the base policies having large overlap, we show that there exists an efficient algorithm that finds a policy that is almost as good as the best convex combination of the base policies. The running time of the proposed algorithm is linear in the number of states and polynomial in the number of base policies. In practice, we demonstrate an efficient implementation for problems with large state spaces. Compared to traditional policy gradient methods, the proposed approach has the advantage that, apart from the computation of occupancy measures of some base policies, the iterative method need not interact with the environment during the optimization process. This is especially important in complex systems where estimating the value of a policy can be a time-consuming process.
Submitted 26 February, 2018;
originally announced February 2018.
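To make "convex hull of a set of base policies" concrete, here is a minimal sketch of one reading: a mixed policy whose action probabilities in each state are a convex combination of the base policies' action probabilities. The function name `mix_policies`, the dictionary representation, and the example policies `safe` and `greedy` are hypothetical; the abstract does not say how the paper represents or combines policies, and this sketch does not perform the optimization over weights.

```python
def mix_policies(base_policies, weights, tol=1e-9):
    """Return the convex combination of tabular policies.

    Each policy maps a state to a list of action probabilities.
    `weights` must be non-negative and sum to 1 (a point in the simplex).
    """
    if len(base_policies) != len(weights):
        raise ValueError("need exactly one weight per base policy")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must be non-negative and sum to 1")

    mixed = {}
    for state in base_policies[0]:
        num_actions = len(base_policies[0][state])
        # Weighted average of each action's probability across base policies.
        mixed[state] = [
            sum(w * pi[state][a] for w, pi in zip(weights, base_policies))
            for a in range(num_actions)
        ]
    return mixed


# Hypothetical two-state, two-action example.
safe = {"s0": [1.0, 0.0], "s1": [0.5, 0.5]}
greedy = {"s0": [0.0, 1.0], "s1": [0.0, 1.0]}
mixed = mix_policies([safe, greedy], [0.25, 0.75])
```

Because every base policy's action probabilities sum to 1 in each state and the weights lie in the simplex, each row of `mixed` is again a valid probability distribution.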
-
JADE: Joint Autoencoders for Dis-Entanglement
Authors:
Ershad Banijamali,
Amir-Hossein Karimi,
Alexander Wong,
Ali Ghodsi
Abstract:
The problem of feature disentanglement has been explored in the literature for the purposes of image and video processing and text analysis. State-of-the-art methods for disentangling feature representations rely on the presence of many labeled samples. In this work, we present a novel method for disentangling factors of variation in data-scarce regimes. Specifically, we explore the application of feature disentangling to the problem of supervised classification in a setting where few labeled samples exist and there are no unlabeled samples for use in unsupervised training. Instead, a similar dataset exists which shares at least one direction of variation with the sample-constrained dataset. We train our model end-to-end using the framework of variational autoencoders and experimentally demonstrate that using an auxiliary dataset with similar variation factors contributes positively to classification performance, yielding results competitive with the state of the art in unsupervised learning.
Submitted 24 November, 2017;
originally announced November 2017.
-
Fast Spectral Clustering Using Autoencoders and Landmarks
Authors:
Ershad Banijamali,
Ali Ghodsi
Abstract:
In this paper, we introduce an algorithm for performing spectral clustering efficiently. Spectral clustering is a powerful clustering algorithm that suffers from high computational complexity due to eigendecomposition. In this work, we first build the adjacency matrix of the corresponding graph of the dataset. To build this matrix, we only consider a limited number of points, called landmarks, and compute the similarity of all data points with the landmarks. Then, we present a definition of the Laplacian matrix of the graph that enables us to perform eigendecomposition efficiently, using a deep autoencoder. The overall complexity of the algorithm for eigendecomposition is $O(np)$, where $n$ is the number of data points and $p$ is the number of landmarks. Finally, we evaluate the performance of the algorithm in different experiments.
Submitted 7 April, 2017;
originally announced April 2017.
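The landmark step described in this abstract can be sketched as follows: instead of an $n \times n$ adjacency matrix, only an $n \times p$ matrix of similarities between every data point and each landmark is computed. Several details here are assumptions that the abstract does not specify: landmarks are drawn uniformly at random from the data, similarity is a Gaussian kernel with a hypothetical `bandwidth` parameter, and the function name `landmark_similarity` is illustrative. The Laplacian definition and the autoencoder-based eigendecomposition are not shown.

```python
import math
import random


def landmark_similarity(points, num_landmarks, bandwidth=1.0, seed=0):
    """Return (landmarks, W), where W[i][j] is the Gaussian similarity
    between points[i] and landmarks[j]. W has n rows and p columns."""
    rng = random.Random(seed)
    # Assumption: landmarks are a random subset of the data points.
    landmarks = rng.sample(points, num_landmarks)

    W = []
    for x in points:
        row = []
        for z in landmarks:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
            row.append(math.exp(-sq_dist / (2.0 * bandwidth ** 2)))
        W.append(row)
    return landmarks, W


# Toy data: two tight groups of 2-D points.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
landmarks, W = landmark_similarity(points, num_landmarks=2)
```

Building `W` costs $n \cdot p$ kernel evaluations, which matches the $O(np)$ scaling in the number of data points and landmarks that the abstract states for the overall algorithm.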
-
Generative Mixture of Networks
Authors:
Ershad Banijamali,
Ali Ghodsi,
Pascal Poupart
Abstract:
A generative model based on training deep architectures is proposed. The model consists of K networks that are trained together to learn the underlying distribution of a given data set. The process starts with dividing the input data into K clusters and feeding each of them into a separate network. After a few iterations of training the networks separately, we use an EM-like algorithm to train the networks together and update the clusters of the data. We call this model Mixture of Networks. The provided model is a platform that can be used for any deep structure and be trained by any conventional objective function for distribution modeling. As the components of the model are neural networks, it has high capability in characterizing complicated data distributions as well as clustering data. We apply the algorithm to the MNIST handwritten digits and Yale face datasets. We also demonstrate the clustering ability of the model using some real-world and toy examples.
Submitted 10 February, 2017;
originally announced February 2017.
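The alternating structure described in this abstract (split the data into K clusters, fit one model per cluster, then repeatedly reassign points and refit) can be illustrated with a deliberately simplified sketch. This is not the paper's method: each neural network is replaced by a one-dimensional Gaussian, the reassignment is a hard assignment to the most likely component, and the initial clustering is a plain split of the sorted data. All names (`fit_gaussian`, `log_density`, `mixture_hard_em`) are hypothetical.

```python
import math


def fit_gaussian(xs, var_floor=1e-3):
    """Fit a 1-D Gaussian (stand-in for training one network)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, max(var, var_floor)  # floor avoids zero variance


def log_density(x, model):
    """Log-density of x under a (mean, variance) Gaussian."""
    mean, var = model
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def mixture_hard_em(data, k, iterations=10):
    """Alternate between reassigning points and refitting k components."""
    if len(data) < k:
        raise ValueError("need at least k data points")
    ordered = sorted(data)
    n = len(ordered)
    # Assumption: initial clusters are k contiguous chunks of the sorted data.
    clusters = [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]
    models = [fit_gaussian(c) for c in clusters]

    for _ in range(iterations):
        # Assignment step: each point goes to its most likely component.
        clusters = [[] for _ in range(k)]
        for x in ordered:
            best = max(range(k), key=lambda j: log_density(x, models[j]))
            clusters[best].append(x)
        # Update step: refit each component; keep the old one if it is empty.
        models = [fit_gaussian(c) if c else m for c, m in zip(clusters, models)]
    return clusters, models


# Toy data: six points near 0 and two points near 10.
data = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 10.0, 10.1]
clusters, models = mixture_hard_em(data, k=2)
```

On this toy data the initial even split puts 0.4 and 0.5 in the wrong group, and the alternating updates move them back, which is the behaviour the "update the clusters of the data" step is meant to provide.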