Search | arXiv e-print repository

On Leveraging Variational Graph Embeddings for Open World Compositional Zero-Shot Learning

Authors: Muhammad Umer Anwaar, Zhihui Pan, Martin Kleinsteuber

Abstract: Humans are able to identify and categorize novel compositions of known concepts. The task in Compositional Zero-Shot Learning (CZSL) is to learn compositions of primitive concepts, i.e., objects and states, in such a way that even their novel compositions can be zero-shot classified. In this work, we do not assume any prior knowledge of the feasibility of novel compositions, i.e., the open-world setting, where infeasible compositions dominate the search space. We propose a Compositional Variational Graph Autoencoder (CVGAE) approach for learning the variational embeddings of the primitive concepts (nodes) as well as the feasibility of their compositions (via edges). Such modelling makes CVGAE scalable to real-world application scenarios. This is in contrast to the SOTA method, CGE, which is computationally very expensive; e.g., for the benchmark C-GQA dataset, CGE requires 3.94 x 10^5 nodes, whereas CVGAE requires only 1323 nodes. We learn a mapping of the graph and image embeddings onto a common embedding space. CVGAE adopts a deep metric learning approach and learns a similarity metric in this space via a bi-directional contrastive loss between projected graph and image embeddings. We validate the effectiveness of our approach on three benchmark datasets. We also demonstrate via an image retrieval task that the representations learnt by CVGAE are better suited for compositional generalization.
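The two node counts quoted in this abstract can be checked with simple arithmetic. The sketch below assumes that C-GQA has 453 states and 870 objects; these vocabulary sizes come from the benchmark's published statistics, not from the abstract, and the function names are hypothetical. Under that assumption, a graph with one node per primitive has 453 + 870 nodes, while the number of candidate state-object pairs in the open-world setting is 453 x 870.

```python
# Assumed C-GQA vocabulary sizes; they are NOT stated in the abstract.
NUM_STATES = 453
NUM_OBJECTS = 870


def primitive_nodes(num_states, num_objects):
    """One graph node per primitive concept (state or object)."""
    return num_states + num_objects


def candidate_compositions(num_states, num_objects):
    """Every state-object pair is a candidate in the open-world setting."""
    return num_states * num_objects


print(primitive_nodes(NUM_STATES, NUM_OBJECTS))         # 1323
print(candidate_compositions(NUM_STATES, NUM_OBJECTS))  # 394110, about 3.94 x 10^5
```

The product grows multiplicatively with the vocabulary while the sum grows only additively, which is the scalability gap the abstract points to.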

Submitted 23 April, 2022; originally announced April 2022.

Comments: Submitted to a conference

arXiv:2110.15742 [pdf, other]

Barlow Graph Auto-Encoder for Unsupervised Network Embedding

Authors: Rayyan Ahmad Khan, Martin Kleinsteuber

Abstract: Network embedding has emerged as a promising research field for network analysis. Recently, an approach named Barlow Twins has been proposed for self-supervised learning in computer vision by applying the redundancy-reduction principle to the embedding vectors corresponding to two distorted versions of the image samples. Motivated by this, we propose Barlow Graph Auto-Encoder, a simple yet effective architecture for learning network embedding. It aims to maximize the similarity between the embedding vectors of immediate and larger neighborhoods of a node, while minimizing the redundancy between the components of these projections. In addition, we also present the variational counterpart, named Barlow Variational Graph Auto-Encoder. Our approach yields promising results for inductive link prediction and is also on par with the state of the art for clustering and downstream node classification, as demonstrated by extensive comparisons with several well-known techniques on three benchmark citation datasets.
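The redundancy-reduction principle mentioned in this abstract can be illustrated with a generic Barlow Twins-style objective. This is a minimal sketch, not the paper's architecture: the two inputs stand in for embeddings of the same nodes from two views (for example, immediate and larger neighborhoods), and the function names and the `off_diag_weight` default are hypothetical. The loss pulls the diagonal of the cross-correlation matrix towards 1 and the off-diagonal entries towards 0.

```python
import math


def standardize(z):
    """Scale each embedding dimension to zero mean and unit variance over the batch."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in z]
        mean = sum(col) / n
        # Fall back to 1.0 for a constant column to avoid division by zero.
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        for i in range(n):
            out[i][j] = (z[i][j] - mean) / std
    return out


def redundancy_reduction_loss(z_a, z_b, off_diag_weight=0.005):
    """Sum of (1 - C_ii)^2 plus weighted C_ij^2 for i != j, with C the cross-correlation."""
    a, b = standardize(z_a), standardize(z_b)
    n, d = len(a), len(a[0])
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c_ij = sum(a[k][i] * b[k][j] for k in range(n)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2  # invariance term
            else:
                loss += off_diag_weight * c_ij ** 2  # redundancy term
    return loss


decorrelated = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
redundant = [[1, 1], [-1, -1]]
print(redundancy_reduction_loss(decorrelated, decorrelated))
print(redundancy_reduction_loss(redundant, redundant))
```

Identical views with decorrelated components give zero loss, whereas two perfectly correlated components are penalised through the off-diagonal term.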

Submitted 13 December, 2022; v1 submitted 29 October, 2021; originally announced October 2021.

arXiv:2108.03953 [pdf, other]

A Framework for Joint Unsupervised Learning of Cluster-Aware Embedding for Heterogeneous Networks

Authors: Rayyan Ahmad Khan, Martin Kleinsteuber

Abstract: Heterogeneous Information Network (HIN) embedding refers to the low-dimensional projections of the HIN nodes that preserve the HIN structure and semantics. HIN embedding has emerged as a promising research field for network analysis as it enables downstream tasks such as clustering and node classification. In this work, we propose \ours for joint learning of cluster embeddings as well as cluster-aware HIN embedding. We assume that the connected nodes are highly likely to fall in the same cluster, and adopt a variational approach to preserve the information in the pairwise relations in a cluster-aware manner. In addition, we deploy contrastive modules to simultaneously utilize the information in multiple meta-paths, thereby alleviating the meta-path selection problem - a challenge faced by many of the famous HIN embedding approaches. The HIN embedding, thus learned, not only improves the clustering performance but also preserves pairwise proximity as well as the high-order HIN structure. We show the effectiveness of our approach by comparing it with many competitive baselines on three real-world datasets on clustering and downstream node classification.

Submitted 9 August, 2021; originally announced August 2021.

arXiv:2101.03885 [pdf, other]

Variational Embeddings for Community Detection and Node Representation

Authors: Rayyan Ahmad Khan, Muhammad Umer Anwaar, Omran Kaddah, Martin Kleinsteuber

Abstract: In this paper, we study how to simultaneously learn two highly correlated tasks of graph analysis, i.e., community detection and node representation learning. We propose an efficient generative model called VECoDeR for jointly learning Variational Embeddings for Community Detection and node Representation. VECoDeR assumes that every node can be a member of one or more communities. The node embeddings are learned in such a way that connected nodes are not only "closer" to each other but also share similar community assignments. A joint learning framework leverages community-aware node embeddings for better community detection. We demonstrate on several graph datasets that VECoDeR effectively outperforms many competitive baselines on all three tasks, i.e., node classification, overlapping community detection and non-overlapping community detection. We also show that VECoDeR is computationally efficient and has quite robust performance with varying hyperparameters.

Submitted 11 January, 2021; originally announced January 2021.

arXiv:2010.11793 [pdf, other]

Metapath- and Entity-aware Graph Neural Network for Recommendation

Authors: Muhammad Umer Anwaar, Zhiwei Han, Shyam Arumugaswamy, Rayyan Ahmad Khan, Thomas Weber, Tianming Qiu, Hao Shen, Yuanting Liu, Martin Kleinsteuber

Abstract: In graph neural networks (GNNs), message passing iteratively aggregates nodes' information from their direct neighbors while neglecting the sequential nature of multi-hop node connections. Such sequential node connections, e.g., metapaths, capture critical insights for downstream tasks. Concretely, in recommender systems (RSs), disregarding these insights leads to inadequate distillation of collaborative signals. In this paper, we employ collaborative subgraphs (CSGs) and metapaths to form metapath-aware subgraphs, which explicitly capture sequential semantics in graph structures. We propose metaPath and Entity-Aware Graph Neural Network (PEAGNN), which trains multilayer GNNs to perform metapath-aware information aggregation on such subgraphs. This aggregated information from different metapaths is then fused using an attention mechanism. Finally, PEAGNN gives us the representations for node and subgraph, which can be used to train an MLP for predicting the score for target user-item pairs. To leverage the local structure of CSGs, we present entity-awareness that acts as a contrastive regularizer on node embedding. Moreover, PEAGNN can be combined with prominent layers such as GAT, GCN and GraphSage. Our empirical evaluation shows that our proposed technique outperforms competitive baselines on several datasets for recommendation tasks. Further analysis demonstrates that PEAGNN also learns meaningful metapath combinations from a given set of metapaths.

Submitted 1 April, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

arXiv:2006.11149 [pdf, other]

Compositional Learning of Image-Text Query for Image Retrieval

Authors: Muhammad Umer Anwaar, Egor Labintcev, Martin Kleinsteuber

Abstract: In this paper, we investigate the problem of retrieving images from a database based on a multi-modal (image-text) query. Specifically, the query text prompts some modification in the query image and the task is to retrieve images with the desired modifications. For instance, a user of an E-Commerce platform is interested in buying a dress, which should look similar to her friend's dress, but the dress should be of white color with a ribbon sash. In this case, we would like the algorithm to retrieve some dresses with desired modifications in the query dress. We propose an autoencoder-based model, ComposeAE, to learn the composition of image and text query for retrieving images. We adopt a deep metric learning approach and learn a metric that pushes the composition of source image and text query closer to the target images. We also propose a rotational symmetry constraint on the optimization problem. Our approach is able to outperform the state-of-the-art method TIRG on three benchmark datasets, namely: MIT-States, Fashion200k and Fashion IQ. In order to ensure fair comparison, we introduce strong baselines by enhancing the TIRG method. To ensure reproducibility of the results, we publish our code here: https://github.com/ecom-research/ComposeAE.

Submitted 31 May, 2021; v1 submitted 19 June, 2020; originally announced June 2020.

Comments: Published at IEEE WACV 2021

arXiv:2004.01468 [pdf, other]

Epitomic Variational Graph Autoencoder

Authors: Rayyan Ahmad Khan, Muhammad Umer Anwaar, Martin Kleinsteuber

Abstract: Variational autoencoder (VAE) is a widely used generative model for learning latent representations. Burda et al. in their seminal paper showed that the learning capacity of VAE is limited by over-pruning. It is a phenomenon where a significant number of latent variables fail to capture any information about the input data and the corresponding hidden units become inactive. This adversely affects learning diverse and interpretable latent representations. As variational graph autoencoder (VGAE) extends VAE for graph-structured data, it inherits the over-pruning problem. In this paper, we adopt a model-based approach and propose epitomic VGAE (EVGAE), a generative variational framework for graph datasets which successfully mitigates the over-pruning problem and also boosts the generative ability of VGAE. We consider EVGAE to consist of multiple sparse VGAE models, called epitomes, that are groups of latent variables sharing the latent space. This approach aids in increasing active units as epitomes compete to learn better representation of the graph data. We verify our claims via experiments on three benchmark datasets. Our experiments show that EVGAE has a better generative ability than VGAE. Moreover, EVGAE outperforms VGAE on the link prediction task in citation networks.

Submitted 7 August, 2020; v1 submitted 3 April, 2020; originally announced April 2020.

arXiv:1907.10409 [pdf, other]

Mend The Learning Approach, Not the Data: Insights for Ranking E-Commerce Products

Authors: Muhammad Umer Anwaar, Dmytro Rybalko, Martin Kleinsteuber

Abstract: Improved search quality enhances users' satisfaction, which directly impacts sales growth of an E-Commerce (E-Com) platform. Traditional Learning to Rank (LTR) algorithms require relevance judgments on products. In E-Com, getting such judgments poses an immense challenge. In the literature, it is proposed to employ user feedback (such as clicks, add-to-basket (AtB) clicks and orders) to generate relevance judgments. It is done in two steps: first, query-product pair data are aggregated from the logs and then order rate etc. are calculated for each pair in the logs. In this paper, we advocate the counterfactual risk minimization (CRM) approach, which circumvents the need for relevance judgments and data aggregation and is better suited for learning from logged data, i.e., contextual bandit feedback. Due to the unavailability of a public E-Com LTR dataset, we provide the Mercateo dataset from our platform. It contains more than 10 million AtB click logs and 1 million order logs from a catalogue of about 3.5 million products associated with 3060 queries. To the best of our knowledge, this is the first work which examines the effectiveness of the CRM approach in learning a ranking model from real-world logged data. Our empirical evaluation shows that our CRM approach learns effectively from logged data and beats a strong baseline ranker (λ-MART) by a huge margin. Our method outperforms full-information loss (e.g. cross-entropy) on various deep neural network models. These findings demonstrate that by adopting the CRM approach, E-Com platforms can get better product search quality compared to the full-information approach. The code and dataset can be accessed at: https://github.com/ecom-research/CRM-LTR.
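Learning from contextual bandit feedback, as described in this abstract, usually rests on an inverse-propensity-scored (IPS) risk estimate. The sketch below shows that textbook estimator only; the abstract does not say which estimator or regulariser the paper uses, and the function name, argument names and the optional `clip` parameter are hypothetical. Each logged loss is reweighted by how much more (or less) likely the new policy is to take the logged action than the logging policy was.

```python
def ips_risk(losses, new_probs, logged_probs, clip=None):
    """Inverse-propensity-scored estimate of a new policy's expected loss.

    losses[i]       observed loss for the logged action in round i
    new_probs[i]    probability the new policy assigns to that action
    logged_probs[i] probability the logging policy assigned to it (must be > 0)
    clip            optional cap on the importance weight, to limit variance
    """
    total = 0.0
    for loss, p_new, p_log in zip(losses, new_probs, logged_probs):
        weight = p_new / p_log
        if clip is not None:
            weight = min(weight, clip)
        total += loss * weight
    return total / len(losses)


# Two logged rounds; the new policy shifts mass away from the action that incurred loss.
print(ips_risk([1.0, 0.0], [0.2, 0.8], [0.5, 0.5]))
```

No relevance labels or aggregation over query-product pairs are needed here: the estimate uses only individual log entries and their propensities.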

Submitted 9 July, 2020; v1 submitted 24 July, 2019; originally announced July 2019.

Comments: Accepted for ECML-PKDD 2020

arXiv:1810.03523 [pdf, other]

Trace Quotient with Sparsity Priors for Learning Low Dimensional Image Representations

Authors: Xian Wei, Hao Shen, Martin Kleinsteuber

Abstract: This work studies the problem of learning appropriate low dimensional image representations. We propose a generic algorithmic framework, which leverages two classic representation learning paradigms, i.e., sparse representation and the trace quotient criterion. The former is a well-known powerful tool to identify underlying self-explanatory factors of data, while the latter is known for disentangling underlying low dimensional discriminative factors in data. Our developed solutions disentangle sparse representations of images by employing the trace quotient criterion. We construct a unified cost function, coined as the SPARse LOW dimensional representation (SparLow) function, for jointly learning both a sparsifying dictionary and a dimensionality reduction transformation. The SparLow function is widely applicable for developing various algorithms in three classic machine learning scenarios, namely, unsupervised, supervised, and semi-supervised learning. In order to develop efficient joint learning algorithms for maximizing the SparLow function, we deploy a framework of sparse coding with appropriate convex priors to ensure the sparse representations to be locally differentiable. Moreover, we develop an efficient geometric conjugate gradient algorithm to maximize the SparLow function on its underlying Riemannian manifold. Performance of the proposed SparLow algorithmic framework is investigated on several image processing tasks, such as 3D data visualization, face/digit recognition, and object/scene categorization.

Submitted 8 October, 2018; originally announced October 2018.

Comments: 17 pages

MSC Class: 14J60

ACM Class: F.2.2

arXiv:1803.04459 [pdf, ps, other]

Extended Affinity Propagation: Global Discovery and Local Insights

Authors: Rayyan Ahmad Khan, Rana Ali Amjad, Martin Kleinsteuber

Abstract: We propose a new clustering algorithm, Extended Affinity Propagation, based on pairwise similarities. Extended Affinity Propagation is developed by modifying Affinity Propagation such that the desirable features of Affinity Propagation, e.g., exemplars, reasonable computational complexity and no need to specify number of clusters, are preserved while the shortcomings, e.g., the lack of global structure discovery, that limit the applicability of Affinity Propagation are overcome. Extended Affinity Propagation succeeds not only in achieving this goal but can also provide various additional insights into the internal structure of the individual clusters, e.g., refined confidence values, relative cluster densities and local cluster strength in different regions of a cluster, which are valuable for an analyst. We briefly discuss how these insights can help in easily tuning the hyperparameters. We also illustrate these desirable features and the performance of Extended Affinity Propagation on various synthetic and real world datasets.

Submitted 15 April, 2019; v1 submitted 12 March, 2018; originally announced March 2018.

Comments: Submitted to TKDE

arXiv:1708.00180 [pdf, other]

doi 10.1109/TIP.2018.2792904

Model-based learning of local image features for unsupervised texture segmentation

Authors: Martin Kiechle, Martin Storath, Andreas Weinmann, Martin Kleinsteuber

Abstract: Features that capture well the textural patterns of a certain class of images are crucial for the performance of texture segmentation methods. The manual selection of features or designing new ones can be a tedious task. Therefore, it is desirable to automatically adapt the features to a certain image or class of images. Typically, this requires a large set of training images with similar textures and ground truth segmentation. In this work, we propose a framework to learn features for texture segmentation when no such training data is available. The cost function for our learning process is constructed to match a commonly used segmentation model, the piecewise constant Mumford-Shah model. This means that the features are learned such that they provide an approximately piecewise constant feature image with a small jump set. Based on this idea, we develop a two-stage algorithm which first learns suitable convolutional features and then performs a segmentation. We note that the features can be learned from a small set of images, from a single image, or even from image patches. The proposed method achieves a competitive rank in the Prague texture segmentation benchmark, and it is effective for segmenting histological images.

Submitted 1 August, 2017; originally announced August 2017.

arXiv:1706.04388 [pdf, other]

doi 10.1109/TCSVT.2017.2715851

Alignment Distances on Systems of Bags

Authors: Alexander Sagel, Martin Kleinsteuber

Abstract: Recent research in image and video recognition indicates that many visual processes can be thought of as being generated by a time-varying generative model. A nearby descriptive model for visual processes is thus a statistical distribution that varies over time. Specifically, modeling visual processes as streams of histograms generated by a kernelized linear dynamic system turns out to be efficient. We refer to such a model as a System of Bags. In this work, we investigate Systems of Bags with special emphasis on dynamic scenes and dynamic textures. Parameters of linear dynamic systems suffer from ambiguities. In order to cope with these ambiguities in the kernelized setting, we develop a kernelized version of the alignment distance. For its computation, we use a Jacobi-type method and prove its convergence to a set of critical points. We employ it as a dissimilarity measure on Systems of Bags. As such, it outperforms other known dissimilarity measures for kernelized linear dynamic systems, in particular the Martin Distance and the Maximum Singular Value Distance, in every tested classification setting. A considerable margin can be observed in settings, where classification is performed with respect to an abstract mean of video sets. For this scenario, the presented approach can outperform state-of-the-art techniques, such as Dynamic Fractal Spectrum or Orthogonal Tensor Dictionary Learning.

Submitted 14 June, 2017; originally announced June 2017.

arXiv:1608.05493 [pdf, ps, other]

doi 10.1109/TNSM.2016.2598788

Network Volume Anomaly Detection and Identification in Large-scale Networks based on Online Time-structured Traffic Tensor Tracking

Authors: Hiroyuki Kasai, Wolfgang Kellerer, Martin Kleinsteuber

Abstract: This paper addresses network anomography, that is, the problem of inferring network-level anomalies from indirect link measurements. This problem is cast as a low-rank subspace tracking problem for normal flows under incomplete observations, and an outlier detection problem for abnormal flows. Since traffic data is large-scale time-structured data accompanied with noise and outliers under partial observations, an efficient modeling method is essential. To this end, this paper proposes an online subspace tracking of a Hankelized time-structured traffic tensor for normal flows based on the Candecomp/PARAFAC decomposition exploiting the recursive least squares (RLS) algorithm. We estimate abnormal flows as outlier sparse flows via sparsity maximization in the underlying under-constrained linear-inverse problem. A major advantage is that our algorithm estimates normal flows by low-dimensional matrices with time-directional features as well as the spatial correlation of multiple links without using the past observed measurements and the past model parameters. Extensive numerical evaluations show that the proposed algorithm achieves faster convergence per iteration of model approximation, and better volume anomaly detection performance compared to state-of-the-art algorithms.
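The "Hankelized" structure mentioned in this abstract can be illustrated on a single measurement series. This is a minimal sketch of plain Hankelization only, assuming a one-dimensional series and a hypothetical `window` parameter; how the paper stacks such windows across links into a tensor is not specified in the abstract and is not reproduced here. Entry (i, j) of the result equals `series[i + j]`, so every anti-diagonal is constant and temporal correlation shows up as low-rank structure.

```python
def hankelize(series, window):
    """Return the Hankel matrix whose row i is series[i : i + window]."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [list(series[i:i + window]) for i in range(len(series) - window + 1)]


link_load = [1, 2, 3, 4, 5]  # made-up traffic volumes for one link
for row in hankelize(link_load, 3):
    print(row)
# [1, 2, 3]
# [2, 3, 4]
# [3, 4, 5]
```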

Submitted 19 August, 2016; originally announced August 2016.

Comments: IEEE Transactions on Network and Service Management

Journal ref: IEEE Transactions on Network and Service Management, vol.13, no.3, pp.636-650, 2016

arXiv:1503.02398 [pdf, other]

doi 10.1109/TSP.2015.2481875

Learning Co-Sparse Analysis Operators with Separable Structures

Authors: Matthias Seibert, Julian Wörmann, Rémi Gribonval, Martin Kleinsteuber

Abstract: In the co-sparse analysis model a set of filters is applied to a signal out of the signal class of interest yielding sparse filter responses. As such, it may serve as a prior in inverse problems, or for structural analysis of signals that are known to belong to the signal class. The more the model is adapted to the class, the more reliable it is for these purposes. The task of learning such operators for a given class is therefore a crucial problem. In many applications, it is also required that the filter responses are obtained in a timely manner, which can be achieved by filters with a separable structure. Not only can operators of this sort be efficiently used for computing the filter responses, but they also have the advantage that fewer training samples are required to obtain a reliable estimate of the operator. The first contribution of this work is to give theoretical evidence for this claim by providing an upper bound for the sample complexity of the learning process. The second is a stochastic gradient descent (SGD) method designed to learn an analysis operator with separable structures, which includes a novel and efficient step size selection rule. Numerical experiments are provided that link the sample complexity to the convergence speed of the SGD algorithm.

Submitted 11 September, 2015; v1 submitted 9 March, 2015; originally announced March 2015.

Comments: 11 pages double column, 4 figures, 3 tables

arXiv:1406.6538 [pdf, other]

A Bimodal Co-Sparse Analysis Model for Image Processing

Authors: Martin Kiechle, Tim Habigt, Simon Hawe, Martin Kleinsteuber

Abstract: The success of many computer vision tasks lies in the ability to exploit the interdependency between different image modalities such as intensity and depth. Fusing corresponding information can be achieved on several levels, and one promising approach is the integration at a low level. Moreover, sparse signal models have successfully been used in many vision applications. Within this area of research, the so-called co-sparse analysis model has attracted considerably less attention than its well-known counterpart, the sparse synthesis model, although it has been proven to be very useful in various image processing applications. In this paper, we propose a co-sparse analysis model that is able to capture the interdependency of two image modalities. It is based on the assumption that a pair of analysis operators exists, so that the co-supports of the corresponding bimodal image structures are correlated. We propose an algorithm that is able to learn such a coupled pair of operators from registered and noise-free training data. Furthermore, we explain how this model can be applied to solve linear inverse problems in image processing and how it can be used for image registration tasks. This paper extends the work of some of the authors by two major contributions. Firstly, a modification of the learning process is proposed that a priori guarantees unit norm and zero-mean of the rows of the operator. This accounts for the intuition that contrast in image modalities carries the most information. Secondly, the model is used in a novel bimodal image registration algorithm which estimates the transformation parameters of unregistered images of different modalities.

Submitted 25 June, 2014; originally announced June 2014.

arXiv:1406.1621 [pdf, other]

Separable Cosparse Analysis Operator Learning

Authors: Matthias Seibert, Julian Wörmann, Rémi Gribonval, Martin Kleinsteuber

Abstract: The ability of having a sparse representation for a certain class of signals has many applications in data analysis, image processing, and other research fields. Among sparse representations, the cosparse analysis model has recently gained increasing interest. Many signals exhibit a multidimensional structure, e.g. images or three-dimensional MRI scans. Most data analysis and learning algorithms use vectorized signals and thereby do not account for this underlying structure. The drawback of not taking the inherent structure into account is a dramatic increase in computational cost. We propose an algorithm for learning a cosparse Analysis Operator that adheres to the preexisting structure of the data, and thus allows for a very efficient implementation. This is achieved by enforcing a separable structure on the learned operator. Our learning algorithm is able to deal with multidimensional data of arbitrary order. We evaluate our method on volumetric data, using the example of three-dimensional MRI scans.

Submitted 6 June, 2014; originally announced June 2014.

Comments: 5 pages, 3 figures, accepted at EUSIPCO 2014

arXiv:1403.1501 [pdf, other]

Sparse DOA Estimation of Wideband Sound Sources Using Circular Harmonics

Authors: Clemens Hage, Tim Habigt, Martin Kleinsteuber

Abstract: Sparse signal models are the focus of recent developments in narrowband DOA estimation. Applying these methods to localizing audio sources, however, is challenging due to the wideband nature of the signals. The common approach of processing all frequency bands separately and fusing the results is costly and can introduce errors in the solution. We show how these problems can be overcome by decomposing the wavefield of a circular microphone array and using circular harmonic coefficients instead of time-frequency data for sparse DOA estimation. As a result, we present the super-resolution localization method WASCHL (Wideband Audio Sparse Circular Harmonics Localizer) that is inherently frequency-coherent and highly efficient from a computational point of view.

Submitted 6 March, 2014; originally announced March 2014.

arXiv:1312.5568 [pdf, other]

An Adaptive Dictionary Learning Approach for Modeling Dynamical Textures

Authors: Xian Wei, Hao Shen, Martin Kleinsteuber

Abstract: Video representation is an important and challenging task in the computer vision community. In this paper, we assume that image frames of a moving scene can be modeled as a Linear Dynamical System. We propose a sparse coding framework, named adaptive video dictionary learning (AVDL), to model a video adaptively. The developed framework is able to capture the dynamics of a moving scene by exploring both sparse properties and the temporal correlations of consecutive video frames. The proposed method is compared with state of the art video processing methods on several benchmark data sequences, which exhibit appearance changes and heavy occlusions.

Submitted 19 December, 2013; originally announced December 2013.

arXiv:1312.4746 [pdf, other]

Co-Sparse Textural Similarity for Image Segmentation

Authors: Claudia Nieuwenhuis, Daniel Cremers, Simon Hawe, Martin Kleinsteuber

Abstract: We propose an algorithm for segmenting natural images based on texture and color information, which leverages the co-sparse analysis model for image segmentation within a convex multilabel optimization framework. As a key ingredient of this method, we introduce a novel textural similarity measure, which builds upon the co-sparse representation of image patches. We propose a Bayesian approach to merge textural similarity with information about color and location. Combined with recently developed convex multilabel optimization methods this leads to an efficient algorithm for both supervised and unsupervised segmentation, which is easily parallelized on graphics hardware. The approach provides competitive results in unsupervised segmentation and outperforms state-of-the-art interactive segmentation methods on the Graz Benchmark.

Submitted 17 December, 2013; originally announced December 2013.

arXiv:1312.3790 [pdf, ps, other]

Sample Complexity of Dictionary Learning and other Matrix Factorizations

Authors: Rémi Gribonval, Rodolphe Jenatton, Francis Bach, Martin Kleinsteuber, Matthias Seibert

Abstract: Many modern tools in machine learning and signal processing, such as sparse dictionary learning, principal component analysis (PCA), non-negative matrix factorization (NMF), $K$-means clustering, etc., rely on the factorization of a matrix obtained by concatenating high-dimensional vectors from a training collection. While the idealized task would be to optimize the expected quality of the factors over the underlying distribution of training vectors, it is achieved in practice by minimizing an empirical average over the considered collection. The focus of this paper is to provide sample complexity estimates to uniformly control how much the empirical average deviates from the expected cost function. Standard arguments imply that the performance of the empirical predictor also exhibits such guarantees. The level of genericity of the approach encompasses several possible constraints on the factors (tensor product structure, shift-invariance, sparsity, ...), thus providing a unified perspective on the sample complexity of several widely used matrix factorization schemes. The derived generalization bounds behave proportionally to $\sqrt{\log(n)/n}$ w.r.t. the number of samples $n$ for the considered matrix factorization techniques.

Submitted 9 April, 2015; v1 submitted 13 December, 2013; originally announced December 2013.

Comments: to appear

Journal ref: IEEE Transactions on Information Theory, Institute of Electrical and Electronics Engineers (IEEE), 2015, pp.18

arXiv:1304.5319 [pdf, other]

A Joint Intensity and Depth Co-Sparse Analysis Model for Depth Map Super-Resolution

Authors: Martin Kiechle, Simon Hawe, Martin Kleinsteuber

Abstract: High-resolution depth maps can be inferred from low-resolution depth measurements and an additional high-resolution intensity image of the same scene. To that end, we introduce a bimodal co-sparse analysis model, which is able to capture the interdependency of registered intensity and depth information. This model is based on the assumption that the co-supports of corresponding bimodal image structures are aligned when computed by a suitable pair of analysis operators. No analytic form of such operators exists, and we propose a method for learning them from a set of registered training signals. This learning process is done offline and returns a bimodal analysis operator that is universally applicable to natural scenes. We use this to exploit the bimodal co-sparse analysis model as a prior for solving inverse problems, which leads to an efficient algorithm for depth map super-resolution.

Submitted 19 April, 2013; originally announced April 2013.

Comments: 13 pages, 4 figures

arXiv:1303.5244 [pdf, other]

Separable Dictionary Learning

Authors: Simon Hawe, Matthias Seibert, Martin Kleinsteuber

Abstract: Many techniques in computer vision, machine learning, and statistics rely on the fact that a signal of interest admits a sparse representation over some dictionary. Dictionaries are either available analytically, or can be learned from a suitable training set. While analytic dictionaries capture the global structure of a signal and allow a fast implementation, learned dictionaries often perform better in applications as they are more adapted to the considered class of signals. In imagery, unfortunately, the numerical burden of (i) learning a dictionary and (ii) employing the dictionary for reconstruction tasks only allows dealing with relatively small image patches that only capture local image information. The approach presented in this paper aims at overcoming these drawbacks by allowing a separable structure on the dictionary throughout the learning process. On the one hand, this permits larger patch sizes for the learning phase; on the other hand, the dictionary is applied efficiently in reconstruction tasks. The learning procedure is based on optimizing over a product of spheres which updates the dictionary as a whole, and thus enforces basic dictionary properties such as mutual coherence explicitly during the learning procedure. In the special case where no separable structure is enforced, our method competes with state-of-the-art dictionary learning methods like K-SVD.

Submitted 21 March, 2013; originally announced March 2013.

Comments: 12 pages, 2 figures, 1 table

arXiv:1302.2073 [pdf, other]

pROST : A Smoothed Lp-norm Robust Online Subspace Tracking Method for Realtime Background Subtraction in Video

Authors: Florian Seidel, Clemens Hage, Martin Kleinsteuber

Abstract: An increasing number of methods for background subtraction use Robust PCA to identify sparse foreground objects. While many algorithms use the L1-norm as a convex relaxation of the ideal sparsifying function, we approach the problem with a smoothed Lp-norm and present pROST, a method for robust online subspace tracking. The algorithm is based on alternating minimization on manifolds. Implemented on a graphics processing unit it achieves realtime performance. Experimental results on a state-of-the-art benchmark for background subtraction on real-world video data indicate that the method succeeds at a broad variety of background subtraction scenarios, and it outperforms competing approaches when video quality is deteriorated by camera jitter.

Submitted 28 March, 2013; v1 submitted 8 February, 2013; originally announced February 2013.

arXiv:1302.1094 [pdf, ps, other]

doi 10.1109/LSP.2013.2252900

Analysis Based Blind Compressive Sensing

Authors: Julian Wörmann, Simon Hawe, Martin Kleinsteuber

Abstract: In this work we address the problem of blindly reconstructing compressively sensed signals by exploiting the co-sparse analysis model. In the analysis model it is assumed that a signal multiplied by an analysis operator results in a sparse vector. We propose an algorithm that learns the operator adaptively during the reconstruction process. The arising optimization problem is tackled via a geometric conjugate gradient approach. Different types of sampling noise are handled by simply exchanging the data fidelity term. Numerical experiments are performed for measurements corrupted with Gaussian as well as impulsive noise to show the effectiveness of our method.

Submitted 26 March, 2013; v1 submitted 5 February, 2013; originally announced February 2013.

Comments: 7 pages, 2 figures

arXiv:1204.5309 [pdf, other]

doi 10.1109/TIP.2013.2246175

Analysis Operator Learning and Its Application to Image Reconstruction

Authors: Simon Hawe, Martin Kleinsteuber, Klaus Diepold

Abstract: Exploiting a priori known structural information lies at the core of many image reconstruction methods that can be stated as inverse problems. The synthesis model, which assumes that images can be decomposed into a linear combination of very few atoms of some dictionary, is now a well-established tool for the design of image reconstruction algorithms. An interesting alternative is the analysis model, where the signal is multiplied by an analysis operator and the outcome is assumed to be sparse. This approach has only recently gained increasing interest. The quality of reconstruction methods based on an analysis model depends heavily on the choice of a suitable operator. In this work, we present an algorithm for learning an analysis operator from training images. Our method is based on an $\ell_p$-norm minimization on the set of full rank matrices with normalized columns. We carefully introduce the employed conjugate gradient method on manifolds, and explain the underlying geometry of the constraints. Moreover, we compare our approach to state-of-the-art methods for image denoising, inpainting, and single image super-resolution. Our numerical results show competitive performance of our general approach in all presented applications compared to the specialized state-of-the-art techniques.

Submitted 26 March, 2013; v1 submitted 24 April, 2012; originally announced April 2012.

Comments: 12 pages, 7 figures

ACM Class: I.4.5

arXiv:1111.7088 [pdf, other]

Uniqueness Analysis of Non-Unitary Matrix Joint Diagonalization

Authors: Martin Kleinsteuber, Hao Shen

Abstract: Matrix Joint Diagonalization (MJD) is a powerful approach for solving the Blind Source Separation (BSS) problem. It relies on the construction of matrices which are diagonalized by the unknown demixing matrix. Their joint diagonalizer serves as a correct estimate of this demixing matrix only if it is uniquely determined. Thus, a critical question is under what conditions a joint diagonalizer is unique. In the present work we fully answer this question about the identifiability of MJD based BSS approaches and provide a general result on uniqueness conditions of matrix joint diagonalization. It unifies all existing results which exploit the concepts of non-circularity, non-stationarity, non-whiteness, and non-Gaussianity. As a corollary, we propose a solution for complex BSS, which can be formulated in a closed form in terms of an eigenvalue and a singular value decomposition of two matrices.

Submitted 4 April, 2012; v1 submitted 30 November, 2011; originally announced November 2011.

Comments: 23 pages

arXiv:1110.2593 [pdf, other]

doi 10.1109/LSP.2011.2181945

Blind Source Separation with Compressively Sensed Linear Mixtures

Authors: Martin Kleinsteuber, Hao Shen

Abstract: This work studies the problem of simultaneously separating and reconstructing signals from compressively sensed linear mixtures. We assume that all source signals share a common sparse representation basis. The approach combines classical Compressive Sensing (CS) theory with a linear mixing model. It allows the mixtures to be sampled independently of each other. If samples are acquired in the time domain, this means that the sensors need not be synchronized. Since Blind Source Separation (BSS) from a linear mixture is only possible up to permutation and scaling, factoring out these ambiguities leads to a minimization problem on the so-called oblique manifold. We develop a geometric conjugate subgradient method that scales to large systems for solving the problem. Numerical results demonstrate the promising performance of the proposed algorithm compared to several state of the art methods.

Submitted 12 October, 2011; originally announced October 2011.

Comments: 9 pages, 2 figures

Showing 1–27 of 27 results for author: Kleinsteuber, M