doi 10.1109/TSP.2023.3333212

Revisiting Deep Generalized Canonical Correlation Analysis

Authors: Paris A. Karakasis, Nicholas D. Sidiropoulos

Abstract: Canonical correlation analysis (CCA) is a classic statistical method for discovering latent co-variation that underpins two or more observed random vectors. Several extensions and variations of CCA have been proposed that have strengthened our capabilities in terms of revealing common random factors from multiview datasets. In this work, we first revisit the most recent deterministic extensions of deep CCA and highlight the strengths and limitations of these state-of-the-art methods. Some methods allow trivial solutions, while others can miss weak common factors. Others overload the problem by also seeking to reveal what is not common among the views -- i.e., the private components that are needed to fully reconstruct each view. The latter tends to overload the problem and its computational and sample complexities. Aiming to improve upon these limitations, we design a novel and efficient formulation that alleviates some of the current restrictions. The main idea is to model the private components as conditionally independent given the common ones, which enables the proposed compact formulation. In addition, we also provide a sufficient condition for identifying the common random factors. Judicious experiments with synthetic and real datasets showcase the validity of our claims and the effectiveness of the proposed approach.

Submitted 20 December, 2023; originally announced December 2023.

Journal ref: in IEEE Transactions on Signal Processing, vol. 71, pp. 4392-4406, 2023

arXiv:2305.03884 [pdf, other]

On High-dimensional and Low-rank Tensor Bandits

Authors: Chengshuai Shi, Cong Shen, Nicholas D. Sidiropoulos

Abstract: Most existing studies on linear bandits focus on the one-dimensional characterization of the overall system. While being representative, this formulation may fail to model applications with high-dimensional but favorable structures, such as the low-rank tensor representation for recommender systems. To address this limitation, this work studies a general tensor bandits model, where actions and system parameters are represented by tensors as opposed to vectors, and we particularly focus on the case that the unknown system tensor is low-rank. A novel bandit algorithm, coined TOFU (Tensor Optimism in the Face of Uncertainty), is developed. TOFU first leverages flexible tensor regression techniques to estimate low-dimensional subspaces associated with the system tensor. These estimates are then utilized to convert the original problem to a new one with norm constraints on its system parameters. Lastly, a norm-constrained bandit subroutine is adopted by TOFU, which utilizes these constraints to avoid exploring the entire high-dimensional parameter space. Theoretical analyses show that TOFU improves the best-known regret upper bound by a multiplicative factor that grows exponentially in the system order. A novel performance lower bound is also established, which further corroborates the efficiency of TOFU.

Submitted 5 May, 2023; originally announced May 2023.

Comments: Accepted to the 2023 IEEE International Symposium on Information Theory (ISIT 2023)

arXiv:2210.11413 [pdf, other]

doi 10.1109/TSP.2023.3338062

Minimizing low-rank models of high-order tensors: Hardness, span, tight relaxation, and applications

Authors: Nicholas D. Sidiropoulos, Paris Karakasis, Aritra Konar

Abstract: We consider the problem of finding the smallest or largest entry of a tensor of order N that is specified via its rank decomposition. Stated in a different way, we are given N sets of R-dimensional vectors and we wish to select one vector from each set such that the sum of the Hadamard product of the selected vectors is minimized or maximized. We show that this fundamental tensor problem is NP-hard for any tensor rank higher than one, and polynomial-time solvable in the rank-one case. We also propose a continuous relaxation and prove that it is tight for any rank. For low-enough ranks, the proposed continuous reformulation is amenable to low-complexity gradient-based optimization, and we propose a suite of gradient-based optimization algorithms drawing from projected gradient descent, Frank-Wolfe, or explicit parametrization of the relaxed constraints. We also show that our core results remain valid no matter what kind of polyadic tensor model is used to represent the tensor of interest, including Tucker, HOSVD/MLSVD, tensor train, or tensor ring. Next, we consider the class of problems that can be posed as special instances of the problem of interest. We show that this class includes the partition problem (and thus all NP-complete problems via polynomial-time transformation), integer least squares, integer linear programming, integer quadratic programming, sign retrieval (a special kind of mixed integer programming / restricted version of phase retrieval), and maximum likelihood decoding of parity check codes. We demonstrate promising experimental results on a number of hard problems, including state-of-art performance in decoding low density parity check codes and general parity check codes.

Submitted 21 December, 2023; v1 submitted 16 October, 2022; originally announced October 2022.

Comments: 14 pages, 11 figures

Journal ref: in IEEE Transactions on Signal Processing, vol. 72, pp. 129-142, 2024
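
Illustrative note (not from the paper): the problem statement in the abstract above can be made concrete with a small exhaustive search. The sketch below assumes the tensor is given by its rank-decomposition factors, stored as plain Python lists, where `factors[n][i]` is the R-dimensional vector for index `i` of mode `n`. The function names and the toy data are hypothetical. This is the naive baseline whose cost grows with the product of the mode sizes, not the continuous relaxation the abstract proposes.

```python
from itertools import product
from math import prod


def entry_from_factors(factors, index):
    """Tensor entry at `index`, computed directly from the rank-decomposition factors.

    The entry equals the sum over r of the product across modes of
    factors[n][index[n]][r], i.e. the sum of the Hadamard product of the
    selected vectors.
    """
    rows = [factors[n][i] for n, i in enumerate(index)]
    rank = len(rows[0])
    return sum(prod(row[r] for row in rows) for r in range(rank))


def brute_force_min_entry(factors):
    """Enumerate every index tuple and return (best_index, best_value)."""
    ranges = [range(len(f)) for f in factors]
    best_index = min(product(*ranges),
                     key=lambda idx: entry_from_factors(factors, idx))
    return best_index, entry_from_factors(factors, best_index)


# Toy order-3, rank-2 example (made-up numbers for illustration only).
factors = [
    [[1.0, 2.0], [0.0, -1.0]],                # mode 1: two candidate vectors
    [[1.0, 1.0], [2.0, 0.5]],                 # mode 2: two candidate vectors
    [[1.0, -1.0], [3.0, 1.0], [0.5, 2.0]],    # mode 3: three candidate vectors
]
best_index, best_value = brute_force_min_entry(factors)
print(best_index, best_value)
```

Swapping `min` for `max` gives the largest entry instead. The search visits 2 x 2 x 3 = 12 tuples here; the exponential growth of that count with the order N is what makes the exhaustive approach impractical in general.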

arXiv:2210.08531 [pdf, other]

doi 10.1109/TIP.2022.3159125

Multisubject Task-Related fMRI Data Processing via a Two-Stage Generalized Canonical Correlation Analysis

Authors: Paris A. Karakasis, Athanasios P. Liavas, Nicholas D. Sidiropoulos, Panagiotis G. Simos, Efrosini Papadaki

Abstract: Functional magnetic resonance imaging (fMRI) is one of the most popular methods for studying the human brain. Task-related fMRI data processing aims to determine which brain areas are activated when a specific task is performed and is usually based on the Blood Oxygen Level Dependent (BOLD) signal. The background BOLD signal also reflects systematic fluctuations in regional brain activity which are attributed to the existence of resting-state brain networks. We propose a new fMRI data generating model which takes into consideration the existence of common task-related and resting-state components. We first estimate the common task-related temporal component, via two successive stages of generalized canonical correlation analysis and, then, we estimate the common task-related spatial component, leading to a task-related activation map. The experimental tests of our method with synthetic data reveal that we are able to obtain very accurate temporal and spatial estimates even at very low Signal to Noise Ratio (SNR), which is usually the case in fMRI data processing. The tests with real-world fMRI data show significant advantages over standard procedures based on General Linear Models (GLMs).

Submitted 16 October, 2022; originally announced October 2022.

Journal ref: in IEEE Transactions on Image Processing, vol. 31, pp. 4011-4022, 2022

arXiv:2210.07132 [pdf, other]

Learning Multivariate CDFs and Copulas using Tensor Factorization

Authors: Magda Amiridi, Nicholas D. Sidiropoulos

Abstract: Learning the multivariate distribution of data is a core challenge in statistics and machine learning. Traditional methods aim for the probability density function (PDF) and are limited by the curse of dimensionality. Modern neural methods are mostly based on black-box models, lacking identifiability guarantees. In this work, we aim to learn multivariate cumulative distribution functions (CDFs), as they can handle mixed random variables, allow efficient box probability evaluation, and have the potential to overcome local sample scarcity owing to their cumulative nature. We show that any grid sampled version of a joint CDF of mixed random variables admits a universal representation as a naive Bayes model via the Canonical Polyadic (tensor-rank) decomposition. By introducing a low-rank model, either directly in the raw data domain, or indirectly in a transformed (Copula) domain, the resulting model affords efficient sampling, closed form inference and uncertainty quantification, and comes with uniqueness guarantees under relatively mild conditions. We demonstrate the superior performance of the proposed model in several synthetic and real datasets and applications including regression, sampling and data imputation. Interestingly, our experiments with real data show that it is possible to obtain better density/mass estimates indirectly via a low-rank CDF model, than a low-rank PDF/PMF model.

Submitted 13 October, 2022; originally announced October 2022.

arXiv:2112.15296 [pdf, other]

Uncovering migration systems through spatio-temporal tensor co-clustering

Authors: Zack W. Almquist, Tri Duc Nguyen, Mikael Sorensen, Xiao Fu, Nicholas D. Sidiropoulos

Abstract: A central problem in the study of human mobility is that of migration systems. Typically, migration systems are defined as a set of relatively stable movements of people between two or more locations over time. While these emergent systems are expected to vary over time, they ideally contain a stable underlying structure that could be discovered empirically. There have been some notable attempts to formally or informally define migration systems, however they have been limited by being hard to operationalize, and by defining migration systems in ways that ignore origin/destination aspects and/or fail to account for migration dynamics. In this work we propose a novel method, spatio-temporal (ST) tensor co-clustering, stemming from signal processing and machine learning theory. To demonstrate its effectiveness for describing stable migration systems we focus on domestic migration between counties in the US from 1990-2018. Relevant data for this period has been made available through the US Internal Revenue Service. Specifically, we concentrate on three illustrative case studies: (i) US Metropolitan Areas, (ii) the state of California, and (iii) Louisiana, focusing on detecting exogenous events such as Hurricane Katrina in 2005. Finally, we conclude with discussion and limitations of this approach.

Submitted 26 January, 2023; v1 submitted 30 December, 2021; originally announced December 2021.

arXiv:2106.10591 [pdf, other]

doi 10.1109/TSP.2022.3158422

Low-rank Characteristic Tensor Density Estimation Part II: Compression and Latent Density Estimation

Authors: Magda Amiridi, Nikos Kargas, Nicholas D. Sidiropoulos

Abstract: Learning generative probabilistic models is a core problem in machine learning, which presents significant challenges due to the curse of dimensionality. This paper proposes a joint dimensionality reduction and non-parametric density estimation framework, using a novel estimator that can explicitly capture the underlying distribution of appropriate reduced-dimension representations of the input data. The idea is to jointly design a nonlinear dimensionality reducing auto-encoder to model the training data in terms of a parsimonious set of latent random variables, and learn a canonical low-rank tensor model of the joint distribution of the latent variables in the Fourier domain. The proposed latent density model is non-parametric and universal, as opposed to the predefined prior that is assumed in variational auto-encoders. Joint optimization of the auto-encoder and the latent density estimator is pursued via a formulation which learns both by minimizing a combination of the negative log-likelihood in the latent domain and the auto-encoder reconstruction loss. We demonstrate that the proposed model achieves very promising results on toy, tabular, and image datasets on regression tasks, sampling, and anomaly detection.

Submitted 19 June, 2021; originally announced June 2021.

arXiv:2103.10027 [pdf, other]

doi 10.1109/TSP.2021.3133690

Probabilistic Simplex Component Analysis

Authors: Ruiyuan Wu, Wing-Kin Ma, Yuening Li, Anthony Man-Cho So, Nicholas D. Sidiropoulos

Abstract: This study presents PRISM, a probabilistic simplex component analysis approach to identifying the vertices of a data-circumscribing simplex from data. The problem has a rich variety of applications, the most notable being hyperspectral unmixing in remote sensing and non-negative matrix factorization in machine learning. PRISM uses a simple probabilistic model, namely, uniform simplex data distribution and additive Gaussian noise, and it carries out inference by maximum likelihood. The inference model is sound in the sense that the vertices are provably identifiable under some assumptions, and it suggests that PRISM can be effective in combating noise when the number of data points is large. PRISM has strong, but hidden, relationships with simplex volume minimization, a powerful geometric approach for the same problem. We study these fundamental aspects, and we also consider algorithmic schemes based on importance sampling and variational inference. In particular, the variational inference scheme is shown to resemble a matrix factorization problem with a special regularizer, which draws an interesting connection to the matrix factorization approach. Numerical results are provided to demonstrate the potential of PRISM.

Submitted 20 January, 2022; v1 submitted 18 March, 2021; originally announced March 2021.

arXiv:2102.03434 [pdf, ps, other]

Exploring the Subgraph Density-Size Trade-off via the Lovász Extension

Authors: Aritra Konar, Nicholas D. Sidiropoulos

Abstract: Given an undirected graph, the Densest-k-Subgraph problem (DkS) seeks to find a subset of k vertices such that the sum of the edge weights in the corresponding subgraph is maximized. The problem is known to be NP-hard, and is also very difficult to approximate, in the worst-case. In this paper, we present a new convex relaxation for the problem. Our key idea is to reformulate DkS as minimizing a submodular function subject to a cardinality constraint. Exploiting the fact that submodular functions possess a convex, continuous extension (known as the Lovász extension), we propose to minimize the Lovász extension over the convex hull of the cardinality constraints. Although the Lovász extension of a submodular function does not admit an analytical form in general, for DkS we show that it does. We leverage this result to develop a highly scalable algorithm based on the Alternating Direction Method of Multipliers (ADMM) for solving the relaxed problem. Coupled with a pair of fortuitously simple rounding schemes, we demonstrate that our approach outperforms existing baselines on real-world graphs and can yield high quality sub-optimal solutions which typically are a posteriori no worse than 65-80% of the optimal density.

Submitted 5 February, 2021; originally announced February 2021.

Comments: Accepted for publication at ACM WSDM 2021

arXiv:2012.10853 [pdf, other]

eTREE: Learning Tree-structured Embeddings

Authors: Faisal M. Almutairi, Yunlong Wang, Dong Wang, Emily Zhao, Nicholas D. Sidiropoulos

Abstract: Matrix factorization (MF) plays an important role in a wide range of machine learning and data mining models. MF is commonly used to obtain item embeddings and feature representations due to its ability to capture correlations and higher-order statistical dependencies across dimensions. In many applications, the categories of items exhibit a hierarchical tree structure. For instance, human diseases can be divided into coarse categories, e.g., bacterial and viral. These categories can be further divided into finer categories, e.g., viral infections can be respiratory, gastrointestinal, and exanthematous viral diseases. In e-commerce, products, movies, books, etc., are grouped into hierarchical categories, e.g., clothing items are divided by gender, then by type (formal, casual, etc.). While the tree structure and the categories of the different items may be known in some applications, they have to be learned together with the embeddings in many others. In this work, we propose eTREE, a model that incorporates the (usually ignored) tree structure to enhance the quality of the embeddings. We leverage the special uniqueness properties of Nonnegative MF (NMF) to prove identifiability of eTREE. The proposed model not only exploits the tree structure prior, but also learns the hierarchical clustering in an unsupervised data-driven fashion. We derive an efficient algorithmic solution and a scalable implementation of eTREE that exploits parallel computing, computation caching, and warm start strategies. We showcase the effectiveness of eTREE on real data from various application domains: healthcare, recommender systems, and education. We also demonstrate the meaningfulness of the tree obtained from eTREE by means of domain experts' interpretation.

Submitted 20 December, 2020; originally announced December 2020.

arXiv:2012.04747 [pdf, other]

STELAR: Spatio-temporal Tensor Factorization with Latent Epidemiological Regularization

Authors: Nikos Kargas, Cheng Qian, Nicholas D. Sidiropoulos, Cao Xiao, Lucas M. Glass, Jimeng Sun

Abstract: Accurate prediction of the transmission of epidemic diseases such as COVID-19 is crucial for implementing effective mitigation measures. In this work, we develop a tensor method to predict the evolution of epidemic trends for many regions simultaneously. We construct a 3-way spatio-temporal tensor (location, attribute, time) of case counts and propose a nonnegative tensor factorization with latent epidemiological model regularization named STELAR. Unlike standard tensor factorization methods which cannot predict slabs ahead, STELAR enables long-term prediction by incorporating latent temporal regularization through a system of discrete-time difference equations of a widely adopted epidemiological model. We use latent instead of location/attribute-level epidemiological dynamics to capture common epidemic profile sub-types and improve collaborative learning and prediction. We conduct experiments using both county- and state-level COVID-19 data and show that our model can identify interesting latent patterns of the epidemic. Finally, we evaluate the predictive ability of our method and show superior performance compared to the baselines, achieving up to 21% lower root mean square error and 25% lower mean absolute error for county-level prediction.

Submitted 17 March, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

Comments: AAAI 2021

arXiv:2011.01422 [pdf, ps, other]

GAGE: Geometry Preserving Attributed Graph Embeddings

Authors: Charilaos I. Kanatsoulis, Nicholas D. Sidiropoulos

Abstract: Node embedding is the task of extracting concise and informative representations of certain entities that are connected in a network. Various real-world networks include information about both node connectivity and certain node attributes, in the form of features or time-series data. Modern representation learning techniques employ both the connectivity and attribute information of the nodes to produce embeddings in an unsupervised manner. In this context, deriving embeddings that preserve the geometry of the network and the attribute vectors would be highly desirable, as they would reflect both the topological neighborhood structure and proximity in feature space. While this is fairly straightforward to maintain when only observing the connectivity or attribute information of the network, preserving the geometry of both types of information is challenging. A novel tensor factorization approach for node embedding in attributed networks is proposed in this paper, that preserves the distances of both the connections and the attributes. Furthermore, an effective and lightweight algorithm is developed to tackle the learning task and judicious experiments with multiple state-of-the-art baselines suggest that the proposed algorithm offers significant performance improvements in downstream tasks.

Submitted 23 February, 2022; v1 submitted 2 November, 2020; originally announced November 2020.

arXiv:2010.16181 [pdf, other]

doi 10.1109/TSP.2021.3125147

Information-theoretic Feature Selection via Tensor Decomposition and Submodularity

Authors: Magda Amiridi, Nikos Kargas, Nicholas D. Sidiropoulos

Abstract: Feature selection by maximizing high-order mutual information between the selected feature vector and a target variable is the gold standard in terms of selecting the best subset of relevant features that maximizes the performance of prediction models. However, such an approach typically requires knowledge of the multivariate probability distribution of all features and the target, and involves a challenging combinatorial optimization problem. Recent work has shown that any joint Probability Mass Function (PMF) can be represented as a naive Bayes model, via Canonical Polyadic (tensor rank) Decomposition. In this paper, we introduce a low-rank tensor model of the joint PMF of all variables and indirect targeting as a way of mitigating complexity and maximizing the classification performance for a given number of features. Through low-rank modeling of the joint PMF, it is possible to circumvent the curse of dimensionality by learning principal components of the joint distribution. By indirectly aiming to predict the latent variable of the naive Bayes model instead of the original target variable, it is possible to formulate the feature selection problem as maximization of a monotone submodular function subject to a cardinality constraint - which can be tackled using a greedy algorithm that comes with performance guarantees. Numerical experiments with several standard datasets suggest that the proposed approach compares favorably to the state-of-art for this important problem.

Submitted 30 October, 2020; originally announced October 2020.
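
Illustrative note (not from the paper): the abstract above reduces feature selection to maximizing a monotone submodular function under a cardinality constraint and points to a greedy algorithm. A minimal sketch of that generic greedy pattern follows. As a loudly labeled assumption, the objective here is a toy set-coverage function standing in for the paper's mutual-information objective, and the names `greedy_max`, `coverage`, and `covered` are hypothetical.

```python
def greedy_max(ground_set, f, k):
    """Pick up to k elements, each time adding the one with the largest marginal gain."""
    selected = []
    for _ in range(k):
        remaining = [e for e in ground_set if e not in selected]
        if not remaining:
            break
        base = f(selected)
        # Marginal gain of e is f(selected + [e]) - f(selected).
        best = max(remaining, key=lambda e: f(selected + [e]) - base)
        selected.append(best)
    return selected


# Stand-in objective (an assumption, not the paper's): number of items covered.
# Coverage functions are a textbook example of monotone submodular functions.
coverage = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6, 7},
    "d": {1, 7},
}


def covered(subset):
    """Count the distinct items covered by the chosen elements."""
    return len(set().union(*(coverage[e] for e in subset)))


chosen = greedy_max(list(coverage), covered, k=2)
print(chosen, covered(chosen))
```

For monotone submodular objectives with a cardinality constraint, this greedy rule is known to achieve at least a (1 - 1/e) fraction of the optimal value, which is the kind of performance guarantee the abstract alludes to.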

arXiv:2010.11367 [pdf, other]

TeX-Graph: Coupled tensor-matrix knowledge-graph embedding for COVID-19 drug repurposing

Authors: Charilaos I. Kanatsoulis, Nicholas D. Sidiropoulos

Abstract: Knowledge graphs (KGs) are powerful tools that codify relational behaviour between entities in knowledge bases. KGs can simultaneously model many different types of subject-predicate-object and higher-order relations. As such, they offer a flexible modeling framework that has been applied to many areas, including biology and pharmacology -- most recently, in the fight against COVID-19. The flexibility of KG modeling is both a blessing and a challenge from the learning point of view. In this paper we propose a novel coupled tensor-matrix framework for KG embedding. We leverage tensor factorization tools to learn concise representations of entities and relations in knowledge bases and employ these representations to perform drug repurposing for COVID-19. Our proposed framework is principled, elegant, and achieves 100% improvement over the best baseline in the COVID-19 drug repurposing task using a recently developed biological KG.

Submitted 25 October, 2020; v1 submitted 21 October, 2020; originally announced October 2020.

arXiv:2010.00696 [pdf, other]

PHASED: Phase-Aware Submodularity-Based Energy Disaggregation

Authors: Faisal M. Almutairi, Aritra Konar, Ahmed S. Zamzam, Nicholas D. Sidiropoulos

Abstract: Energy disaggregation is the task of discerning the energy consumption of individual appliances from aggregated measurements, which holds promise for understanding and reducing energy usage. In this paper, we propose PHASED, an optimization approach for energy disaggregation that has two key features: PHASED (i) exploits the structure of power distribution systems to make use of readily available measurements that are neglected by existing methods, and (ii) poses the problem as a minimization of a difference of submodular functions. We leverage this form by applying a discrete optimization variant of the majorization-minimization algorithm to iteratively minimize a sequence of global upper bounds of the cost function to obtain high-quality approximate solutions. PHASED improves the disaggregation accuracy of state-of-the-art models by up to 61% and achieves better prediction on heavy load appliances.

Submitted 1 October, 2020; originally announced October 2020.

arXiv:2008.12315 [pdf, other]

doi 10.1109/TSP.2022.3175608

Low-rank Characteristic Tensor Density Estimation Part I: Foundations

Authors: Magda Amiridi, Nikos Kargas, Nicholas D. Sidiropoulos

Abstract: Effective non-parametric density estimation is a key challenge in high-dimensional multivariate data analysis. In this paper, we propose a novel approach that builds upon tensor factorization tools. Any multivariate density can be represented by its characteristic function, via the Fourier transform. If the sought density is compactly supported, then its characteristic function can be approximated, within controllable error, by a finite tensor of leading Fourier coefficients, whose size depends on the smoothness of the underlying density. This tensor can be naturally estimated from observed realizations of the random vector of interest, via sample averaging. In order to circumvent the curse of dimensionality, we introduce a low-rank model of this characteristic tensor, which significantly improves the density estimate especially for high-dimensional data and/or in the sample-starved regime. By virtue of uniqueness of low-rank tensor decomposition, under certain conditions, our method enables learning the true data-generating distribution. We demonstrate the very promising performance of the proposed method using several measured datasets.

Submitted 4 June, 2021; v1 submitted 27 August, 2020; originally announced August 2020.

arXiv:2008.07996 [pdf, ps, other]

Mining Large Quasi-cliques with Quality Guarantees from Vertex Neighborhoods

Authors: Aritra Konar, Nicholas D. Sidiropoulos

Abstract: Mining dense subgraphs is an important primitive across a spectrum of graph-mining tasks. In this work, we formally establish that two recurring characteristics of real-world graphs, namely heavy-tailed degree distributions and large clustering coefficients, imply the existence of substantially large vertex neighborhoods with high edge-density. This observation suggests a very simple approach for extracting large quasi-cliques: simply scan the vertex neighborhoods, compute the clustering coefficient of each vertex, and output the best such subgraph. The implementation of such a method requires counting the triangles in a graph, which is a well-studied problem in graph mining. When empirically tested across a number of real-world graphs, this approach reveals a surprise: vertex neighborhoods include maximal cliques of non-trivial sizes, and the density of the best neighborhood often compares favorably to subgraphs produced by dedicated algorithms for maximizing subgraph density. For graphs with small clustering coefficients, we demonstrate that small vertex neighborhoods can be refined using a local-search method to ``grow'' larger cliques and near-cliques. Our results indicate that contrary to worst-case theoretical results, mining cliques and quasi-cliques of non-trivial sizes from real-world graphs is often not a difficult problem, and provide motivation for further work geared towards a better explanation of these empirical successes.

Submitted 18 August, 2020; originally announced August 2020.

Comments: Accepted for publication at KDD 2020 (Research Track), 12 pages

ACM Class: G.2.2; G.2.1

arXiv:2005.07803 [pdf, other]

Exactness of OPF Relaxation on Three-phase Radial Networks with Delta Connections

Authors: Fengyu Zhou, Ahmed S. Zamzam, Steven H. Low, Nicholas D. Sidiropoulos

Abstract: Simulations have shown that while semi-definite relaxations of AC optimal power flow (AC-OPF) on three-phase radial networks with only wye connections tend to be exact, the presence of delta connections seems to render them inexact. This paper shows that such inexactness originates from the non-uniqueness of relaxation solutions and numerical errors amplified by the non-uniqueness. This finding motivates two algorithms to recover the exact solution of AC-OPF in the presence of delta connections. In simulations using IEEE 13, 37 and 123-bus systems, the proposed algorithms provide exact optimal solutions up to numerical precision.

Submitted 15 May, 2020; originally announced May 2020.

Comments: 10 pages, 1 figure

arXiv:2004.05522 [pdf, ps, other]

Cell-Edge Detection via Selective Cooperation and Generalized Canonical Correlation

Authors: Mohamed Salah Ibrahim, Ahmed S. Zamzam, Aritra Konar, Nicholas D. Sidiropoulos

Abstract: Improving the uplink quality of service for users located around the boundaries between cells is a key challenge in LTE systems. Relying on power control, existing approaches throttle the rates of cell-center users, while multi-user detection requires accurate channel estimates for the cell-edge users, which is another challenge due to their low received signal-to-noise ratio (SNR). Utilizing the fact that cell-edge user signals are weak but common (received at roughly equal power) at different base stations (BSs), this paper establishes a connection between cell-edge user detection and generalized canonical correlation analysis (GCCA). It puts forth a GCCA-based method that leverages selective BS cooperation to recover the cell-edge user signal subspace even at low SNR. The cell-edge user signals can then be extracted from the resulting mixture via algebraic signal processing techniques. The paper includes theoretical analysis showing why GCCA recovers the correct subspace containing the cell-edge user signals under mild conditions. The proposed method can also identify the number of cell-edge users in the system, i.e., the common subspace dimension. Simulations reveal significant performance improvement relative to various multiuser detection techniques. Cell-edge detection performance is further studied as a function of how many / which BSs are selected, and it is shown that using the closest three BSs is always the best choice.

Submitted 11 April, 2020; originally announced April 2020.

arXiv:2003.12666 [pdf, other]

GRATE: Granular Recovery of Aggregated Tensor Data by Example

Authors: Ahmed S. Zamzam, Bo Yang, Nicholas D. Sidiropoulos

Abstract: In this paper, we address the challenge of recovering an accurate breakdown of aggregated tensor data using disaggregation examples. This problem is motivated by several applications. For example, given the breakdown of energy consumption at some homes, how can we disaggregate the total energy consumed during the same period at other homes? In order to address this challenge, we propose GRATE, a principled method that turns the ill-posed task at hand into a constrained tensor factorization problem. Then, this optimization problem is tackled using an alternating least-squares algorithm. GRATE has the ability to handle exact aggregated data as well as inexact aggregation where some unobserved quantities contribute to the aggregated data. Special emphasis is given to the energy disaggregation problem where the goal is to provide energy breakdown for consumers from their monthly aggregated consumption. Experiments on two real datasets show the efficacy of GRATE in recovering more accurate disaggregation than state-of-the-art energy disaggregation methods.

Submitted 5 April, 2020; v1 submitted 27 March, 2020; originally announced March 2020.

Comments: 20 pages, 3 figures

arXiv:2003.11205 [pdf, ps, other]

doi 10.1109/TSP.2021.3061218

Generalized Canonical Correlation Analysis: A Subspace Intersection Approach

Authors: Mikael Sørensen, Charilaos I. Kanatsoulis, Nicholas D. Sidiropoulos

Abstract: Generalized Canonical Correlation Analysis (GCCA) is an important tool that finds numerous applications in data mining, machine learning, and artificial intelligence. It aims at finding `common' random variables that are strongly correlated across multiple feature representations (views) of the same set of entities. CCA and to a lesser extent GCCA have been studied from the statistical and algorithmic points of view, but not as much from the standpoint of linear algebra. This paper offers a fresh algebraic perspective of GCCA based on a (bi-)linear generative model that naturally captures its essence. It is shown that from a linear algebra point of view, GCCA is tantamount to subspace intersection; and conditions under which the common subspace of the different views is identifiable are provided. A novel GCCA algorithm is proposed based on subspace intersection, which scales up to handle large GCCA tasks. Synthetic as well as real data experiments are provided to showcase the effectiveness of the proposed approach.

Submitted 25 March, 2020; originally announced March 2020.

arXiv:2003.02255 [pdf, ps, other]

Reliable Detection of Unknown Cell-Edge Users Via Canonical Correlation Analysis

Authors: Mohamed Salah Ibrahim, Nicholas D. Sidiropoulos

Abstract: Providing reliable service to users close to the edge between cells remains a challenge in cellular systems, even as 5G deployment is around the corner. These users are subject to significant signal attenuation, which also degrades their uplink channel estimates. Even joint detection using base station (BS) cooperation often fails to reliably detect such users, due to near-far power imbalance, and channel estimation errors. Is it possible to bypass the channel estimation stage and design a detector that can reliably detect cell-edge user signals under significant near-far imbalance? This paper shows, perhaps surprisingly, that the answer is affirmative -- albeit not via traditional multiuser detection. Exploiting that cell-edge user signals are weak but {\em common} to different base stations, while cell-center users are unique to their serving BS, this paper establishes an elegant connection between cell-edge user detection and canonical correlation analysis (CCA) of the associated space-time baseband-equivalent matrices. It proves that CCA identifies the common subspace of these matrices, even under significant intra- and inter-cell interference. The resulting mixture of cell-edge user signals can subsequently be unraveled using a well-known algebraic signal processing technique. Interestingly, the proposed approach does not even require that the signals from the different base stations are synchronized -- the right synchronization can be automatically determined as well. Experimental results demonstrate that the proposed approach achieves order of magnitude BER improvements compared to `oracle' multiuser detection that assumes perfect knowledge of the cell-center user channels.

Submitted 4 March, 2020; originally announced March 2020.

arXiv:2003.02240 [pdf, ps, other]

Fast Algorithms for Joint Multicast Beamforming and Antenna Selection in Massive MIMO

Authors: Mohamed Salah Ibrahim, Aritra Konar, Nicholas D. Sidiropoulos

Abstract: Massive MIMO is currently a leading physical layer technology candidate that can dramatically enhance throughput in 5G systems, for both unicast and multicast transmission modalities. As antenna elements are becoming smaller and cheaper in the mmW range compared to radio frequency (RF) chains, it is crucial to perform antenna selection at the transmitter, such that the available RF chains are switched to an appropriate subset of antennas. This paper considers the joint problem of multicast beamforming and antenna selection for a single multicast group in massive MIMO systems. The prior state-of-art for this problem relies on semi-definite relaxation (SDR), which cannot scale up to the massive MIMO regime. A successive convex approximation (SCA) based approach is proposed to tackle max-min fair joint multicast beamforming and antenna selection. The key idea of SCA is to successively approximate the non-convex problem by a class of non-smooth, convex optimization problems. Two fast and memory efficient first-order methods are proposed to solve each SCA subproblem. Simulations demonstrate that the proposed algorithms outperform the existing state-of-art approach in terms of solution quality and run time, in both traditional and especially in massive MIMO settings.

Submitted 4 March, 2020; originally announced March 2020.

arXiv:1910.12001 [pdf, other]

PREMA: Principled Tensor Data Recovery from Multiple Aggregated Views

Authors: Faisal M. Almutairi, Charilaos I. Kanatsoulis, Nicholas D. Sidiropoulos

Abstract: Multidimensional data have become ubiquitous and are frequently encountered in situations where the information is aggregated over multiple data atoms. The aggregation can be over time or other features, such as geographical location. We often have access to multiple aggregated views of the same data, each aggregated in one or more dimensions, especially when data are collected or measured by different agencies. For instance, item sales can be aggregated temporally, and over groups of stores based on their location or affiliation. However, data mining and machine learning models benefit from detailed data for personalized analysis and prediction. Thus, data disaggregation algorithms are becoming increasingly important in various domains. The goal of this paper is to reconstruct finer-scale data from multiple coarse views, aggregated over different (subsets of) dimensions. The proposed method, called PREMA, leverages low-rank tensor factorization tools to fuse the multiple views and provide recovery guarantees under certain conditions. PREMA can tackle challenging scenarios, such as missing or partially observed data, double aggregation, and even blind disaggregation (without knowledge of the aggregation patterns) using a variant of PREMA called B-PREMA. To showcase the effectiveness of PREMA, the paper includes extensive experiments using real data from different domains: retail sales, crime counts, and weather observations.

Submitted 10 April, 2020; v1 submitted 26 October, 2019; originally announced October 2019.

arXiv:1907.11911 [pdf, other]

REP: Predicting the Time-Course of Drug Sensitivity

Authors: Cheng Qian, Amin Emad, Nicholas D. Sidiropoulos

Abstract: The biological processes involved in a drug's mechanisms of action are oftentimes dynamic, complex and difficult to discern. Time-course gene expression data is a rich source of information that can be used to unravel these complex processes, identify biomarkers of drug sensitivity and predict the response to a drug. However, the majority of previous work has not fully utilized this temporal dimension. In these studies, the gene expression data is either considered at one time-point (before the administration of the drug) or two time-points (before and after the administration of the drug). This is clearly inadequate in modeling dynamic gene-drug interactions, especially for applications such as long-term drug therapy. In this work, we present a novel REcursive Prediction (REP) framework for drug response prediction by taking advantage of time-course gene expression data. Our goal is to predict drug response values at every stage of a long-term treatment, given the expression levels of genes collected in the previous time-points. To this end, REP employs a built-in recursive structure that exploits the intrinsic time-course nature of the data and integrates past values of drug responses for subsequent predictions. It also incorporates tensor completion that can not only alleviate the impact of noise and missing data, but also predict unseen gene expression levels (GELs). These advantages enable REP to estimate drug response at any stage of a given treatment from some GELs measured in the beginning of the treatment. Extensive experiments on a dataset corresponding to 53 multiple sclerosis patients treated with interferon are included to showcase the effectiveness of REP.

Submitted 27 July, 2019; originally announced July 2019.

arXiv:1907.11904 [pdf, other]

doi 10.1109/LSP.2019.2945490

Amplitude Retrieval for Channel Estimation of MIMO Systems with One-Bit ADCs

Authors: Cheng Qian, Xiao Fu, Nicholas D. Sidiropoulos

Abstract: This letter revisits the channel estimation problem for MIMO systems with one-bit analog-to-digital converters (ADCs) through a novel algorithm--Amplitude Retrieval (AR). Unlike the state-of-the-art methods such as those based on one-bit compressive sensing, AR takes a different approach. It accounts for the lost amplitudes of the one-bit quantized measurements, and performs channel estimation and amplitude completion jointly. This way, the direction information of the propagation paths can be estimated via accurate direction finding algorithms in array processing, e.g., maximum likelihood. The upshot is that AR is able to handle off-grid angles and provide more accurate channel estimates. Simulation results are included to showcase the advantages of AR.

Submitted 27 July, 2019; originally announced July 2019.

arXiv:1906.05746 [pdf, other]

Nonlinear System Identification via Tensor Completion

Authors: Nikos Kargas, Nicholas D. Sidiropoulos

Abstract: Function approximation from input and output data pairs constitutes a fundamental problem in supervised learning. Deep neural networks are currently the most popular method for learning to mimic the input-output relationship of a general nonlinear system, as they have proven to be very effective in approximating complex highly nonlinear functions. In this work, we show that identifying a general nonlinear function $y = f(x_1,\ldots,x_N)$ from input-output examples can be formulated as a tensor completion problem and under certain conditions provably correct nonlinear system identification is possible. Specifically, we model the interactions between the $N$ input variables and the scalar output of a system by a single $N$-way tensor, and set up a weighted low-rank tensor completion problem with smoothness regularization which we tackle using a block coordinate descent algorithm. We extend our method to the multi-output setting and the case of partially observed data, which cannot be readily handled by neural networks. Finally, we demonstrate the effectiveness of the approach using several regression tasks including some standard benchmarks and a challenging student grade prediction task.

Submitted 6 December, 2019; v1 submitted 13 June, 2019; originally announced June 2019.

Comments: AAAI 2020

arXiv:1904.12385 [pdf, other]

Machine Learning in the Air

Authors: Deniz Gunduz, Paul de Kerret, Nicholas D. Sidiropoulos, David Gesbert, Chandra Murthy, Mihaela van der Schaar

Abstract: Thanks to the recent advances in processing speed and data acquisition and storage, machine learning (ML) is penetrating every facet of our lives, and transforming research in many areas in a fundamental manner. Wireless communications is another success story -- ubiquitous in our lives, from handheld devices to wearables, smart homes, and automobiles. While recent years have seen a flurry of research activity in exploiting ML tools for various wireless communication problems, the impact of these techniques in practical communication systems and standards is yet to be seen. In this paper, we review some of the major promises and challenges of ML in wireless communication systems, focusing mainly on the physical layer. We present some of the most striking recent accomplishments that ML techniques have achieved with respect to classical approaches, and point to promising research directions where ML is likely to make the biggest impact in the near future. We also highlight the complementary problem of designing physical layer techniques to enable distributed ML at the wireless network edge, which further emphasizes the need to understand and connect ML with fundamental concepts in wireless communications.

Submitted 28 April, 2019; originally announced April 2019.

arXiv:1904.01156 [pdf, ps, other]

Learning Mixtures of Smooth Product Distributions: Identifiability and Algorithm

Authors: Nikos Kargas, Nicholas D. Sidiropoulos

Abstract: We study the problem of learning a mixture model of non-parametric product distributions. The problem of learning a mixture model is that of finding the component distributions along with the mixing weights using observed samples generated from the mixture. The problem is well-studied in the parametric setting, i.e., when the component distributions are members of a parametric family -- such as Gaussian distributions. In this work, we focus on multivariate mixtures of non-parametric product distributions and propose a two-stage approach which recovers the component distributions of the mixture under a smoothness condition. Our approach builds upon the identifiability properties of the canonical polyadic (low-rank) decomposition of tensors, in tandem with Fourier and Shannon-Nyquist sampling staples from signal processing. We demonstrate the effectiveness of the approach on synthetic and real datasets.

Submitted 1 April, 2019; originally announced April 2019.

Comments: accepted to appear in AISTATS 2019

arXiv:1903.11107 [pdf, other]

Energy Storage Management via Deep Q-Networks

Authors: Ahmed S. Zamzam, Bo Yang, Nicholas D. Sidiropoulos

Abstract: Energy storage devices represent environmentally friendly candidates to cope with volatile renewable energy generation. Motivated by the increase in privately owned storage systems, this paper studies the problem of real-time control of a storage unit co-located with a renewable energy generator and an inelastic load. Unlike many approaches in the literature, no distributional assumptions are being made on the renewable energy generation or the real-time prices. Building on the deep Q-networks algorithm, a reinforcement learning approach utilizing a neural network is devised where the storage unit operational constraints are respected. The neural network approximates the action-value function which dictates what action (charging, discharging, etc.) to take. Simulations indicate that near-optimal performance can be attained with the proposed learning-based control policy for the storage units.

Submitted 26 March, 2019; originally announced March 2019.

Comments: IEEE PES-GM 2019

arXiv:1903.09669 [pdf, other]

Physics-Aware Neural Networks for Distribution System State Estimation

Authors: Ahmed S. Zamzam, Nicholas D. Sidiropoulos

Abstract: The distribution system state estimation problem seeks to determine the network state from available measurements. Widely used Gauss-Newton approaches are very sensitive to the initialization and often not suitable for real-time estimation. Learning approaches are very promising for real-time estimation, as they shift the computational burden to an offline training stage. Prior machine learning approaches to power system state estimation have been electrical model-agnostic, in that they did not exploit the topology and physical laws governing the power grid to design the architecture of the learning model. In this paper, we propose a novel learning model that utilizes the structure of the power grid. The proposed neural network architecture reduces the number of coefficients needed to parameterize the mapping from the measurements to the network state by exploiting the separability of the estimation problem. This prevents overfitting and reduces the complexity of the training stage. We also propose a greedy algorithm for phasor measuring units placement that aims at minimizing the complexity of the neural network required for realizing the state estimation mapping. Simulation results show superior performance of the proposed method over the Gauss-Newton approach.

Submitted 12 July, 2019; v1 submitted 22 March, 2019; originally announced March 2019.

Comments: 8 pages, 5 figures, 3 tables

arXiv:1903.08938 [pdf, other]

doi 10.1109/JSTSP.2019.2930893

Algebraic Channel Estimation Algorithms for FDD Massive MIMO systems

Authors: Cheng Qian, Xiao Fu, Nicholas D. Sidiropoulos

Abstract: We consider downlink (DL) channel estimation for frequency division duplex based massive MIMO systems under the multipath model. Our goal is to provide fast and accurate channel estimation from a small amount of DL training overhead. Prior art tackles this problem using compressive sensing or classic array processing techniques (e.g., ESPRIT and MUSIC). However, these methods have challenges in some scenarios, e.g., when the number of paths is greater than the number of receive antennas. Tensor factorization methods can also be used to handle such challenging cases, but it is hard to solve the associated optimization problems. In this work, we propose an efficient channel estimation framework to circumvent such difficulties. Specifically, a structural training sequence that imposes a tensor structure on the received signal is proposed. We show that with such a training sequence, the parameters of DL MIMO channels can be provably identified even when the number of paths largely exceeds the number of receive antennas---under very small training overhead. Our approach is a judicious combination of Vandermonde tensor algebra and a carefully designed conjugate-invariant training sequence. Unlike existing tensor-based channel estimation methods that involve hard optimization problems, the proposed approach consists of very lightweight algebraic operations, and thus real-time implementation is within reach. Simulations are carried out to showcase the effectiveness of the proposed methods.

Submitted 12 July, 2019; v1 submitted 21 March, 2019; originally announced March 2019.

arXiv:1903.00435 [pdf, other]

doi 10.1109/TSP.2019.2952044

Tensor Completion from Regular Sub-Nyquist Samples

Authors: Charilaos I. Kanatsoulis, Xiao Fu, Nicholas D. Sidiropoulos, Mehmet Akçakaya

Abstract: Signal sampling and reconstruction is a fundamental engineering task at the heart of signal processing. The celebrated Shannon-Nyquist theorem guarantees perfect signal reconstruction from uniform samples, obtained at a rate twice the maximum frequency present in the signal. Unfortunately a large number of signals of interest are far from being band-limited. This motivated research on reconstruction from sub-Nyquist samples, which mainly hinges on the use of random / incoherent sampling procedures. However, uniform or regular sampling is more appealing in practice and from the system design point of view, as it is far simpler to implement, and often necessary due to system constraints. In this work, we study regular sampling and reconstruction of three- or higher-dimensional signals (tensors). We show that reconstructing a tensor signal from regular samples is feasible. Under the proposed framework, the sample complexity is determined by the tensor rank---rather than the signal bandwidth. This result offers new perspectives for designing practical regular sampling patterns and systems for signals that are naturally tensors, e.g., images and video. For a concrete application, we show that functional magnetic resonance imaging (fMRI) acceleration is a tensor sampling problem, and design practical sampling schemes and an algorithmic framework to handle it. Numerical results show that our tensor sampling strategy accelerates the fMRI sampling process significantly without sacrificing reconstruction accuracy.

Submitted 1 March, 2019; originally announced March 2019.

arXiv:1902.10226 [pdf, other]

doi 10.1109/GlobalSIP.2018.8646665

TensorMap: Lidar-Based Topological Mapping and Localization via Tensor Decompositions

Authors: Sirisha Rambhatla, Nikos D. Sidiropoulos, Jarvis Haupt

Abstract: We propose a technique to develop (and localize in) topological maps from light detection and ranging (Lidar) data. Localizing an autonomous vehicle with respect to a reference map in real-time is crucial for its safe operation. Owing to the rich information provided by Lidar sensors, these are emerging as a promising choice for this task. However, since a Lidar outputs a large amount of data every fraction of a second, it is progressively harder to process the information in real-time. Consequently, current systems have migrated towards faster alternatives at the expense of accuracy. To overcome this inherent trade-off between latency and accuracy, we propose a technique to develop topological maps from Lidar data using the orthogonal Tucker3 tensor decomposition. Our experimental evaluations demonstrate that in addition to achieving a high compression ratio as compared to full data, the proposed technique, $\textit{TensorMap}$, also accurately detects the position of the vehicle in a graph-based representation of a map. We also analyze the robustness of the proposed technique to Gaussian and translational noise, thus initiating explorations into potential applications of tensor decompositions in Lidar data analysis.

Submitted 26 February, 2019; originally announced February 2019.

Comments: 5 pages; Index Terms - Topological maps, Lidar, Localization of Autonomous Vehicles, Orthogonal Tucker Decomposition, and Scan-matching

Journal ref: 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP)

arXiv:1901.01568 [pdf, other]

doi 10.1109/TSP.2020.2989551

Learning Nonlinear Mixtures: Identifiability and Algorithm

Authors: Bo Yang, Xiao Fu, Nicholas D. Sidiropoulos, Kejun Huang

Abstract: Linear mixture models have proven very useful in a plethora of applications, e.g., topic modeling, clustering, and source separation. As a critical aspect of linear mixture models, identifiability of the model parameters is well-studied, under frameworks such as independent component analysis and constrained matrix factorization. Nevertheless, when the linear mixtures are distorted by unknown nonlinear functions -- which is well-motivated and more realistic in many cases -- the identifiability issues are much less studied. This work proposes an identification criterion for a nonlinear mixture model that is well grounded in many real-world applications, and offers identifiability guarantees. A practical implementation based on a judiciously designed neural network is proposed to realize the criterion, and an effective learning algorithm is proposed. Numerical results on synthetic and real data corroborate the effectiveness of the proposed method.

Submitted 6 January, 2019; originally announced January 2019.

Comments: 15 pages

arXiv:1810.12758 [pdf, ps, other]

From Gene Expression to Drug Response: A Collaborative Filtering Approach

Authors: Cheng Qian, Nicholas D. Sidiropoulos, Magda Amiridi, Amin Emad

Abstract: Predicting the response of cancer cells to drugs is an important problem in pharmacogenomics. Recent efforts in generation of large scale datasets profiling gene expression and drug sensitivity in cell lines have provided a unique opportunity to study this problem. However, one major challenge is the small number of samples (cell lines) compared to the number of features (genes) even in these large datasets. We propose a collaborative filtering (CF) like algorithm for modeling gene-drug relationship to identify patients most likely to benefit from a treatment. Due to the correlation of gene expressions in different cell lines, the gene expression matrix is approximately low-rank, which suggests that drug responses could be estimated from a reduced dimension latent space of the gene expression. Towards this end, we propose a joint low-rank matrix factorization and latent linear regression approach. Experiments with data from the Genomics of Drug Sensitivity in Cancer database are included to show that the proposed method can predict drug-gene associations better than the state-of-the-art methods.

Submitted 30 October, 2018; v1 submitted 29 October, 2018; originally announced October 2018.

arXiv:1809.08353 [pdf, other]

Coupled Graphs and Tensor Factorization for Recommender Systems and Community Detection

Authors: Vassilis N. Ioannidis, Ahmed S. Zamzam, Georgios B. Giannakis, Nicholas D. Sidiropoulos

Abstract: Joint analysis of data from multiple information repositories facilitates uncovering the underlying structure in heterogeneous datasets. Single and coupled matrix-tensor factorization (CMTF) has been widely used in this context for imputation-based recommendation from ratings, social network, and other user-item data. When this side information is in the form of item-item correlation matrices or graphs, existing CMTF algorithms may fall short. Alleviating current limitations, we introduce a novel model coined coupled graph-tensor factorization (CGTF) that judiciously accounts for graph-related side information. The CGTF model has the potential to overcome practical challenges, such as missing slabs from the tensor and/or missing rows/columns from the correlation matrices. A novel alternating direction method of multipliers (ADMM) is also developed that recovers the nonnegative factors of CGTF. Our algorithm enjoys closed-form updates that result in reduced computational complexity and allow for convergence claims. A novel direction is further explored by employing the interpretable factors to detect graph communities having the tensor as side information. The resulting community detection approach is successful even when some links in the graphs are missing. Results with real data sets corroborate the merits of the proposed methods relative to state-of-the-art competing factorization techniques in providing recommendations and detecting communities.

Submitted 30 May, 2019; v1 submitted 21 September, 2018; originally announced September 2018.

Comments: This paper is submitted to the IEEE Transactions on Knowledge and Data Engineering. A preliminary version of this work was accepted for presentation in the special track of GlobalSIP on Tensor Methods for Signal Processing and Machine Learning

arXiv:1807.01671 [pdf, other]

Data-Driven Learning-Based Optimization for Distribution System State Estimation

Authors: Ahmed S. Zamzam, Xiao Fu, Nicholas D. Sidiropoulos

Abstract: Distribution system state estimation (DSSE) is a core task for monitoring and control of distribution networks. Widely used algorithms such as Gauss-Newton perform poorly with the limited number of measurements typically available for DSSE, often require many iterations to obtain reasonable results, and sometimes fail to converge. DSSE is a non-convex problem, and working with a limited number of measurements further aggravates the situation, as indeterminacy induces multiple global (in addition to local) minima. Gauss-Newton is also known to be sensitive to initialization. Hence, the situation is far from ideal. It is therefore natural to ask if there is a smart way of initializing Gauss-Newton that will avoid these DSSE-specific pitfalls. This paper proposes using historical or simulation-derived data to train a shallow neural network to `learn to initialize' -- that is, map the available measurements to a point in the neighborhood of the true latent states (network voltages), which is used to initialize Gauss-Newton. It is shown that this hybrid machine learning / optimization approach yields superior performance in terms of stability, accuracy, and runtime efficiency, compared to conventional optimization-only approaches. It is also shown that judicious design of the neural network training cost function helps to improve the overall DSSE performance.

Submitted 22 March, 2019; v1 submitted 4 July, 2018; originally announced July 2018.

Comments: 13 pages, 6 figures, 5 tables

arXiv:1805.02223 [pdf, other]

doi 10.1109/TSP.2018.2873506

Tensor-Based Channel Estimation for Dual-Polarized Massive MIMO Systems

Authors: Cheng Qian, Xiao Fu, Nicholas D. Sidiropoulos, Ye Yang

Abstract: The 3GPP suggests combining dual polarized (DP) antenna arrays with the double directional (DD) channel model for downlink channel estimation. This combination strikes a good balance between high-capacity communications and parsimonious channel modeling, and also brings limited feedback schemes for downlink channel state information within reach---since such a channel can be fully characterized by several key parameters. However, most existing channel estimation work under the DD model has not yet considered DP arrays, perhaps because of the complex array manifold and the resulting difficulty in algorithm design. In this paper, we first reveal that the DD channel with DP arrays at the transmitter and receiver can be naturally modeled as a low-rank tensor, and thus the key parameters of the channel can be effectively estimated via tensor decomposition algorithms. On the theory side, we show that the DD-DP parameters are identifiable under very mild conditions, by leveraging identifiability of low-rank tensors. Furthermore, a compressed tensor decomposition algorithm is developed for alleviating the downlink training overhead. We show that, by using a judiciously designed pilot structure, the channel parameters are still guaranteed to be identified via the compressed tensor decomposition formulation even when the size of the pilot sequence is much smaller than what is needed for conventional channel identification methods, such as linear least squares and matched filtering. Numerical simulations are presented to showcase the effectiveness of the proposed methods.

Submitted 26 October, 2018; v1 submitted 6 May, 2018; originally announced May 2018.

Comments: MATLAB code is available at: https://www.mathworks.com/matlabcentral/fileexchange/69176-tensor-based-channel-estimation-for-dual-polarized-mimo

arXiv:1804.08806 [pdf, other]

doi 10.1109/TSP.2018.2878544

Structured SUMCOR Multiview Canonical Correlation Analysis for Large-Scale Data

Authors: Charilaos I. Kanatsoulis, Xiao Fu, Nicholas D. Sidiropoulos, Mingyi Hong

Abstract: The sum-of-correlations (SUMCOR) formulation of generalized canonical correlation analysis (GCCA) seeks highly correlated low-dimensional representations of different views via maximizing pairwise latent similarity of the views. SUMCOR is considered arguably the most natural extension of classical two-view CCA to the multiview case, and thus has numerous applications in signal processing and data analytics. Recent work has proposed effective algorithms for handling the SUMCOR problem at very large scale. However, the existing scalable algorithms cannot incorporate structural regularization and prior information -- which are critical for good performance in real-world applications. In this work, we propose a new computational framework for large-scale SUMCOR GCCA that can easily incorporate a suite of structural regularizers which are frequently used in data analytics. The updates of the proposed algorithm are lightweight and the memory complexity is also low. In addition, the proposed algorithm can be readily implemented in a parallel fashion. We show that the proposed algorithm converges to a Karush-Kuhn-Tucker (KKT) point of the regularized SUMCOR problem. Judiciously designed simulations and real-data experiments are employed to demonstrate the effectiveness of the proposed algorithm.

Submitted 23 April, 2018; originally announced April 2018.

arXiv:1804.05307 [pdf, other]

doi 10.1109/TSP.2018.2876362

Hyperspectral Super-Resolution: A Coupled Tensor Factorization Approach

Authors: Charilaos I. Kanatsoulis, Xiao Fu, Nicholas D. Sidiropoulos, Wing-Kin Ma

Abstract: Hyperspectral super-resolution refers to the problem of fusing a hyperspectral image (HSI) and a multispectral image (MSI) to produce a super-resolution image (SRI) that has fine spatial and spectral resolution. State-of-the-art methods approach the problem via low-rank matrix approximations to the matricized HSI and MSI. These methods are effective to some extent, but a number of challenges remain. First, HSIs and MSIs are naturally third-order tensors (data "cubes") and thus matricization is prone to loss of structural information--which could degrade performance. Second, it is unclear whether or not these low-rank matrix-based fusion strategies can guarantee identifiability or exact recovery of the SRI. However, identifiability plays a pivotal role in estimation problems and usually has a significant impact on performance in practice. Third, the majority of the existing methods assume that there are known (or easily estimated) degradation operators applied to the SRI to form the corresponding HSI and MSI--which is hardly the case in practice. In this work, we propose to tackle the super-resolution problem from a tensor perspective. Specifically, we utilize the multidimensional structure of the HSI and MSI to propose a coupled tensor factorization framework that can effectively overcome the aforementioned issues. The proposed approach guarantees the identifiability of the SRI under mild and realistic conditions. Furthermore, it works with little knowledge of the degradation operators, which is clearly an advantage over the existing methods. Semi-real numerical experiments are included to show the effectiveness of the proposed approach.

Submitted 22 April, 2018; v1 submitted 15 April, 2018; originally announced April 2018.

arXiv:1803.04261 [pdf, other]

Tensor-Based Parameter Estimation of Double Directional Massive MIMO Channel with Dual-Polarized Antennas

Authors: Cheng Qian, Xiao Fu, Nicholas D. Sidiropoulos, Ye Yang

Abstract: The 3GPP suggests combining dual polarized (DP) antenna arrays with the double directional (DD) channel model for downlink channel estimation. This combination strikes a good balance between high-capacity communications and parsimonious channel modeling, and also brings limited feedback schemes for downlink channel estimation within reach. However, most existing channel estimation work under the DD model has not considered DP arrays, perhaps because of the complex array manifold and the resulting difficulty in algorithm design. In this paper, we first reveal that the DD channel with DP arrays at the transmitter and receiver can be naturally modeled as a low-rank four-way tensor, and thus the parameters can be effectively estimated via tensor decomposition algorithms. To reduce computational complexity, we show that the problem can be recast as a four-snapshot three-dimensional harmonic retrieval problem, which can be solved using computationally efficient subspace methods. On the theory side, we show that the DD channel with DP arrays is identifiable under very mild conditions, leveraging identifiability of low-rank tensors. Numerical simulations are employed to showcase the effectiveness of our methods.

Submitted 3 March, 2018; originally announced March 2018.

Comments: 5 pages, 2 figures, conference

arXiv:1803.01257 [pdf, other]

doi 10.1109/MSP.2018.2877582

Nonnegative Matrix Factorization for Signal and Data Analytics: Identifiability, Algorithms, and Applications

Authors: Xiao Fu, Kejun Huang, Nicholas D. Sidiropoulos, Wing-Kin Ma

Abstract: Nonnegative matrix factorization (NMF) has become a workhorse for signal and data analytics, triggered by its model parsimony and interpretability. Perhaps a bit surprisingly, the understanding of its model identifiability---the major reason behind the interpretability in many applications such as topic mining and hyperspectral imaging---had been rather limited until recent years. Beginning in the 2010s, the identifiability research of NMF has progressed considerably: Many interesting and important results have been discovered by the signal processing (SP) and machine learning (ML) communities. NMF identifiability has a great impact on many aspects in practice, such as ill-posed formulation avoidance and performance-guaranteed algorithm design. On the other hand, there is no tutorial paper that introduces NMF from an identifiability viewpoint. In this paper, we aim at filling this gap by offering a comprehensive and deep tutorial on model identifiability of NMF as well as the connections to algorithms and applications. This tutorial will help researchers and graduate students grasp the essence and insights of NMF, thereby avoiding typical `pitfalls' that are oftentimes due to unidentifiable NMF formulations. This paper will also help practitioners pick/design suitable factorization tools for their own problems.

Submitted 16 November, 2018; v1 submitted 3 March, 2018; originally announced March 2018.

Comments: accepted version, IEEE Signal Processing Magazine; supplementary materials added. Some minor revisions implemented

arXiv:1803.00678 [pdf, ps, other]

Mirror-Prox SCA Algorithm for Multicast Beamforming and Antenna Selection

Authors: Mohamed S. Ibrahim, Aritra Konar, Mingyi Hong, Nicholas D. Sidiropoulos

Abstract: This paper considers the (NP-)hard problem of joint multicast beamforming and antenna selection. Prior work has focused on using Semi-Definite relaxation (SDR) techniques in an attempt to obtain a high quality sub-optimal solution. However, SDR suffers from the drawback of having high computational complexity, as SDR lifts the problem to higher dimensional space, effectively squaring the number of variables. This paper proposes a high performance, low complexity Successive Convex Approximation (SCA) algorithm for max-min SNR "fair" joint multicast beamforming and antenna selection under a sum power constraint. The proposed approach relies on iteratively approximating the non-convex objective with a series of non-smooth convex subproblems, and then, a first order-based method called Saddle Point Mirror-Prox (SP-MP) is used to compute optimal solutions for each SCA subproblem. Simulations reveal that the SP-MP SCA algorithm provides a higher quality and lower complexity solution compared to the one obtained using SDR.

Submitted 1 March, 2018; originally announced March 2018.

Comments: 6 pages, 3 figures

arXiv:1802.06894 [pdf, ps, other]

Learning Hidden Markov Models from Pairwise Co-occurrences with Application to Topic Modeling

Authors: Kejun Huang, Xiao Fu, Nicholas D. Sidiropoulos

Abstract: We present a new algorithm for identifying the transition and emission probabilities of a hidden Markov model (HMM) from the emitted data. Expectation-maximization becomes computationally prohibitive for long observation records, which are often required for identification. The new algorithm is particularly suitable for cases where the available sample size is large enough to accurately estimate second-order output probabilities, but not higher-order ones. We show that if one is only able to obtain a reliable estimate of the pairwise co-occurrence probabilities of the emissions, it is still possible to uniquely identify the HMM if the emission probability is \emph{sufficiently scattered}. We apply our method to hidden topic Markov modeling, and demonstrate that we can learn topics with higher quality if documents are modeled as observations of HMMs sharing the same emission (topic) probability, compared to the simple but widely used bag-of-words model.

Submitted 18 June, 2018; v1 submitted 19 February, 2018; originally announced February 2018.

Comments: ICML 2018

arXiv:1712.10085 [pdf, ps, other]

doi 10.1109/TSP.2018.2865412

Limited Feedback Channel Estimation in Massive MIMO with Non-uniform Directional Dictionaries

Authors: Panos N. Alevizos, Xiao Fu, Nicholas D. Sidiropoulos, Yang Ye, Aggelos Bletsas

Abstract: Channel state information (CSI) at the base station (BS) is crucial to achieve beamforming and multiplexing gains in multiple-input multiple-output (MIMO) systems. State-of-the-art limited feedback schemes require feedback overhead that scales linearly with the number of BS antennas, which is prohibitive for $5$G massive MIMO. This work proposes novel limited feedback algorithms that lift this burden by exploiting the inherent sparsity in double directional (DD) MIMO channel representation using overcomplete dictionaries. These dictionaries are associated with angle of arrival (AoA) and angle of departure (AoD) that specifically account for antenna directivity patterns at both ends of the link. The proposed algorithms achieve satisfactory channel estimation accuracy using a small number of feedback bits, even when the number of transmit antennas at the BS is large -- making them ideal for $5$G massive MIMO. Judicious simulations reveal that they outperform a number of popular feedback schemes, and underscore the importance of using angle dictionaries matching the given antenna directivity patterns, as opposed to uniform dictionaries. The proposed algorithms are lightweight in terms of computation, especially on the user equipment side, making them ideal for actual deployment in $5$G systems.

Submitted 7 August, 2018; v1 submitted 28 December, 2017; originally announced December 2017.

arXiv:1712.00205 [pdf, other]

doi 10.1109/TSP.2018.2862383

Tensors, Learning, and 'Kolmogorov Extension' for Finite-alphabet Random Vectors

Authors: Nikos Kargas, Nicholas D. Sidiropoulos, Xiao Fu

Abstract: Estimating the joint probability mass function (PMF) of a set of random variables lies at the heart of statistical learning and signal processing. Without structural assumptions, such as modeling the variables as a Markov chain, tree, or other graphical model, joint PMF estimation is often considered mission impossible - the number of unknowns grows exponentially with the number of variables. But who gives us the structural model? Is there a generic, `non-parametric' way to control joint PMF complexity without relying on a priori structural assumptions regarding the underlying probability model? Is it possible to discover the operational structure without biasing the analysis up front? What if we only observe random subsets of the variables, can we still reliably estimate the joint PMF of all? This paper shows, perhaps surprisingly, that if the joint PMF of any three variables can be estimated, then the joint PMF of all the variables can be provably recovered under relatively mild conditions. The result is reminiscent of Kolmogorov's extension theorem - consistent specification of lower-dimensional distributions induces a unique probability measure for the entire process. The difference is that for processes of limited complexity (rank of the high-dimensional PMF) it is possible to obtain complete characterization from only three-dimensional distributions. In fact not all three-dimensional PMFs are needed; and under more stringent conditions even two-dimensional will do. Exploiting multilinear algebra, this paper proves that such higher-dimensional PMF completion can be guaranteed - several pertinent identifiability results are derived. It also provides a practical and efficient algorithm to carry out the recovery task. Judiciously designed simulations and real-data experiments on movie recommendation and data classification are presented to showcase the effectiveness of the approach.

Submitted 27 July, 2018; v1 submitted 1 December, 2017; originally announced December 2017.

arXiv:1711.07925 [pdf, ps, other]

Kullback-Leibler Principal Component for Tensors is not NP-hard

Authors: Kejun Huang, Nicholas D. Sidiropoulos

Abstract: We study the problem of nonnegative rank-one approximation of a nonnegative tensor, and show that the globally optimal solution that minimizes the generalized Kullback-Leibler divergence can be efficiently obtained, i.e., it is not NP-hard. This result works for arbitrary nonnegative tensors with an arbitrary number of modes (including two, i.e., matrices). We derive a closed-form expression for the KL principal component, which is easy to compute and has an intuitive probabilistic interpretation. For generalized KL approximation with higher ranks, the problem is for the first time shown to be equivalent to multinomial latent variable modeling, and an iterative algorithm is derived that resembles the expectation-maximization algorithm. On the Iris dataset, we showcase how the derived results help us learn the model in an \emph{unsupervised} manner, and obtain strikingly close performance to that from supervised methods.

Submitted 21 November, 2017; originally announced November 2017.

Comments: Asilomar 2017

arXiv:1711.07441 [pdf, other]

On Convergence of Epanechnikov Mean Shift

Authors: Kejun Huang, Xiao Fu, Nicholas D. Sidiropoulos

Abstract: Epanechnikov Mean Shift is a simple yet empirically very effective algorithm for clustering. It localizes the centroids of data clusters via estimating modes of the probability distribution that generates the data points, using the `optimal' Epanechnikov kernel density estimator. However, since the procedure involves non-smooth kernel density functions, the convergence behavior of Epanechnikov Mean Shift lacks theoretical support as of this writing---most of the existing analyses are based on smooth functions and thus cannot be applied to Epanechnikov Mean Shift. In this work, we first show that the original Epanechnikov Mean Shift may indeed terminate at a non-critical point, due to the non-smooth nature of the kernel. Based on our analysis, we propose a simple remedy to fix it. The modified Epanechnikov Mean Shift is guaranteed to terminate at a local maximum of the estimated density, which corresponds to a cluster centroid, within a finite number of iterations. We also propose a way to avoid running the Mean Shift iterates from every data point, while maintaining good clustering accuracies under non-overlapping spherical Gaussian mixture models. This further pushes Epanechnikov Mean Shift to handle very large and high-dimensional data sets. Experiments show surprisingly good performance compared to Lloyd's K-means algorithm and the EM algorithm.

Submitted 20 November, 2017; originally announced November 2017.

Comments: AAAI 2018

arXiv:1709.00614 [pdf, other]

doi 10.1109/LSP.2018.2789405

On Identifiability of Nonnegative Matrix Factorization

Authors: Xiao Fu, Kejun Huang, Nicholas D. Sidiropoulos

Abstract: In this letter, we propose a new identification criterion that guarantees the recovery of the low-rank latent factors in the nonnegative matrix factorization (NMF) model, under mild conditions. Specifically, using the proposed criterion, it suffices to identify the latent factors if the rows of one factor are \emph{sufficiently scattered} over the nonnegative orthant, while no structural assumption is imposed on the other factor except being full-rank. This is by far the mildest condition under which the latent factors are provably identifiable from the NMF model.

Submitted 2 September, 2017; originally announced September 2017.

Showing 1–50 of 71 results for author: Sidiropoulos, N D