-
Characteristic Neural Ordinary Differential Equations
Authors:
Xingzi Xu,
Ali Hasan,
Khalil Elkhalil,
Jie Ding,
Vahid Tarokh
Abstract:
We propose Characteristic-Neural Ordinary Differential Equations (C-NODEs), a framework for extending Neural Ordinary Differential Equations (NODEs) beyond ODEs. While NODEs model the evolution of latent variables as the solution to an ODE, C-NODE models the evolution of the latent variables as the solution of a family of first-order quasi-linear partial differential equations (PDEs) along curves on which the PDEs reduce to ODEs, referred to as characteristic curves. This in turn allows the application of the standard frameworks for solving ODEs, namely the adjoint method. Learning optimal characteristic curves for given tasks improves performance and computational efficiency compared with state-of-the-art NODE models. We prove that the C-NODE framework extends the classical NODE on classification tasks by demonstrating explicit C-NODE representable functions not expressible by NODEs. Additionally, we present C-NODE-based continuous normalizing flows, which describe the density evolution of latent variables along multiple dimensions. Empirical results demonstrate the improvements provided by the proposed method for classification and density estimation on CIFAR-10, SVHN, and MNIST datasets under a computational budget similar to that of existing NODE methods. The results also provide empirical evidence that the learned curves improve the efficiency of the system through a lower number of parameters and function evaluations compared with baselines.
Submitted 9 November, 2022; v1 submitted 25 November, 2021;
originally announced November 2021.
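The reduction of a PDE to ODEs along characteristic curves that this abstract relies on is the classical method of characteristics. A minimal sketch in generic notation follows; the coefficient names $a$, $b$, $c$ and the curve parameter $s$ are illustrative assumptions, not symbols taken from the paper. For a first-order quasi-linear PDE in $u(x,t)$, choosing a curve $(x(s), t(s))$ whose tangent matches the coefficients turns the PDE into an ODE for $u$ via the chain rule:

```latex
\begin{align*}
a(x,t,u)\,\frac{\partial u}{\partial x} + b(x,t,u)\,\frac{\partial u}{\partial t} &= c(x,t,u), \\
\frac{dx}{ds} = a(x,t,u), \qquad \frac{dt}{ds} &= b(x,t,u), \\
\frac{du}{ds} = \frac{\partial u}{\partial x}\,\frac{dx}{ds} + \frac{\partial u}{\partial t}\,\frac{dt}{ds} &= c(x,t,u).
\end{align*}
```

The three ODEs in $s$ can then be integrated with any standard ODE solver, which is what makes ODE tooling such as the adjoint method applicable.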
-
Generative Archimedean Copulas
Authors:
Yuting Ng,
Ali Hasan,
Khalil Elkhalil,
Vahid Tarokh
Abstract:
We propose a new generative modeling technique for learning multidimensional cumulative distribution functions (CDFs) in the form of copulas. Specifically, we consider certain classes of copulas known as Archimedean and hierarchical Archimedean copulas, popular for their parsimonious representation and ability to model different tail dependencies. We consider their representation as mixture models with Laplace transforms of latent random variables from generative neural networks. This alternative representation allows for computational efficiencies and easy sampling, especially in high dimensions. We describe multiple methods for optimizing the network parameters. Finally, we present empirical results that demonstrate the efficacy of our proposed method in learning multidimensional CDFs and its computational efficiency compared to existing methods.
Submitted 10 June, 2021; v1 submitted 22 February, 2021;
originally announced February 2021.
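The mixture representation mentioned in this abstract can be illustrated with the classical frailty (Marshall-Olkin) sampler for an Archimedean copula. The sketch below is not the paper's method: it swaps the generative neural network for a fixed Gamma latent variable, which yields the Clayton copula, and the function names `sample_clayton` and `kendall_tau` are hypothetical helpers introduced only for this illustration.

```python
import random


def sample_clayton(n, dim, theta, seed=0):
    """Draw n points from a dim-dimensional Clayton copula with theta > 0."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Latent variable V ~ Gamma(shape=1/theta, scale=1); its Laplace
        # transform is psi(t) = (1 + t) ** (-1 / theta), the Clayton generator.
        v = rng.gammavariate(1.0 / theta, 1.0)
        # Mixture step: U_i = psi(E_i / V) with independent E_i ~ Exp(1).
        samples.append(
            [(1.0 + rng.expovariate(1.0) / v) ** (-1.0 / theta) for _ in range(dim)]
        )
    return samples


def kendall_tau(xs, ys):
    """Plain O(n^2) Kendall rank correlation (ties are not expected here)."""
    n = len(xs)
    balance = 0  # concordant pairs minus discordant pairs
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            balance += (s > 0) - (s < 0)
    return 2.0 * balance / (n * (n - 1))


points = sample_clayton(n=1000, dim=2, theta=2.0)
tau = kendall_tau([p[0] for p in points], [p[1] for p in points])
print(f"empirical Kendall tau: {tau:.3f}; Clayton theory: theta / (theta + 2)")
```

Sampling costs one latent draw plus `dim` exponential draws per point, which is why this representation scales easily to high dimensions.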
-
Modeling Extremes with d-max-decreasing Neural Networks
Authors:
Ali Hasan,
Khalil Elkhalil,
Yuting Ng,
Joao M. Pereira,
Sina Farsiu,
Jose H. Blanchet,
Vahid Tarokh
Abstract:
We propose a novel neural network architecture that enables non-parametric calibration and generation of multivariate extreme value distributions (MEVs). MEVs arise from Extreme Value Theory (EVT) as the necessary class of models when extrapolating a distributional fit over large spatial and temporal scales based on data observed in intermediate scales. In turn, EVT dictates that $d$-max-decreasing, a stronger form of convexity, is an essential shape constraint in the characterization of MEVs. As far as we know, our proposed architecture provides the first class of non-parametric estimators for MEVs that preserve these essential shape constraints. We show that our architecture approximates the dependence structure encoded by MEVs at a parametric rate. Moreover, we present a new method for sampling high-dimensional MEVs using a generative model. We demonstrate our methodology on a wide range of experimental settings, ranging from environmental sciences to financial mathematics, and verify that the structural properties of MEVs are retained compared to existing methods.
Submitted 1 March, 2022; v1 submitted 17 February, 2021;
originally announced February 2021.
-
Fisher Auto-Encoders
Authors:
Khalil Elkhalil,
Ali Hasan,
Jie Ding,
Sina Farsiu,
Vahid Tarokh
Abstract:
It has been conjectured that the Fisher divergence is more robust to model uncertainty than the conventional Kullback-Leibler (KL) divergence. This motivates the design of a new class of robust generative auto-encoders (AE) referred to as Fisher auto-encoders. Our approach is to design Fisher AEs by minimizing the Fisher divergence between the intractable joint distribution of observed data and latent variables and the postulated/modeled joint distribution. In contrast to KL-based variational AEs (VAEs), the Fisher AE can exactly quantify the distance between the true and the model-based posterior distributions. Qualitative and quantitative results are provided on both the MNIST and CelebA datasets, demonstrating the competitive performance of Fisher AEs in terms of robustness compared to other AEs such as VAEs and Wasserstein AEs.
Submitted 23 October, 2020; v1 submitted 12 July, 2020;
originally announced July 2020.
-
Improved Design of Quadratic Discriminant Analysis Classifier in Unbalanced Settings
Authors:
Amine Bejaoui,
Khalil Elkhalil,
Abla Kammoun,
Mohamed Slim Alouni,
Tarek Al-Naffouri
Abstract:
The use of quadratic discriminant analysis (QDA) or its regularized version (R-QDA) for classification is often not recommended, due to its well-acknowledged high sensitivity to the estimation noise of the covariance matrix. This is all the more the case in unbalanced data settings, for which it has been found that R-QDA becomes equivalent to the classifier that assigns all observations to the same class. In this paper, we propose an improved R-QDA that is based on the use of two regularization parameters and a modified bias, properly chosen to avoid inappropriate behaviors of R-QDA in unbalanced settings and to ensure the best possible classification performance. The design of the proposed classifier builds on a refined asymptotic analysis of its performance when the number of samples and that of features grow large simultaneously, which makes it possible to cope efficiently with the high dimensionality frequently met within the big data paradigm. The performance of the proposed classifier is assessed on both real and synthetic data sets and is shown to be much better than what one would expect from a traditional R-QDA.
Submitted 14 September, 2020; v1 submitted 11 June, 2020;
originally announced June 2020.
-
Risk Convergence of Centered Kernel Ridge Regression with Large Dimensional Data
Authors:
Khalil Elkhalil,
Abla Kammoun,
Xiangliang Zhang,
Mohamed-Slim Alouini,
Tareq Al-Naffouri
Abstract:
This paper carries out a large dimensional analysis of a variation of kernel ridge regression that we call \emph{centered kernel ridge regression} (CKRR), also known in the literature as kernel ridge regression with offset. This modified technique is obtained by accounting for the bias in the regression problem resulting in the old kernel ridge regression but with \emph{centered} kernels. The analysis is carried out under the assumption that the data is drawn from a Gaussian distribution and heavily relies on tools from random matrix theory (RMT). Under the regime in which the data dimension and the training size grow infinitely large with fixed ratio and under some mild assumptions controlling the data statistics, we show that both the empirical and the prediction risks converge to deterministic quantities that describe in closed form the performance of CKRR in terms of the data statistics and dimensions. Inspired by this theoretical result, we subsequently build a consistent estimator of the prediction risk based on the training data, which makes it possible to optimally tune the design parameters. A key insight of the proposed analysis is the fact that asymptotically a large class of kernels achieves the same minimum prediction risk. This insight is validated with both synthetic and real data.
Submitted 19 April, 2019;
originally announced April 2019.
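Kernel ridge regression with an offset, as described in this abstract, can be written with a centered kernel matrix. The following is a minimal sketch under stated assumptions: the RBF kernel, the `n * lam` scaling of the ridge term, and the function names `rbf_kernel`, `fit_ckrr` and `predict_ckrr` are illustrative choices, not the paper's exact formulation.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_ckrr(X, y, lam, gamma):
    """Fit ridge regression on the centered kernel; the offset is mean(y)."""
    n = X.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n          # centering projector
    K = rbf_kernel(X, X, gamma)
    Kc = P @ K @ P                               # centered training kernel
    alpha = np.linalg.solve(Kc + n * lam * np.eye(n), P @ y)
    return {"X": X, "K": K, "P": P, "alpha": alpha,
            "offset": y.mean(), "gamma": gamma}


def predict_ckrr(model, X_new):
    """Predict with test kernels centered by the training feature mean."""
    k = rbf_kernel(X_new, model["X"], model["gamma"])            # shape (m, n)
    k_c = (k - model["K"].mean(axis=0, keepdims=True)) @ model["P"]
    return model["offset"] + k_c @ model["alpha"]


rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))
y_train = np.sin(X_train[:, 0]) + 2.0            # target with a non-zero offset
model = fit_ckrr(X_train, y_train, lam=1e-2, gamma=0.5)
print(predict_ckrr(model, rng.normal(size=(5, 3))))
```

Because the targets are centered before solving, adding a constant to `y_train` shifts every prediction by exactly that constant, which is the practical effect of the offset term.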
-
Numerically Stable Evaluation of Moments of Random Gram Matrices with Applications
Authors:
Khalil Elkhalil,
Abla Kammoun,
Tareq Y. Al-Naffouri,
Mohamed-Slim Alouini
Abstract:
This paper focuses on the computation of the positive moments of one-sided correlated random Gram matrices. Closed-form expressions for the moments can be obtained easily, but numerical evaluation thereof is prone to numerical instability, especially in high-dimensional settings. This letter provides a numerically stable method that efficiently computes the positive moments in closed form. The developed expressions are more accurate and can lead to higher accuracy levels when fed to moment-based approaches. As an application, we show how the obtained moments can be used to approximate the marginal distribution of the eigenvalues of random Gram matrices.
Submitted 8 January, 2017;
originally announced January 2017.
-
Blind Measurement Selection: A Random Matrix Theory Approach
Authors:
Khalil Elkhalil,
Abla Kammoun,
Tareq Y. Al-Naffouri,
Mohamed-Slim Alouini
Abstract:
This paper considers the problem of selecting a set of $k$ measurements from $n$ available sensor observations. The selected measurements should minimize a certain error function assessing the error in estimating a certain $m$-dimensional parameter vector. The exhaustive search inspecting each of the $n\choose k$ possible choices would require a very high computational complexity and as such is not practical for large $n$ and $k$. Alternative methods with low complexity have recently been investigated, but their main drawbacks are that 1) they require perfect knowledge of the measurement matrix and 2) they need to be applied at the pace of change of the measurement matrix. To overcome these issues, we consider the asymptotic regime in which $k$, $n$ and $m$ grow large at the same pace. Tools from random matrix theory are then used to approximate in closed form the most important error measures that are commonly used. The asymptotic approximations are then leveraged to properly select $k$ measurements exhibiting low values for the asymptotic error measures. Two heuristic algorithms are proposed: the first one merely consists in applying the convex optimization artifice to the asymptotic error measure. The second algorithm is a low-complexity greedy algorithm that attempts to look for a sufficiently good solution for the original minimization problem. The greedy algorithm can be applied to both the exact and the asymptotic error measures and can thus be implemented in blind and channel-aware fashions. We present two potential applications where the proposed algorithms can be used, namely antenna selection for uplink transmissions in large scale multi-user systems and sensor selection for wireless sensor networks. Numerical results are also presented and support the efficiency of the proposed blind methods in reaching the performance of channel-aware algorithms.
Submitted 14 December, 2016;
originally announced December 2016.
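The contrast between exhaustive search and a greedy heuristic in this abstract can be made concrete with a small channel-aware example. This is a sketch under assumptions, not the paper's algorithm: the error measure is taken to be the trace of the inverse regularized Gram matrix of the selected rows (an A-optimality style criterion), the regularizer `eps` is added only to keep early iterations invertible, and `greedy_select` is a hypothetical name.

```python
import math

import numpy as np


def greedy_select(H, k, eps=1e-6):
    """Greedily pick k rows of the n x m measurement matrix H.

    At each step, add the row that minimizes trace((Hs^T Hs + eps*I)^-1),
    where Hs stacks the rows selected so far plus the candidate row.
    """
    n, m = H.shape
    selected = []
    remaining = list(range(n))
    for _ in range(k):
        best_idx, best_cost = None, np.inf
        for i in remaining:
            rows = H[selected + [i]]
            cost = np.trace(np.linalg.inv(rows.T @ rows + eps * np.eye(m)))
            if cost < best_cost:
                best_idx, best_cost = i, cost
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


n, k, m = 100, 20, 5
rng = np.random.default_rng(0)
H = rng.normal(size=(n, m))
print("subsets an exhaustive search would inspect:", math.comb(n, k))
print("greedy selection:", greedy_select(H, k))
```

The greedy loop evaluates on the order of `n * k` candidate subsets instead of all `math.comb(n, k)` of them, at the price of no optimality guarantee.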
-
Fluctuations of the SNR at the output of the MVDR with Regularized Tyler Estimators
Authors:
Khalil Elkhalil,
Abla Kammoun,
Tareq Y. Al-Naffouri,
Mohamed-Slim Alouini
Abstract:
This paper analyzes the statistical properties of the signal-to-noise ratio (SNR) at the output of Capon's minimum variance distortionless response (MVDR) beamformers when operating over impulsive noises. Particularly, we consider the supervised case in which the receiver employs the regularized Tyler estimator in order to estimate the covariance matrix of the interference-plus-noise process using $n$ observations of size $N\times 1$. The choice for the regularized Tyler estimator (RTE) is motivated by its resilience to the presence of outliers and its regularization parameter that guarantees a good conditioning of the covariance estimate. Of particular interest in this paper is the derivation of the second order statistics of the SINR. To achieve this goal, we consider two different approaches. The first one is based on considering the classical regime, referred to as the $n$-large regime, in which $N$ is assumed to be fixed while $n$ grows to infinity. The second approach is built upon recent results developed within the framework of random matrix theory and assumes that $N$ and $n$ grow large together. Numerical results are provided in order to compare the accuracies of each regime under different settings.
Submitted 15 May, 2016;
originally announced May 2016.
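For reference, the MVDR beamformer named in this abstract has the closed form $w = R^{-1}a / (a^H R^{-1} a)$ for a steering vector $a$ and an interference-plus-noise covariance $R$. The sketch below assumes $R$ is known exactly instead of being estimated with the regularized Tyler estimator studied in the paper; the uniform linear array geometry and the names `steering_vector`, `mvdr_weights` and `output_snr` are illustrative assumptions.

```python
import numpy as np


def steering_vector(N, theta):
    """Steering vector of an N-element uniform linear array with
    half-wavelength spacing (an assumed geometry), angle theta in radians."""
    return np.exp(1j * np.pi * np.arange(N) * np.sin(theta))


def mvdr_weights(R, a):
    """MVDR weights w = R^{-1} a / (a^H R^{-1} a)."""
    Ri_a = np.linalg.solve(R, a)
    return Ri_a / np.vdot(a, Ri_a)   # np.vdot conjugates its first argument


def output_snr(w, a, R, signal_power=1.0):
    """Output SNR: signal_power * |w^H a|^2 / (w^H R w)."""
    return signal_power * abs(np.vdot(w, a)) ** 2 / np.real(np.vdot(w, R @ w))


N = 8
a = steering_vector(N, 0.2)                       # desired direction
a_int = steering_vector(N, -0.5)                  # one strong interferer
R = 10.0 * np.outer(a_int, a_int.conj()) + np.eye(N)
w = mvdr_weights(R, a)
print("distortionless response w^H a:", np.vdot(w, a))
print("output SNR:", output_snr(w, a, R))
```

With the true `R`, the output SNR equals $a^H R^{-1} a$ for unit signal power; replacing `R` by an estimate makes this quantity random, which is the fluctuation the paper characterizes.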
-
Analytical Derivation of the Inverse Moments of One-sided Correlated Gram Matrices with Applications
Authors:
Khalil Elkhalil,
Abla Kammoun,
Tareq Y Al-Naffouri,
Mohamed-Slim Alouini
Abstract:
This paper addresses the development of analytical tools for the computation of the moments of random Gram matrices with one-sided correlation. Such a question is mainly driven by applications in signal processing and wireless communications wherein such matrices naturally arise. In particular, we derive closed-form expressions for the inverse moments and show that the obtained results can help approximate several performance metrics, such as the average estimation error corresponding to the Best Linear Unbiased Estimator (BLUE) and the Linear Minimum Mean Square Error (LMMSE) estimator, as well as other loss functions used to measure the accuracy of covariance matrix estimates.
Submitted 2 July, 2015;
originally announced July 2015.
-
On the Feedback Reduction of Relay Aided Multiuser Networks using Compressive Sensing
Authors:
Khalil M. Elkhalil,
Mohammed E. Eltayeb,
Abla Kammoun,
Tareq Y. Al-Naffouri,
Hamid Reza Bahrami
Abstract:
In this paper, we propose a feedback reduction scheme for full-duplex relay-aided multiuser networks. The proposed scheme permits the base station (BS) to obtain channel state information (CSI) from a subset of strong users under substantially reduced feedback overhead. More specifically, we cast the problem of user identification and CSI estimation as a block sparse signal recovery problem in compressive sensing (CS). Using existing CS block recovery algorithms, we first obtain the identity of the strong users and then estimate their CSI using the best linear unbiased estimator (BLUE). To minimize the effect of noise on the estimated CSI, we introduce a back-off strategy that optimally backs off on the noisy estimated CSI and derive the error covariance matrix of the post-detection noise. In addition, we provide exact closed-form expressions for the average maximum equivalent SNR at the destination user. Numerical results show that the proposed algorithm drastically reduces the feedback air-time and achieves a rate close to that obtained by scheduling schemes that require dedicated error-free feedback from all the network users.
Submitted 4 May, 2015;
originally announced May 2015.