
Mitigating Data Injection Attacks on Federated Learning

Authors: Or Shalom, Amir Leshem, Waheed U. Bajwa

Abstract: Federated learning is a technique that allows multiple entities to collaboratively train models using their data without compromising data privacy. However, despite its advantages, federated learning can be susceptible to false data injection attacks. In these scenarios, a malicious entity with control over specific agents in the network can manipulate the learning process, leading to a suboptimal model. Consequently, addressing these data injection attacks presents a significant research challenge in federated learning systems. In this paper, we propose a novel technique to detect and mitigate data injection attacks on federated learning systems. Our mitigation method is a local scheme, performed during a single instance of training by the coordinating node, allowing the mitigation during the convergence of the algorithm. Whenever an agent is suspected to be an attacker, its data will be ignored for a certain period; this decision will often be re-evaluated. We prove that with probability 1, after a finite time, all attackers will be ignored while the probability of ignoring a truthful agent becomes 0, provided that there is a majority of truthful agents. Simulations show that when the coordinating node detects and isolates all the attackers, the model recovers and converges to the truthful model.

Submitted 14 January, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: This work will be presented at IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024

arXiv:2308.02922 [pdf, other]

Structured Low-Rank Tensors for Generalized Linear Models

Authors: Batoul Taki, Anand D. Sarwate, Waheed U. Bajwa

Abstract: Recent works have shown that imposing tensor structures on the coefficient tensor in regression problems can lead to more reliable parameter estimation and lower sample complexity compared to vector-based methods. This work investigates a new low-rank tensor model, called Low Separation Rank (LSR), in Generalized Linear Model (GLM) problems. The LSR model -- which generalizes the well-known Tucker and CANDECOMP/PARAFAC (CP) models, and is a special case of the Block Tensor Decomposition (BTD) model -- is imposed onto the coefficient tensor in the GLM model. This work proposes a block coordinate descent algorithm for parameter estimation in LSR-structured tensor GLMs. Most importantly, it derives a minimax lower bound on the error threshold on estimating the coefficient tensor in LSR tensor GLM problems. The minimax bound is proportional to the intrinsic degrees of freedom in the LSR tensor GLM problem, suggesting that its sample complexity may be significantly lower than that of vectorized GLMs. This result can also be specialised to lower bound the estimation error in CP and Tucker-structured GLMs. The derived bounds are comparable to tight bounds in the literature for Tucker linear regression, and the tightness of the minimax lower bound is further assessed numerically. Finally, numerical experiments on synthetic datasets demonstrate the efficacy of the proposed LSR tensor model for three regression types (linear, logistic and Poisson). Experiments on a collection of medical imaging datasets demonstrate the usefulness of the LSR model over other tensor models (Tucker and CP) on real, imbalanced data with limited available samples.

Submitted 5 August, 2023; originally announced August 2023.

Comments: 43 pages; published in Transactions on Machine Learning Research (08/2023)

Journal ref: Transactions on Machine Learning Research, Aug. 2023 (https://openreview.net/forum?id=qUxBs3Ln41)

arXiv:2307.07030 [pdf, other]

Accelerated gradient methods for nonconvex optimization: Escape trajectories from strict saddle points and convergence to local minima

Authors: Rishabh Dixit, Mert Gurbuzbalaban, Waheed U. Bajwa

Abstract: This paper considers the problem of understanding the behavior of a general class of accelerated gradient methods on smooth nonconvex functions. Motivated by some recent works that have proposed effective algorithms, based on Polyak's heavy ball method and the Nesterov accelerated gradient method, to achieve convergence to a local minimum of nonconvex functions, this work proposes a broad class of Nesterov-type accelerated methods and puts forth a rigorous study of these methods encompassing the escape from saddle points and convergence to local minima through both an asymptotic and a non-asymptotic analysis. In the asymptotic regime, this paper answers an open question of whether Nesterov's accelerated gradient method (NAG) with variable momentum parameter avoids strict saddle points almost surely. This work also develops two metrics of asymptotic rate of convergence and divergence, and evaluates these two metrics for several popular standard accelerated methods such as the NAG and Nesterov's accelerated gradient with constant momentum (NCM) near strict saddle points. In the local regime, this work provides an analysis that leads to the "linear" exit time estimates from strict saddle neighborhoods for trajectories of these accelerated methods as well as the necessary conditions for the existence of such trajectories. Finally, this work studies a sub-class of accelerated methods that can converge in convex neighborhoods of nonconvex functions with a near optimal rate to a local minimum and at the same time this sub-class offers superior saddle-escape behavior compared to that of NAG.

Submitted 13 July, 2023; originally announced July 2023.

Comments: 107 pages, 10 figures; pre-print of a journal submission

arXiv:2211.09076 [pdf, other]

doi 10.1002/aisy.202300341

Programming Wireless Security through Learning-Aided Spatiotemporal Digital Coding Metamaterial Antenna

Authors: Alireza Nooraiepour, Shaghayegh Vosoughitabar, Chung-Tse Michael Wu, Waheed U. Bajwa, Narayan B. Mandayam

Abstract: The advancement of future large-scale wireless networks necessitates the development of cost-effective and scalable security solutions. Conventional cryptographic methods, due to their computational and key management complexity, are unable to fulfill the low-latency and scalability requirements of these networks. Physical layer (PHY) security has been put forth as a cost-effective alternative to cryptographic mechanisms that can circumvent the need for explicit key exchange between communication devices, owing to the fact that PHY security relies on the physics of the signal transmission for providing security. In this work, a space-time-modulated digitally-coded metamaterial (MTM) leaky wave antenna (LWA) is proposed that can enable PHY security by achieving the functionalities of directional modulation (DM) using a machine learning-aided branch and bound (B&B) optimized coding sequence. From the theoretical perspective, it is first shown that the proposed space-time MTM antenna architecture can achieve DM through both the spatial and spectral manipulation of the orthogonal frequency division multiplexing (OFDM) signal received by a user equipment. Simulation results are then provided as proof-of-principle, demonstrating the applicability of our approach for achieving DM in various communication settings. To further validate our simulation results, a prototype of the proposed architecture controlled by a field-programmable gate array (FPGA) is realized, which achieves DM via an optimized coding sequence carried out by the learning-aided branch-and-bound algorithm corresponding to the states of the MTM LWA's unit cells. Experimental results confirm the theory behind the space-time-modulated MTM LWA in achieving DM, which is observed via both the spectral harmonic patterns and bit error rate (BER) measurements.

Submitted 12 August, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

Journal ref: Adv. Intell. Syst. 2300341 (2023)

arXiv:2108.12383 [pdf, other]

A Guide to Computational Reproducibility in Signal Processing and Machine Learning

Authors: Joseph Shenouda, Waheed U. Bajwa

Abstract: Computational reproducibility is a growing problem that has been extensively studied among computational researchers and within the signal processing and machine learning research community. However, with the changing landscape of signal processing and machine learning research come new obstacles and unseen challenges in creating reproducible experiments. Due to these new challenges, most computational experiments have become difficult, if not impossible, to be reproduced by an independent researcher. In 2016 a survey conducted by the journal Nature found that 50% of researchers were unable to reproduce their own experiments. While the issue of computational reproducibility has been discussed in the literature and specifically within the signal processing community, it is still unclear to most researchers what are the best practices to ensure reproducibility without impinging on their primary responsibility of conducting research. We feel that although researchers understand the importance of making experiments reproducible, the lack of a clear set of standards and tools makes it difficult to incorporate good reproducibility practices in most labs. It is in this regard that we aim to present signal processing researchers with a set of practical tools and strategies that can help mitigate many of the obstacles to producing reproducible computational experiments.

Submitted 15 February, 2022; v1 submitted 27 August, 2021; originally announced August 2021.

Comments: 20 pages; preprint of a magazine article

arXiv:2108.12373 [pdf, other]

doi 10.1109/TSP.2022.3229635

FAST-PCA: A Fast and Exact Algorithm for Distributed Principal Component Analysis

Authors: Arpita Gang, Waheed U. Bajwa

Abstract: Principal Component Analysis (PCA) is a fundamental data preprocessing tool in the world of machine learning. While PCA is often thought of as a dimensionality reduction method, the purpose of PCA is actually two-fold: dimension reduction and uncorrelated feature learning. Furthermore, the enormity of the dimensions and sample size in the modern day datasets have rendered the centralized PCA solutions unusable. In that vein, this paper reconsiders the problem of PCA when data samples are distributed across nodes in an arbitrarily connected network. While a few solutions for distributed PCA exist, those either overlook the uncorrelated feature learning aspect of the PCA, tend to have high communication overhead that makes them inefficient and/or lack 'exact' or 'global' convergence guarantees. To overcome these aforementioned issues, this paper proposes a distributed PCA algorithm termed FAST-PCA (Fast and exAct diSTributed PCA). The proposed algorithm is efficient in terms of communication and is proven to converge linearly and exactly to the principal components, leading to dimension reduction as well as uncorrelated features. The claims are further supported by experimental results.

Submitted 15 February, 2022; v1 submitted 27 August, 2021; originally announced August 2021.

Comments: 16 pages (two-column version); substantially revised version, including expanded comparisons with other works

arXiv:2106.13436 [pdf, other]

doi 10.1109/OJSP.2021.3135254

A hybrid model-based and learning-based approach for classification using limited number of training samples

Authors: Alireza Nooraiepour, Waheed U. Bajwa, Narayan B. Mandayam

Abstract: The fundamental task of classification given a limited number of training data samples is considered for physical systems with known parametric statistical models. The standalone learning-based and statistical model-based classifiers face major challenges towards the fulfillment of the classification task using a small training set. Specifically, classifiers that solely rely on the physics-based statistical models usually suffer from their inability to properly tune the underlying unobservable parameters, which leads to a mismatched representation of the system's behaviors. Learning-based classifiers, on the other hand, typically rely on a large number of training data from the underlying physical process, which might not be feasible in most practical scenarios. In this paper, a hybrid classification method -- termed HyPhyLearn -- is proposed that exploits both the physics-based statistical models and the learning-based classifiers. The proposed solution is based on the conjecture that HyPhyLearn would alleviate the challenges associated with the individual approaches of learning-based and statistical model-based classifiers by fusing their respective strengths. The proposed hybrid approach first estimates the unobservable model parameters using the available (suboptimal) statistical estimation procedures, and subsequently uses the physics-based statistical models to generate synthetic data. Then, the training data samples are incorporated with the synthetic data in a learning-based classifier that is based on domain-adversarial training of neural networks. Specifically, in order to address the mismatch problem, the classifier learns a mapping from the training data and the synthetic data to a common feature space. Simultaneously, the classifier is trained to find discriminative features within this space in order to fulfill the classification task.

Submitted 1 December, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

Comments: An extended version of the paper accepted in the open journal of signal processing. This version includes the following extra materials compared to the accepted paper: 1. Appendix D: Proof of Lemma 5, 2. Appendix E: Proof of Lemma 6, 3. Appendix F: A heuristic approach for channel parameter estimation for CDMA system

MSC Class: I.5; ACM Class: I.5

Journal ref: IEEE Open J. Signal Proc., vol. 3, pp. 49-70, Jan. 2022

arXiv:2105.14673 [pdf, ps, other]

doi 10.1109/IEEECONF53345.2021.9723149

A Minimax Lower Bound for Low-Rank Matrix-Variate Logistic Regression

Authors: Batoul Taki, Mohsen Ghassemi, Anand D. Sarwate, Waheed U. Bajwa

Abstract: This paper considers the problem of matrix-variate logistic regression. It derives the fundamental error threshold on estimating low-rank coefficient matrices in the logistic regression problem by obtaining a lower bound on the minimax risk. The bound depends explicitly on the dimension and distribution of the covariates, the rank and energy of the coefficient matrix, and the number of samples. The resulting bound is proportional to the intrinsic degrees of freedom in the problem, which suggests the sample complexity of the low-rank matrix logistic regression problem can be lower than that for vectorized logistic regression. The proof techniques utilized in this work also set the stage for development of minimax lower bounds for tensor-variate logistic regression problems.

Submitted 28 January, 2022; v1 submitted 30 May, 2021; originally announced May 2021.

Comments: 8 pages; published in Proc. 55th Asilomar Conf. Signals, Systems, and Computers, Pacific Grove, CA, Oct. 31-Nov. 3, 2021

arXiv:2103.06406 [pdf, other]

doi 10.1109/TSIPN.2021.3122297

Distributed Principal Subspace Analysis for Partitioned Big Data: Algorithms, Analysis, and Implementation

Authors: Arpita Gang, Bingqing Xiang, Waheed U. Bajwa

Abstract: Principal Subspace Analysis (PSA) -- and its sibling, Principal Component Analysis (PCA) -- is one of the most popular approaches for dimensionality reduction in signal processing and machine learning. But centralized PSA/PCA solutions are fast becoming irrelevant in the modern era of big data, in which the number of samples and/or the dimensionality of samples often exceed the storage and/or computational capabilities of individual machines. This has led to the study of distributed PSA/PCA solutions, in which the data are partitioned across multiple machines and an estimate of the principal subspace is obtained through collaboration among the machines. It is in this vein that this paper revisits the problem of distributed PSA/PCA under the general framework of an arbitrarily connected network of machines that lacks a central server. The main contributions of the paper in this regard are threefold. First, two algorithms are proposed in the paper that can be used for distributed PSA/PCA, with one in the case of data partitioned across samples and the other in the case of data partitioned across (raw) features. Second, in the case of sample-wise partitioned data, the proposed algorithm and a variant of it are analyzed, and their convergence to the true subspace at linear rates is established. Third, extensive experiments on both synthetic and real-world data are carried out to validate the usefulness of the proposed algorithms. In particular, in the case of sample-wise partitioned data, an MPI-based distributed implementation is carried out to study the interplay between network topology and communications cost as well as to study the effects of straggler machines on the proposed algorithms.

Submitted 12 October, 2021; v1 submitted 10 March, 2021; originally announced March 2021.

Comments: 16 pages; Final accepted version; To appear in IEEE Transactions on Signal and Information Processing Over Networks

Journal ref: IEEE Trans. Signal Inform. Proc. over Netw., vol. 7, pp. 699-715, Oct. 2021

arXiv:2101.02625 [pdf, other]

Boundary Conditions for Linear Exit Time Gradient Trajectories Around Saddle Points: Analysis and Algorithm

Authors: Rishabh Dixit, Mert Gurbuzbalaban, Waheed U. Bajwa

Abstract: Gradient-related first-order methods have become the workhorse of large-scale numerical optimization problems. Many of these problems involve nonconvex objective functions with multiple saddle points, which necessitates an understanding of the behavior of discrete trajectories of first-order methods within the geometrical landscape of these functions. This paper concerns convergence of first-order discrete methods to a local minimum of nonconvex optimization problems that comprise strict-saddle points within the geometrical landscape. To this end, it focuses on analysis of discrete gradient trajectories around saddle neighborhoods, derives sufficient conditions under which these trajectories can escape strict-saddle neighborhoods in linear time, explores the contractive and expansive dynamics of these trajectories in neighborhoods of strict-saddle points that are characterized by gradients of moderate magnitude, characterizes the non-curving nature of these trajectories, and highlights the inability of these trajectories to re-enter the neighborhoods around strict-saddle points after exiting them. Based on these insights and analyses, the paper then proposes a simple variant of the vanilla gradient descent algorithm, termed Curvature Conditioned Regularized Gradient Descent (CCRGD) algorithm, which utilizes a check for an initial boundary condition to ensure its trajectories can escape strict-saddle neighborhoods in linear time. Convergence analysis of the CCRGD algorithm, which includes its rate of convergence to a local minimum, is also presented in the paper. Numerical experiments are then provided on a test function as well as a low-rank matrix factorization problem to evaluate the efficacy of the proposed algorithm.

Submitted 9 March, 2022; v1 submitted 7 January, 2021; originally announced January 2021.

Comments: 69 pages; 10 figures; extensive revision of the earlier version, including fewer assumptions, more comparisons with prior art, and new theoretical results

arXiv:2101.01300 [pdf, other]

doi 10.1016/j.sigpro.2021.108408

A Linearly Convergent Algorithm for Distributed Principal Component Analysis

Authors: Arpita Gang, Waheed U. Bajwa

Abstract: Principal Component Analysis (PCA) is the workhorse tool for dimensionality reduction in this era of big data. While often overlooked, the purpose of PCA is not only to reduce data dimensionality, but also to yield features that are uncorrelated. Furthermore, the ever-increasing volume of data in the modern world often requires storage of data samples across multiple machines, which precludes the use of centralized PCA algorithms. This paper focuses on the dual objective of PCA, namely, dimensionality reduction and decorrelation of features, but in a distributed setting. This requires estimating the eigenvectors of the data covariance matrix, as opposed to only estimating the subspace spanned by the eigenvectors, when data is distributed across a network of machines. Although a few distributed solutions to the PCA problem have been proposed recently, convergence guarantees and/or communications overhead of these solutions remain a concern. With an eye towards communications efficiency, this paper introduces a feedforward neural network-based one time-scale distributed PCA algorithm termed Distributed Sanger's Algorithm (DSA) that estimates the eigenvectors of the data covariance matrix when data is distributed across an undirected and arbitrarily connected network of machines. Furthermore, the proposed algorithm is shown to converge linearly to a neighborhood of the true solution. Numerical results are also provided to demonstrate the efficacy of the proposed solution.

Submitted 28 November, 2021; v1 submitted 4 January, 2021; originally announced January 2021.

Comments: 34 pages; final version of journal paper accepted for publication in a special issue of EURASIP J. Signal Processing
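
The abstract of "A Linearly Convergent Algorithm for Distributed Principal Component Analysis" stresses estimating the eigenvectors themselves rather than only the subspace they span. As a rough illustration of that distinction, the sketch below runs the classical, centralized Sanger's rule (the generalized Hebbian algorithm) on a known covariance matrix. It is not the paper's DSA: there is no network or consensus step, and the function name `sanger_step`, the step size `eta`, the iteration count, and the diagonal test covariance are all assumptions made for this example.

```python
import random

def sanger_step(W, C, eta):
    """One step of the classical Sanger's rule (centralized; not the paper's DSA).

    W is a list of K column vectors (each of length d), C is a d x d symmetric
    covariance matrix given as nested lists, and eta is a hypothetical step size.
    Column k is updated with C w_k - sum_{j <= k} (w_j^T C w_k) w_j.
    """
    d = len(C)
    # CW[k] holds the matrix-vector product C w_k.
    CW = [[sum(C[r][c] * w[c] for c in range(d)) for r in range(d)] for w in W]
    new_W = []
    for k, w_k in enumerate(W):
        update = list(CW[k])
        for j in range(k + 1):
            coeff = sum(W[j][r] * CW[k][r] for r in range(d))  # w_j^T C w_k
            update = [u - coeff * W[j][r] for r, u in enumerate(update)]
        new_W.append([w_k[r] + eta * update[r] for r in range(d)])
    return new_W

# Assumed test problem: eigenvalues 4 > 1 > 0.25 with the standard basis as eigenvectors.
C = [[4.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 0.25]]

rng = random.Random(0)
W = [[0.1 * rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(2)]  # two columns
for _ in range(2000):
    W = sanger_step(W, C, eta=0.05)

first_alignment = abs(W[0][0])   # overlap of column 0 with the top eigenvector
second_alignment = abs(W[1][1])  # overlap of column 1 with the second eigenvector
```

Because each column is deflated against the columns before it, the columns settle on individual eigenvectors in order of eigenvalue, which is what yields uncorrelated features; a subspace-only method would be free to return any rotation of them.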

arXiv:2006.01106 [pdf, other]

doi 10.1093/imaiai/iaac025

Exit Time Analysis for Approximations of Gradient Descent Trajectories Around Saddle Points

Authors: Rishabh Dixit, Mert Gurbuzbalaban, Waheed U. Bajwa

Abstract: This paper considers the problem of understanding the exit time for trajectories of gradient-related first-order methods from saddle neighborhoods under some initial boundary conditions. Given the 'flat' geometry around saddle points, first-order methods can struggle to escape these regions in a fast manner due to the small magnitudes of gradients encountered. In particular, while it is known that gradient-related first-order methods escape strict-saddle neighborhoods, existing analytic techniques do not explicitly leverage the local geometry around saddle points in order to control behavior of gradient trajectories. It is in this context that this paper puts forth a rigorous geometric analysis of the gradient-descent method around strict-saddle neighborhoods using matrix perturbation theory. In doing so, it provides a key result that can be used to generate an approximate gradient trajectory for any given initial conditions. In addition, the analysis leads to a linear exit-time solution for the gradient-descent method under certain necessary initial conditions, which explicitly brings out the dependence on problem dimension, conditioning of the saddle neighborhood, and more, for a class of strict-saddle functions.

Submitted 6 October, 2023; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: 70 pages; pre-print of the journal paper published in Information and Inference: A Journal of the IMA, 2023

MSC Class: 90C26; 15Axx; 41A58; 65Hxx

Journal ref: Information and Inference: A Journal of the IMA, vol. 12, no. 2, pp. 714-786, Jun. 2023

arXiv:2005.08854 [pdf, other]

doi 10.1109/JPROC.2020.3021381

Scaling-up Distributed Processing of Data Streams for Machine Learning

Authors: Matthew Nokleby, Haroon Raja, Waheed U. Bajwa

Abstract: Emerging applications of machine learning in numerous areas involve continuous gathering of and learning from streams of data. Real-time incorporation of streaming data into the learned models is essential for improved inference in these applications. Further, these applications often involve data that are either inherently gathered at geographically distributed entities or that are intentionally distributed across multiple machines for memory, computational, and/or privacy reasons. Training of models in this distributed, streaming setting requires solving stochastic optimization problems in a collaborative manner over communication links between the physical entities. When the streaming data rate is high compared to the processing capabilities of compute nodes and/or the rate of the communications links, this poses a challenging question: how can one best leverage the incoming data for distributed training under constraints on computing capabilities and/or communications rate? A large body of research has emerged in recent decades to tackle this and related problems. This paper reviews recently developed methods that focus on large-scale distributed stochastic optimization in the compute- and bandwidth-limited regime, with an emphasis on convergence analysis that explicitly accounts for the mismatch between computation, communication and streaming rates. In particular, it focuses on methods that solve: (i) distributed stochastic convex problems, and (ii) distributed principal component analysis, which is a nonconvex problem with geometric structure that permits global convergence. For such methods, the paper discusses recent advances in terms of distributed algorithmic designs when faced with high-rate streaming data. Further, it reviews guarantees underlying these methods, which show there exist regimes in which systems can learn from distributed, streaming data at order-optimal rates.

Submitted 31 August, 2020; v1 submitted 18 May, 2020; originally announced May 2020.

Comments: 45 pages, 9 figures; preprint of a journal paper published in Proceedings of the IEEE (Special Issue on Optimization for Data-driven Learning and Control)

Journal ref: Proc. of the IEEE, vol. 108, no. 11, pp. 1984-2012, Nov. 2020

arXiv:2002.11277 [pdf, other]

Learning Product Graphs Underlying Smooth Graph Signals

Authors: Muhammad Asad Lodhi, Waheed U. Bajwa

Abstract: Real-world data is often associated with irregular structures that can analytically be represented as graphs. Having access to this graph, which is sometimes trivially evident from domain knowledge, provides a better representation of the data and facilitates various information processing tasks. However, in cases where the underlying graph is unavailable, it needs to be learned from the data itself for data representation, data processing and inference purposes. Existing literature on learning graphs from data has mostly considered arbitrary graphs, whereas the graphs generating real-world data tend to have additional structure that can be incorporated in the graph learning procedure. Structure-aware graph learning methods require learning fewer parameters and have the potential to reduce computational, memory and sample complexities. In light of this, the focus of this paper is to devise a method to learn structured graphs from data that are given in the form of product graphs. Product graphs arise naturally in many real-world datasets and provide an efficient and compact representation of large-scale graphs through several smaller factor graphs. To this end, first the graph learning problem is posed as a linear program, which (on average) outperforms the state-of-the-art graph learning algorithms. This formulation is of independent interest itself as it shows that graph learning is possible through a simple linear program. Afterwards, an alternating minimization-based algorithm aimed at learning various types of product graphs is proposed, and local convergence guarantees to the true solution are established for this algorithm. Finally, the performance gains, reduced sample complexity, and inference capabilities of the proposed algorithm over existing methods are also validated through numerical simulations on synthetic and real datasets.

Submitted 12 June, 2020; v1 submitted 25 February, 2020; originally announced February 2020.

Comments: 14 pages, 5 figures, and 2 tables; revised version of the preprint of a journal paper; revision includes restructuring of text, improved theoretical results, and additional references

arXiv:2001.01017 [pdf, ps, other]

Distributed Stochastic Algorithms for High-rate Streaming Principal Component Analysis

Authors: Haroon Raja, Waheed U. Bajwa

Abstract: This paper considers the problem of estimating the principal eigenvector of a covariance matrix from independent and identically distributed data samples in streaming settings. The streaming rate of data in many contemporary applications can be high enough that a single processor cannot finish an iteration of existing methods for eigenvector estimation before a new sample arrives. This paper formulates and analyzes a distributed variant of the classical Krasulina's method (D-Krasulina) that can keep up with the high streaming rate of data by distributing the computational load across multiple processing nodes. The analysis shows that---under appropriate conditions---D-Krasulina converges to the principal eigenvector in an order-wise optimal manner; i.e., after receiving $M$ samples across all nodes, its estimation error can be $O(1/M)$. In order to reduce the network communication overhead, the paper also develops and analyzes a mini-batch extension of D-Krasulina, which is termed DM-Krasulina. The analysis of DM-Krasulina shows that it can also achieve order-optimal estimation error rates under appropriate conditions, even when some samples have to be discarded within the network due to communication latency. Finally, experiments are performed over synthetic and real-world data to validate the convergence behaviors of D-Krasulina and DM-Krasulina in high-rate streaming settings.

Submitted 3 January, 2020; originally announced January 2020.

Comments: 37 pages, 11 figures; preprint of a journal submission
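
For orientation, the classical Krasulina update named in the abstract above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the paper's D-Krasulina algorithm: the per-round averaging of node directions, the 1/t step size, and the names `krasulina_direction`, `distributed_krasulina`, and `gaussian_stream` are all hypothetical choices made here for illustration.

```python
import math
import random


def krasulina_direction(w, x):
    """Classical Krasulina direction for one sample x at iterate w:
    (x x^T) w - (w^T x x^T w / ||w||^2) w."""
    proj = sum(wi * xi for wi, xi in zip(w, x))   # x^T w
    norm_sq = sum(wi * wi for wi in w)            # ||w||^2
    return [proj * xi - (proj * proj / norm_sq) * wi for wi, xi in zip(w, x)]


def distributed_krasulina(sample_stream, dim, num_nodes, num_rounds, seed=0):
    """Assumed scheme: each round, every node processes one fresh sample at the
    shared iterate, and the averaged direction is applied once."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in range(1, num_rounds + 1):
        step = 1.0 / t  # hypothetical step-size schedule, not taken from the paper
        directions = [krasulina_direction(w, next(sample_stream))
                      for _ in range(num_nodes)]
        avg = [sum(d[i] for d in directions) / num_nodes for i in range(dim)]
        w = [wi + step * ai for wi, ai in zip(w, avg)]
    # Normalize only at the end; the update itself is scale-invariant.
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


def gaussian_stream(scales, seed=1):
    """Endless i.i.d. zero-mean samples with covariance diag(scales^2)."""
    rng = random.Random(seed)
    while True:
        yield [s * rng.gauss(0.0, 1.0) for s in scales]


if __name__ == "__main__":
    estimate = distributed_krasulina(gaussian_stream([3.0, 1.0, 0.5]),
                                     dim=3, num_nodes=4, num_rounds=2000)
    print(estimate)  # expected to align (up to sign) with the first coordinate axis
```

In this toy setup the covariance is diagonal, so the principal eigenvector is the first coordinate axis and the estimate can be checked by inspecting its first entry.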

arXiv:1911.03725 [pdf, other]

doi 10.1137/19M1299335

Tensor Regression Using Low-rank and Sparse Tucker Decompositions

Authors: Talal Ahmed, Haroon Raja, Waheed U. Bajwa

Abstract: This paper studies a tensor-structured linear regression model with a scalar response variable and tensor-structured predictors, such that the regression parameters form a tensor of order $d$ (i.e., a $d$-fold multiway array) in $\mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$. It focuses on the task of estimating the regression tensor from $m$ realizations of the response variable and the predictors where $m\ll n = \prod \nolimits_{i} n_i$. Despite the seeming ill-posedness of this problem, it can still be solved if the parameter tensor belongs to the space of sparse, low Tucker-rank tensors. Accordingly, the estimation procedure is posed as a non-convex optimization program over the space of sparse, low Tucker-rank tensors, and a tensor variant of projected gradient descent is proposed to solve the resulting non-convex problem. In addition, mathematical guarantees are provided that establish that the proposed method converges linearly to an appropriate solution under a certain set of conditions. Further, an upper bound on sample complexity of tensor parameter estimation for the model under consideration is characterized for the special case when the individual (scalar) predictors independently draw values from a sub-Gaussian distribution. The sample complexity bound is shown to have a polylogarithmic dependence on $\bar{n} = \max \big\{n_i: i\in \{1,2,\ldots,d \} \big\}$ and, orderwise, it matches the bound one can obtain from a heuristic parameter counting argument. Finally, numerical experiments demonstrate the efficacy of the proposed tensor model and estimation method on a synthetic dataset and a collection of neuroimaging datasets pertaining to attention deficit hyperactivity disorder. Specifically, the proposed method exhibits better sample complexities on both synthetic and real datasets, demonstrating the usefulness of the model and the method in settings where $n \gg m$.

Submitted 20 July, 2020; v1 submitted 9 November, 2019; originally announced November 2019.

Comments: 28 pages, 5 figures, 2 tables; preprint of a journal paper published in SIAM Journal on Mathematics of Data Science

MSC Class: 41A52; 41A63; 62F10; 62J05

Journal ref: SIAM J. Math. Data Science, vol. 2, no. 4, pp. 944-966, 2020

arXiv:1908.08649 [pdf, other]

doi 10.1109/MSP.2020.2973345

Adversary-resilient Distributed and Decentralized Statistical Inference and Machine Learning: An Overview of Recent Advances Under the Byzantine Threat Model

Authors: Zhixiong Yang, Arpita Gang, Waheed U. Bajwa

Abstract: While the last few decades have witnessed a huge body of work devoted to inference and learning in distributed and decentralized setups, much of this work assumes a non-adversarial setting in which individual nodes---apart from occasional statistical failures---operate as intended within the algorithmic framework. In recent years, however, cybersecurity threats from malicious non-state actors and rogue entities have forced practitioners and researchers to rethink the robustness of distributed and decentralized algorithms against adversarial attacks. As a result, we now have a plethora of algorithmic approaches that guarantee robustness of distributed and/or decentralized inference and learning under different adversarial threat models. Driven in part by the world's growing appetite for data-driven decision making, however, securing of distributed/decentralized frameworks for inference and learning against adversarial threats remains a rapidly evolving research area. In this article, we provide an overview of some of the most recent developments in this area under the threat model of Byzantine attacks.

Submitted 1 June, 2020; v1 submitted 22 August, 2019; originally announced August 2019.

Comments: 24 pages, 6 figures, 2 tables; Published in IEEE Signal Processing Magazine, May 2020 (Special Issue on "Machine Learning From Distributed, Streaming Data")

Journal ref: IEEE Signal Processing Mag., vol. 37, no. 3, pp. 146-159, May 2020

arXiv:1908.08098 [pdf, other]

BRIDGE: Byzantine-resilient Decentralized Gradient Descent

Authors: Cheng Fang, Zhixiong Yang, Waheed U. Bajwa

Abstract: Machine learning has begun to play a central role in many applications. A multitude of these applications typically also involve datasets that are distributed across multiple computing devices/machines due to either design constraints (e.g., multiagent systems) or computational/privacy reasons (e.g., learning on smartphone data). Such applications often require the learning tasks to be carried out in a decentralized fashion, in which there is no central server that is directly connected to all nodes. In real-world decentralized settings, nodes are prone to undetected failures due to malfunctioning equipment, cyberattacks, etc., which are likely to crash non-robust learning algorithms. The focus of this paper is on robustification of decentralized learning in the presence of nodes that have undergone Byzantine failures. The Byzantine failure model allows faulty nodes to arbitrarily deviate from their intended behaviors, thereby ensuring designs of the most robust of algorithms. But the study of Byzantine resilience within decentralized learning, in contrast to distributed learning, is still in its infancy. In particular, existing Byzantine-resilient decentralized learning methods either do not scale well to large-scale machine learning models, or they lack statistical convergence guarantees that help characterize their generalization errors. In this paper, a scalable, Byzantine-resilient decentralized machine learning framework termed Byzantine-resilient decentralized gradient descent (BRIDGE) is introduced. Algorithmic and statistical convergence guarantees for one variant of BRIDGE are also provided in the paper for both strongly convex problems and a class of nonconvex problems. In addition, large-scale decentralized learning experiments are used to establish that the BRIDGE framework is scalable and it delivers competitive results for Byzantine-resilient convex and nonconvex learning.

Submitted 14 June, 2022; v1 submitted 21 August, 2019; originally announced August 2019.

Comments: 20 pages, 10 figures, 2 tables; some expanded discussion as well as additional numerical experiments using the CIFAR-10 dataset

arXiv:1908.00195 [pdf, other]

doi 10.1109/TCCN.2020.2990657

Learning-Aided Physical Layer Attacks Against Multicarrier Communications in IoT

Authors: Alireza Nooraiepour, Waheed U. Bajwa, Narayan B. Mandayam

Abstract: Internet-of-Things (IoT) devices that are limited in power and processing are susceptible to physical layer (PHY) spoofing (signal exploitation) attacks owing to their inability to implement a full-blown protocol stack for security. The overwhelming adoption of multicarrier techniques such as orthogonal frequency division multiplexing (OFDM) for the PHY layer makes IoT devices further vulnerable to PHY spoofing attacks. These attacks, which aim at injecting bogus/spurious data into the receiver, involve inferring transmission parameters and finding PHY characteristics of the transmitted signals so as to spoof the received signal. Non-contiguous (NC) OFDM systems have been argued to have low probability of exploitation (LPE) characteristics against classic attacks based on cyclostationary analysis, and the corresponding PHY has been deemed to be secure. However, with the advent of machine learning (ML) algorithms, adversaries can devise data-driven attacks to compromise such systems. It is in this vein that the PHY spoofing performance of adversaries equipped with supervised and unsupervised ML tools is investigated in this paper. The supervised ML approach is based on deep neural networks (DNN) while the unsupervised one employs variational autoencoders (VAEs). In particular, VAEs are shown to be capable of learning representations from NC-OFDM signals related to their PHY characteristics such as frequency pattern and modulation scheme, which are useful for PHY spoofing. In addition, a new metric based on the disentanglement principle is proposed to measure the quality of such learned representations. Simulation results demonstrate that the performance of the spoofing adversaries highly depends on the subcarriers' allocation patterns. Particularly, it is shown that utilizing a random subcarrier occupancy pattern secures NC-OFDM systems against ML-based attacks.

Submitted 4 July, 2020; v1 submitted 31 July, 2019; originally announced August 2019.

Comments: 15 pages; 20 figures; 3 tables; preprint of a paper accepted for publication in IEEE Trans. Cognitive Commun. Netw.

Journal ref: IEEE Trans. Cognitive Commun. Netw., vol. 7, no. 1, pp. 239-254, Mar. 2021

arXiv:1903.09284 [pdf, other]

doi 10.1109/TSP.2019.2952046

Learning Mixtures of Separable Dictionaries for Tensor Data: Analysis and Algorithms

Authors: Mohsen Ghassemi, Zahra Shakeri, Anand D. Sarwate, Waheed U. Bajwa

Abstract: This work addresses the problem of learning sparse representations of tensor data using structured dictionary learning. It proposes learning a mixture of separable dictionaries to better capture the structure of tensor data by generalizing the separable dictionary learning model. Two different approaches for learning mixture of separable dictionaries are explored and sufficient conditions for local identifiability of the underlying dictionary are derived in each case. Moreover, computational algorithms are developed to solve the problem of learning mixture of separable dictionaries in both batch and online settings. Numerical experiments are used to show the usefulness of the proposed model and the efficacy of the developed algorithms.

Submitted 13 June, 2020; v1 submitted 21 March, 2019; originally announced March 2019.

Comments: 18 pages, 4 figures, 3 tables; Published in IEEE Trans. Signal Processing

Journal ref: IEEE Trans. Signal Processing, vol. 68, pp. 33-48, 2020

arXiv:1810.00532 [pdf, ps, other]

How Secure are Multicarrier Communication Systems Against Signal Exploitation Attacks?

Authors: Alireza Nooraiepour, Kenza Hamidouche, Waheed U. Bajwa, Narayan Mandayam

Abstract: In this paper, robustness of non-contiguous orthogonal frequency division multiplexing (NC-OFDM) transmissions is investigated and contrasted to OFDM transmissions for fending off signal exploitation attacks. In contrast to OFDM transmissions, NC-OFDM transmissions take place over a subset of active subcarriers to either avoid incumbent transmissions or for strategic considerations. A point-to-point communication system is considered in this paper in the presence of an adversary (exploiter) that aims to infer transmission parameters (e.g., the subset of active subcarriers and duration of the signal) using a deep neural network (DNN). This method has been proposed since the existing methods for exploitation, which are based on cyclostationary analysis, have been shown to have limited success in NC-OFDM systems. A good estimation of the transmission parameters allows the adversary to transmit spurious data and attack the legitimate receiver. Simulation results show that the DNN can infer the transmit parameters of OFDM signals with very good accuracy. However, NC-OFDM with fully random selection of active subcarriers makes it difficult for the adversary to exploit the waveform and thus for the receiver to be affected by the spurious data. Moreover, the more structured the set of active subcarriers selected by the transmitter is, the easier it is for the adversary to infer the transmission parameters and attack the receiver using a DNN.

Submitted 1 October, 2018; originally announced October 2018.

arXiv:1711.08532 [pdf, other]

doi 10.1109/TSP.2018.2875897

Detection Theory for Union of Subspaces

Authors: Muhammad Asad Lodhi, Waheed U. Bajwa

Abstract: The focus of this paper is on detection theory for union of subspaces (UoS). To this end, generalized likelihood ratio tests (GLRTs) are presented for detection of signals conforming to the UoS model and detection of the corresponding "active" subspace. One of the main contributions of this paper is bounds on the performances of these GLRTs in terms of geometry of subspaces under various assumptions on the observation noise. The insights obtained through geometrical interpretation of the GLRTs are also validated through extensive numerical experiments on both synthetic and real-world data.

Submitted 19 January, 2019; v1 submitted 22 November, 2017; originally announced November 2017.

Comments: 16 pages, 1 table, 15 figures

Journal ref: Published in IEEE Trans. Signal Processing, vol. 66, no. 24, pp. 6347-6362, Dec. 2018
