-
A Unified Framework for Gradient-based Clustering of Distributed Data
Authors:
Aleksandar Armacki,
Dragana Bajović,
Dušan Jakovetić,
Soummya Kar
Abstract:
We develop a family of distributed clustering algorithms that work over networks of users. In the proposed scenario, users contain a local dataset and communicate only with their immediate neighbours, with the aim of finding a clustering of the full, joint data. The proposed family, termed Distributed Gradient Clustering (DGC-$\mathcal{F}_ρ$), is parametrized by $ρ\geq 1$, controlling the proximity of users' center estimates, with $\mathcal{F}$ determining the clustering loss. Specialized to popular clustering losses like $K$-means and Huber loss, DGC-$\mathcal{F}_ρ$ gives rise to novel distributed clustering algorithms DGC-KM$_ρ$ and DGC-HL$_ρ$, while a novel clustering loss based on the logistic function leads to DGC-LL$_ρ$. We provide a unified analysis and establish several strong results, under mild assumptions. First, the sequence of centers generated by the methods converges to a well-defined notion of fixed point, under any center initialization and value of $ρ$. Second, as $ρ$ increases, the family of fixed points produced by DGC-$\mathcal{F}_ρ$ converges to a notion of consensus fixed points. We show that consensus fixed points of DGC-$\mathcal{F}_ρ$ are equivalent to fixed points of gradient clustering over the full data, guaranteeing a clustering of the full data is produced. For the special case of Bregman losses, we show that our fixed points converge to the set of Lloyd points. Numerical experiments on real data confirm our theoretical findings and demonstrate strong performance of the methods.
Submitted 2 February, 2024;
originally announced February 2024.
-
High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise
Authors:
Aleksandar Armacki,
Pranay Sharma,
Gauri Joshi,
Dragana Bajovic,
Dusan Jakovetic,
Soummya Kar
Abstract:
We study high-probability convergence guarantees of learning on streaming data in the presence of heavy-tailed noise. In the proposed scenario, the model is updated in an online fashion, as new information is observed, without storing any additional data. To combat the heavy-tailed noise, we consider a general framework of nonlinear stochastic gradient descent (SGD), providing several strong results. First, for non-convex costs and component-wise nonlinearities, we establish a convergence rate arbitrarily close to $\mathcal{O}\left(t^{-\frac{1}{4}}\right)$, whose exponent is independent of noise and problem parameters. Second, for strongly convex costs and component-wise nonlinearities, we establish a rate arbitrarily close to $\mathcal{O}\left(t^{-\frac{1}{2}}\right)$ for the weighted average of iterates, with exponent again independent of noise and problem parameters. Finally, for strongly convex costs and a broader class of nonlinearities, we establish convergence of the last iterate, with a rate $\mathcal{O}\left(t^{-ζ} \right)$, where $ζ\in (0,1)$ depends on problem parameters, noise and nonlinearity. As we show analytically and numerically, $ζ$ can be used to inform the preferred choice of nonlinearity for given problem settings. Compared to state-of-the-art, who only consider clipping, require bounded noise moments of order $η\in (1,2]$, and establish convergence rates whose exponents go to zero as $η\rightarrow 1$, we provide high-probability guarantees for a much broader class of nonlinearities and symmetric density noise, with convergence rates whose exponents are bounded away from zero, even when the noise has finite first moment only. Moreover, in the case of strongly convex functions, we demonstrate analytically and numerically that clipping is not always the optimal nonlinearity, further underlining the value of our general framework.
Submitted 30 April, 2024; v1 submitted 28 October, 2023;
originally announced October 2023.
-
Nonlinear consensus+innovations under correlated heavy-tailed noises: Mean square convergence rate and asymptotics
Authors:
Manojlo Vukovic,
Dusan Jakovetic,
Dragana Bajovic,
Soummya Kar
Abstract:
We consider distributed recursive estimation of consensus+innovations type in the presence of heavy-tailed sensing and communication noises. We allow that the sensing and communication noises are mutually correlated while independent identically distributed (i.i.d.) in time, and that they may both have infinite moments of order higher than one (hence having infinite variances). Such heavy-tailed, infinite-variance noises are highly relevant in practice and are shown to occur, e.g., in dense internet of things (IoT) deployments. We develop a consensus+innovations distributed estimator that employs a general nonlinearity in both consensus and innovations steps to combat the noise. We establish the estimator's almost sure convergence, asymptotic normality, and mean squared error (MSE) convergence. Moreover, we establish and explicitly quantify for the estimator a sublinear MSE convergence rate. We then quantify through analytical examples the effects of the nonlinearity choices and the noises correlation on the system performance. Finally, numerical examples corroborate our findings and verify that the proposed method works in the simultaneous heavy-tail communication-sensing noise setting, while existing methods fail under the same noise conditions.
Submitted 9 November, 2023; v1 submitted 22 December, 2022;
originally announced December 2022.
-
Large deviations rates for stochastic gradient descent with strongly convex functions
Authors:
Dragana Bajovic,
Dusan Jakovetic,
Soummya Kar
Abstract:
Recent works have shown that high probability metrics with stochastic gradient descent (SGD) exhibit informativeness and in some cases advantage over the commonly adopted mean-square error-based ones. In this work we provide a formal framework for the study of general high probability bounds with SGD, based on the theory of large deviations. The framework allows for a generic (not-necessarily bounded) gradient noise satisfying mild technical assumptions, allowing for the dependence of the noise distribution on the current iterate. Under the preceding assumptions, we find an upper large deviations bound for SGD with strongly convex functions. The corresponding rate function captures analytical dependence on the noise distribution and other problem parameters. This is in contrast with conventional mean-square error analysis that captures only the noise dependence through the variance and does not capture the effect of higher order moments nor interplay between the noise geometry and the shape of the cost function. We also derive exact large deviation rates for the case when the objective function is quadratic and show that the obtained function matches the one from the general upper bound hence showing the tightness of the general upper bound. Numerical examples illustrate and corroborate theoretical findings.
Submitted 2 November, 2022;
originally announced November 2022.
-
A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments
Authors:
Aleksandar Armacki,
Dragana Bajovic,
Dusan Jakovetic,
Soummya Kar
Abstract:
The paper proposes a family of communication efficient methods for distributed learning in heterogeneous environments in which users obtain data from one of $K$ different distributions. In the proposed setup, the grouping of users (based on the data distributions they sample), as well as the underlying statistical properties of the distributions, are a priori unknown. A family of One-shot Distributed Clustered Learning methods (ODCL-$\mathcal{C}$) is proposed, parametrized by the set of admissible clustering algorithms $\mathcal{C}$, with the objective of learning the true model at each user. The admissible clustering methods include $K$-means (KM) and convex clustering (CC), giving rise to various one-shot methods within the proposed family, such as ODCL-KM and ODCL-CC. The proposed one-shot approach, based on local computations at the users and a clustering based aggregation step at the server is shown to provide strong learning guarantees. In particular, for strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error (MSE) rates in terms of the sample size. An explicit characterization of the threshold is provided in terms of problem parameters. The trade-offs with respect to selecting various clustering methods (ODCL-CC, ODCL-KM) are discussed and significant improvements over state-of-the-art are demonstrated. Numerical experiments illustrate the findings and corroborate the performance of the proposed methods.
Submitted 21 October, 2023; v1 submitted 22 September, 2022;
originally announced September 2022.
-
Inaccuracy rates for distributed inference over random networks with applications to social learning
Authors:
Dragana Bajovic
Abstract:
This paper studies probabilistic rates of convergence for consensus+innovations type of algorithms in random, generic networks. For each node, we find a lower and also a family of upper bounds on the large deviations rate function, thus enabling the computation of the exponential convergence rates for the events of interest on the iterates. Relevant applications include error exponents in distributed hypothesis testing, rates of convergence of beliefs in social learning, and inaccuracy rates in distributed estimation. The bounds on the rate function have a very particular form at each node: they are constructed as the convex envelope between the rate function of the hypothetical fusion center and the rate function corresponding to a certain topological mode of the node's presence. We further show tightness of the discovered bounds for several cases, such as pendant nodes and regular networks, thus establishing the first proof of the large deviations principle for consensus+innovations and social learning in random networks.
Submitted 10 August, 2022;
originally announced August 2022.
-
Dynamic Split Computing for Efficient Deep Edge Intelligence
Authors:
Arian Bakhtiarnia,
Nemanja Milošević,
Qi Zhang,
Dragana Bajović,
Alexandros Iosifidis
Abstract:
Deploying deep neural networks (DNNs) on IoT and mobile devices is a challenging task due to their limited computational resources. Thus, demanding tasks are often entirely offloaded to edge servers which can accelerate inference, however, it also causes communication cost and evokes privacy concerns. In addition, this approach leaves the computational capacity of end devices unused. Split computing is a paradigm where a DNN is split into two sections; the first section is executed on the end device, and the output is transmitted to the edge server where the final section is executed. Here, we introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel. By using natural bottlenecks that already exist in modern DNN architectures, dynamic split computing avoids retraining and hyperparameter optimization, and does not have any negative impact on the final accuracy of DNNs. Through extensive experiments, we show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
Submitted 17 June, 2022; v1 submitted 23 May, 2022;
originally announced May 2022.
-
Nonlinear gradient mappings and stochastic optimization: A general framework with applications to heavy-tail noise
Authors:
Dusan Jakovetic,
Dragana Bajovic,
Anit Kumar Sahu,
Soummya Kar,
Nemanja Milosevic,
Dusan Stamenkovic
Abstract:
We introduce a general framework for nonlinear stochastic gradient descent (SGD) for the scenarios when gradient noise exhibits heavy tails. The proposed framework subsumes several popular nonlinearity choices, like clipped, normalized, signed or quantized gradient, but we also consider novel nonlinearity choices. We establish for the considered class of methods strong convergence guarantees assuming a strongly convex cost function with Lipschitz continuous gradients under very general assumptions on the gradient noise. Most notably, we show that, for a nonlinearity with bounded outputs and for the gradient noise that may not have finite moments of order greater than one, the nonlinear SGD's mean squared error (MSE), or equivalently, the expected cost function's optimality gap, converges to zero at rate $O(1/t^ζ)$, $ζ\in (0,1)$. In contrast, for the same noise setting, the linear SGD generates a sequence with unbounded variances. Furthermore, for the nonlinearities that can be decoupled component wise, like, e.g., sign gradient or component-wise clipping, we show that the nonlinear SGD asymptotically (locally) achieves a $O(1/t)$ rate in the weak convergence sense and explicitly quantify the corresponding asymptotic variance. Experiments show that, while our framework is more general than existing studies of SGD under heavy-tail noise, several easy-to-implement nonlinearities from our framework are competitive with state of the art alternatives on real data sets with heavy tail noises.
Submitted 6 April, 2022;
originally announced April 2022.
-
Gradient Based Clustering
Authors:
Aleksandar Armacki,
Dragana Bajovic,
Dusan Jakovetic,
Soummya Kar
Abstract:
We propose a general approach for distance based clustering, using the gradient of the cost function that measures clustering quality with respect to cluster assignments and cluster center positions. The approach is an iterative two step procedure (alternating between cluster assignment and cluster center updates) and is applicable to a wide range of functions, satisfying some mild assumptions. The main advantage of the proposed approach is a simple and computationally cheap update rule. Unlike previous methods that specialize to a specific formulation of the clustering problem, our approach is applicable to a wide range of costs, including non-Bregman clustering methods based on the Huber loss. We analyze the convergence of the proposed algorithm, and show that it converges to the set of appropriately defined fixed points, under arbitrary center initialization. In the special case of Bregman cost functions, the algorithm converges to the set of centroidal Voronoi partitions, which is consistent with prior works. Numerical experiments on real data demonstrate the effectiveness of the proposed method.
Submitted 17 June, 2022; v1 submitted 1 February, 2022;
originally announced February 2022.
-
Personalized Federated Learning via Convex Clustering
Authors:
Aleksandar Armacki,
Dragana Bajovic,
Dusan Jakovetic,
Soummya Kar
Abstract:
We propose a parametric family of algorithms for personalized federated learning with locally convex user costs. The proposed framework is based on a generalization of convex clustering in which the differences between different users' models are penalized via a sum-of-norms penalty, weighted by a penalty parameter $λ$. The proposed approach enables "automatic" model clustering, without prior knowledge of the hidden cluster structure, nor the number of clusters. Analytical bounds on the weight parameter, that lead to simultaneous personalization, generalization and automatic model clustering are provided. The solution to the formulated problem enables personalization, by providing different models across different clusters, and generalization, by providing models different than the per-user models computed in isolation. We then provide an efficient algorithm based on the Parallel Direction Method of Multipliers (PDMM) to solve the proposed formulation in a federated server-users setting. Numerical experiments corroborate our findings. As an interesting byproduct, our results provide several generalizations to convex clustering.
Submitted 17 February, 2022; v1 submitted 1 February, 2022;
originally announced February 2022.
-
Deep Learning Anomaly Detection for Cellular IoT with Applications in Smart Logistics
Authors:
Milos Savic,
Milan Lukic,
Dragan Danilovic,
Zarko Bodroski,
Dragana Bajovic,
Ivan Mezei,
Dejan Vukobratovic,
Srdjan Skrbic,
Dusan Jakovetic
Abstract:
The number of connected Internet of Things (IoT) devices within cyber-physical infrastructure systems grows at an increasing rate. This poses significant device management and security challenges to current IoT networks. Among several approaches to cope with these challenges, data-based methods rooted in deep learning (DL) are receiving an increased interest. In this paper, motivated by the upcoming surge of 5G IoT connectivity in industrial environments, we propose to integrate a DL-based anomaly detection (AD) as a service into the 3GPP mobile cellular IoT architecture. The proposed architecture embeds autoencoder based anomaly detection modules both at the IoT devices (ADM-EDGE) and in the mobile core network (ADM-FOG), thereby balancing between the system responsiveness and accuracy. We design, integrate, demonstrate and evaluate a testbed that implements the above service in a real-world deployment integrated within the 3GPP Narrow-Band IoT (NB-IoT) mobile operator network.
Submitted 2 April, 2021; v1 submitted 17 February, 2021;
originally announced February 2021.
-
Primal-dual methods for large-scale and distributed convex optimization and data analytics
Authors:
Dusan Jakovetic,
Dragana Bajovic,
Joao Xavier,
Jose M. F. Moura
Abstract:
The augmented Lagrangian method (ALM) is a classical optimization tool that solves a given "difficult" (constrained) problem via finding solutions of a sequence of "easier" (often unconstrained) sub-problems with respect to the original (primal) variable, wherein constraints satisfaction is controlled via the so-called dual variables. ALM is highly flexible with respect to how primal sub-problems can be solved, giving rise to a plethora of different primal-dual methods. The powerful ALM mechanism has recently proved to be very successful in various large scale and distributed applications. In addition, several significant advances have appeared, primarily on precise complexity results with respect to computational and communication costs in the presence of inexact updates and design and analysis of novel optimal methods for distributed consensus optimization. We provide a tutorial-style introduction to ALM and its variants for solving convex optimization problems in large scale and distributed settings. We describe control-theoretic tools for the algorithms' analysis and design, survey recent results, and provide novel insights in the context of two emerging applications: federated learning and distributed energy trading.
Submitted 14 April, 2020; v1 submitted 18 December, 2019;
originally announced December 2019.
-
Distributed Intelligent Illumination Control in the Context of Probabilistic Graphical Models
Authors:
M. Cosovic,
T. Devaja,
D. Bajovic,
J. Machaj,
G. McCutcheon,
V. Stankovic,
L. Stankovic,
D. Vukobratovic
Abstract:
Lighting systems based on light-emitting diodes (LEDs) possess many benefits over their incandescent counterparts including longer lifespans, lower energy costs, better quality of light and no toxic elements, all without sacrificing consumer satisfaction. Their lifespan is not affected by switching frequency allowing for better illumination control and system efficiency. In this paper, we present a fully distributed energy-saving illumination dimming control strategy for the system of a lighting network which consists of a group of LEDs and user-associated devices. In order to solve the optimization problem, we are using a distributed approach that utilizes factor graphs and the belief propagation algorithm. Using probabilistic graphical models to represent and solve the system model provides for a natural description of the problem structure, where user devices and LED controllers exchange data via line-of-sight communication.
Submitted 12 June, 2019;
originally announced June 2019.
-
Optimal detection and error exponents for hidden multi-state processes via random duration model approach
Authors:
Dragana Bajović,
Kanghang He,
Lina Stanković,
Dejan Vukobratović,
Vladimir Stanković
Abstract:
We study detection of random signals corrupted by noise that over time switch their values (states) from a finite set of possible values, where the switchings occur at unknown points in time. We model such signals by means of a random duration model that to each possible state assigns a probability mass function which controls the statistics of durations of that state occurrences. Assuming two possible signal states and Gaussian noise, we derive optimal likelihood ratio test and show that it has a computationally tractable form of a matrix product, with the number of matrices involved in the product being the number of process observations. Each matrix involved in the product is of dimension equal to the sum of durations spreads of the two states, and it can be decomposed as a product of a diagonal random matrix controlled by the process observations and a sparse constant matrix which governs transitions in the sequence of states. Using this result, we show that the Neyman-Pearson error exponent is equal to the top Lyapunov exponent for the corresponding random matrices. Using theory of large deviations, we derive a lower bound on the error exponent. Finally, we show that this bound is tight by means of numerical simulations.
Submitted 25 December, 2017;
originally announced December 2017.
-
Distributed second order methods with increasing number of working nodes
Authors:
Natasa Krklec Jerinkic,
Dusan Jakovetic,
Natasa Krejic,
Dragana Bajovic
Abstract:
Recently, an idling mechanism has been introduced in the context of distributed \emph{first order} methods for minimization of a sum of nodes' local convex costs over a generic, connected network. With the idling mechanism, each node $i$, at each iteration $k$, is active -- updates its solution estimate and exchanges messages with its network neighborhood -- with probability $p_k$, and it stays idle with probability $1-p_k$, while the activations are independent both across nodes and across iterations. In this paper, we demonstrate that the idling mechanism can be successfully incorporated in \emph{distributed second order methods} also. Specifically, we apply the idling mechanism to the recently proposed Distributed Quasi Newton method (DQN). We first show theoretically that, when $p_k$ grows to one across iterations in a controlled manner, DQN with idling exhibits very similar theoretical convergence and convergence rates properties as the standard DQN method, thus achieving the same order of convergence rate (R-linear) as the standard DQN, but with significantly cheaper updates. Simulation examples confirm the benefits of incorporating the idling mechanism, demonstrate the method's flexibility with respect to the choice of the $p_k$'s, and compare the proposed idling method with related algorithms from the literature.
Submitted 20 September, 2018; v1 submitted 5 September, 2017;
originally announced September 2017.
-
Detecting random walks on graphs with heterogeneous sensors
Authors:
Dragana Bajovic,
José M. F. Moura,
Dejan Vukobratovic
Abstract:
We consider the problem of detecting a random walk on a graph, based on observations of the graph nodes. When visited by the walk, each node of the graph observes a signal of elevated mean, which we assume can be different across different nodes. Outside of the path of the walk, and also in its absence, nodes measure only noise. Assuming the Neyman-Pearson setting, our goal then is to characterize detection performance by computing the error exponent for the probability of a miss, under a constraint on the probability of false alarm. Since exact computation of the error exponent is known to be difficult, equivalent to computation of the Lyapunov exponent, we approximate its value by finding a tractable lower bound. The bound reveals an interesting detectability condition: the walk is detectable whenever the entropy of the walk is smaller than one half of the expected signal-to-noise ratio. We derive the bound by extending the notion of Markov types to Gauss-Markov types. These are sequences of state-observation pairs with a given number of node-to-node transition counts and the same average signal values across nodes, computed from the measurements made during the times the random walk was visiting each node's respective location. The lower bound has an intuitive interpretation: among all Gauss-Markov types that are asymptotically feasible in the absence of the walk, the bound finds the most typical one under the presence of the walk. Finally, we show by a sequence of judicious problem reformulations that computing the bound reduces to solving a convex optimization problem, which is a result in its own right.
Submitted 1 October, 2018; v1 submitted 21 July, 2017;
originally announced July 2017.
-
Cooperative Slotted ALOHA for Massive M2M Random Access Using Directional Antennas
Authors:
Aleksandar Mastilovic,
Dejan Vukobratovic,
Dusan Jakovetic,
Dragana Bajovic
Abstract:
Slotted ALOHA (SA) algorithms with Successive Interference Cancellation (SIC) decoding have received significant attention lately due to their ability to dramatically increase the throughput of traditional SA. Motivated by the increased density of cellular radio access networks due to the introduction of small cells, and the dramatic increase of user density in Machine-to-Machine (M2M) communications, SA algorithms with SIC operating cooperatively in a multi-base-station (BS) scenario have recently been considered. In this paper, we generalize our previous work on Slotted ALOHA with multiple-BS (SA-MBS) by considering users that use directional antennas. In particular, we focus on a simple randomized beamforming strategy where, for every packet transmission, a user orients its main beam in a randomly selected direction. We are interested in the total achievable system throughput for two decoding scenarios: i) a non-cooperative scenario in which traditional SA operates at each BS independently, and ii) cooperative SA-MBS in which centralized SIC-based decoding is applied over all received user signals. For both scenarios, we provide upper limits on the system throughput and compare them against the simulation results. Finally, we discuss the system performance as a function of the parameters of the simple directional antenna model used in this paper.
Submitted 29 June, 2017;
originally announced June 2017.
-
CONDENSE: A Reconfigurable Knowledge Acquisition Architecture for Future 5G IoT
Authors:
Dejan Vukobratovic,
Dusan Jakovetic,
Vitaly Skachek,
Dragana Bajovic,
Dino Sejdinovic,
Gunes Karabulut Kurt,
Camilla Hollanti,
Ingo Fischer
Abstract:
In forthcoming years, the Internet of Things (IoT) will connect billions of smart devices generating and uploading a deluge of data to the cloud. If successfully extracted, the knowledge buried in the data can significantly improve the quality of life and foster economic growth. However, a critical bottleneck for realising the efficient IoT is the pressure it puts on the existing communication infrastructures, requiring transfer of enormous data volumes. Aiming at addressing this problem, we propose a novel architecture dubbed Condense, which integrates the IoT-communication infrastructure into data analysis. This is achieved via the generic concept of network function computation: Instead of merely transferring data from the IoT sources to the cloud, the communication infrastructure should actively participate in the data analysis by carefully designed en-route processing. We define the Condense architecture, its basic layers, and the interactions among its constituent modules. Further, from the implementation side, we describe how Condense can be integrated into the 3rd Generation Partnership Project (3GPP) Machine Type Communications (MTC) architecture, as well as the prospects of making it a practically viable technology in a short time frame, relying on Network Function Virtualization (NFV) and Software Defined Networking (SDN). Finally, from the theoretical side, we survey the relevant literature on computing "atomic" functions in both analog and digital domains, as well as on function decomposition over networks, highlighting challenges, insights, and future directions for exploiting these techniques within practical 3GPP MTC architecture.
Submitted 12 September, 2016;
originally announced September 2016.
-
Newton-like method with diagonal correction for distributed optimization
Authors:
Dragana Bajovic,
Dusan Jakovetic,
Natasa Krejic,
Natasa Krklec Jerinkic
Abstract:
We consider distributed optimization problems where networked nodes cooperatively minimize the sum of their locally known convex costs. A popular class of methods for solving these problems is distributed gradient methods, which are attractive due to their inexpensive iterations but suffer from slow convergence rates. This motivates the incorporation of second-order information in the distributed methods, but this task is challenging: although the Hessians which arise in the algorithm design respect the sparsity of the network, their inverses are dense, rendering distributed implementations difficult. We overcome this challenge and propose a class of distributed Newton-like methods, which we refer to as Distributed Quasi Newton (DQN). The DQN family approximates the Hessian inverse by: 1) splitting the Hessian into its diagonal and off-diagonal part, 2) inverting the diagonal part, and 3) approximating the inverse of the off-diagonal part through a weighted linear function. The approximation is parameterized by the tuning variables, which correspond to different splittings of the Hessian and to different weightings of the off-diagonal Hessian part. Specific choices of the tuning variables give rise to different variants of the proposed general DQN method -- dubbed DQN-0, DQN-1 and DQN-2 -- which trade off communication and computational costs for convergence. Simulations demonstrate the effectiveness of the proposed DQN methods.
Submitted 20 February, 2017; v1 submitted 5 September, 2015;
originally announced September 2015.
-
Distributed inference over directed networks: Performance limits and optimal design
Authors:
Dragana Bajović,
José M. F. Moura,
João Xavier,
Bruno Sinopoli
Abstract:
We find large deviations rates for consensus-based distributed inference for directed networks. When the topology is deterministic, we establish the large deviations principle and find exactly the corresponding rate function, equal at all nodes. We show that the dependence of the rate function on the stochastic weight matrix associated with the network is fully captured by its left eigenvector corresponding to the unit eigenvalue. Further, when the sensors' observations are Gaussian, the rate function admits a closed form expression. Motivated by these observations, we formulate the optimal network design problem of finding the left eigenvector which achieves the highest value of the rate function, for a given target accuracy. This eigenvector therefore minimizes the time that the inference algorithm needs to reach the desired accuracy. For Gaussian observations, we show that the network design problem can be formulated as a semidefinite (convex) program, and hence can be solved efficiently. When observations are identically distributed across agents, the system exhibits an interesting property: the graph of the rate function always lies between the graphs of the rate function of an isolated node and the rate function of a fusion center that has access to all observations. We prove that this fundamental property holds even when the topology and the associated system matrices change randomly over time, with arbitrary distribution. Due to generality of its assumptions, the latter result requires more subtle techniques than the standard large deviations tools, contributing to the general theory of large deviations.
Submitted 28 April, 2015;
originally announced April 2015.
-
Distributed Gradient Methods with Variable Number of Working Nodes
Authors:
Dusan Jakovetic,
Dragana Bajovic,
Natasa Krejic,
Natasa Krklec-Jerinkic
Abstract:
We consider distributed optimization where $N$ nodes in a connected network minimize the sum of their local costs subject to a common constraint set. We propose a distributed projected gradient method where each node, at each iteration $k$, performs an update (is active) with probability $p_k$, and stays idle (is inactive) with probability $1-p_k$. Whenever active, each node performs an update by weight-averaging its solution estimate with the estimates of its active neighbors, taking a negative gradient step with respect to its local cost, and performing a projection onto the constraint set; inactive nodes perform no updates. Assuming that nodes' local costs are strongly convex, with Lipschitz continuous gradients, we show that, as long as activation probability $p_k$ grows to one asymptotically, our algorithm converges in the mean square sense (MSS) to the same solution as the standard distributed gradient method, i.e., as if all the nodes were active at all iterations. Moreover, when $p_k$ grows to one linearly, with an appropriately set convergence factor, the algorithm has a linear MSS convergence, with practically the same factor as the standard distributed gradient method. Simulations on both synthetic and real world data sets demonstrate that, when compared with the standard distributed gradient method, the proposed algorithm significantly reduces the overall number of per-node communications and per-node gradient evaluations (computational cost) for the same required accuracy.
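The update described in this abstract can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: scalar estimates, quadratic local costs $f_i(x) = \tfrac{1}{2}(x - a_i)^2$, an interval constraint set, uniform averaging weights over the active neighborhood, a constant step size, and the activation schedule $p_k = 1 - 0.5^{k+1}$.

```python
import random


def project(x, lo, hi):
    """Project a scalar onto the interval [lo, hi] (the assumed constraint set)."""
    return min(max(x, lo), hi)


def random_activation_projected_gradient(targets, neighbors, lo, hi,
                                          step=0.05, iters=400, seed=0):
    """Sketch of a distributed projected gradient method with idle nodes.

    targets[i] is a_i in the assumed local cost 0.5 * (x - a_i)**2.
    neighbors[i] lists the neighbors of node i in the network.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n
    for k in range(iters):
        # Hypothetical activation schedule that grows to one geometrically.
        p_k = 1.0 - 0.5 ** (k + 1)
        active = [rng.random() < p_k for _ in range(n)]
        new_x = list(x)
        for i in range(n):
            if not active[i]:
                continue  # inactive nodes perform no update
            # Average own estimate with the estimates of active neighbors
            # (uniform weights are an illustrative choice).
            group = [i] + [j for j in neighbors[i] if active[j]]
            avg = sum(x[j] for j in group) / len(group)
            grad = x[i] - targets[i]  # gradient of the assumed local cost
            new_x[i] = project(avg - step * grad, lo, hi)
        x = new_x
    return x


if __name__ == "__main__":
    ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(random_activation_projected_gradient([1.0, 2.0, 3.0, 4.0], ring, 0.0, 3.0))
```

With a constant step size the estimates settle in a neighborhood of the minimizer of the summed cost (here the mean of the `targets`), which is the usual behavior of distributed gradient methods; the paper's step-size and weight choices may differ.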
Submitted 10 March, 2016; v1 submitted 15 April, 2015;
originally announced April 2015.
-
Distributed Storage Allocations for Neighborhood-based Data Access
Authors:
Dusan Jakovetic,
Aleksandar Minja,
Dragana Bajovic,
Dejan Vukobratovic
Abstract:
We introduce a neighborhood-based data access model for distributed coded storage allocation. Storage nodes are connected in a generic network and data is accessed locally: a user accesses a randomly chosen storage node, which subsequently queries its neighborhood to recover the data object. We aim at finding an optimal allocation that minimizes the overall storage budget while ensuring recovery with probability one. We show that the problem reduces to finding the fractional dominating set of the underlying network. Furthermore, we develop a fully distributed algorithm where each storage node communicates only with its neighborhood in order to find its optimal storage allocation. The proposed algorithm is based upon the recently proposed proximal center method, an efficient dual decomposition method based on the accelerated dual gradient method. We show that our algorithm achieves a $(1+ε)$-approximation ratio in $O(d_{\mathrm{max}}^{3/2}/ε)$ iterations and per-node communications, where $d_{\mathrm{max}}$ is the maximal degree across nodes. Simulations demonstrate the effectiveness of the algorithm.
Submitted 11 November, 2014;
originally announced November 2014.
-
Cooperative Slotted Aloha for Multi-Base Station Systems
Authors:
Dusan Jakovetic,
Dragana Bajovic,
Dejan Vukobratovic,
Vladimir Crnojevic
Abstract:
We introduce a framework to study slotted Aloha with cooperative base stations. Assuming a geographic-proximity communication model, we propose several decoding algorithms with different degrees of base stations' cooperation (non-cooperative, spatial, temporal, and spatio-temporal). With spatial cooperation, neighboring base stations inform each other whenever they collect a user within their coverage overlap; temporal cooperation corresponds to (temporal) successive interference cancellation done locally at each station. We analyze the four decoding algorithms and establish several fundamental results. With all algorithms, the peak throughput (average number of decoded users per slot, across all base stations) increases linearly with the number of base stations. Further, temporal and spatio-temporal cooperations exhibit a threshold behavior with respect to the normalized load (number of users per station, per slot). There exists a positive load $G^\star$ such that, below $G^\star$, the decoding probability is asymptotically the maximal possible, equal to the probability that a user is heard by at least one base station; with non-cooperative decoding and spatial cooperation, we show that $G^\star$ is zero. Finally, with spatio-temporal cooperation, we optimize the degree distribution according to which users transmit their packet replicas; the optimum is in general very different from the corresponding optimal distribution of the single-base station system.
Submitted 29 January, 2015; v1 submitted 3 July, 2014;
originally announced July 2014.
-
Slotted Aloha for Networked Base Stations with Spatial and Temporal Diversity
Authors:
Dusan Jakovetic,
Dragana Bajovic,
Dejan Vukobratovic,
Vladimir Crnojevic
Abstract:
We consider framed slotted Aloha where $m$ base stations cooperate to decode messages from $n$ users. Users and base stations are placed uniformly at random over an area. At each frame, each user sends multiple replicas of its packet according to a prescribed distribution, and it is heard by all base stations within the communication radius $r$. Base stations employ a decoding algorithm that utilizes the successive interference cancellation mechanism, both in space--across neighboring base stations, and in time--across different slots, locally at each base station. We show that there exists a threshold on the normalized load $G=n/(τm)$, where $τ$ is the number of slots per frame, below which decoding probability converges asymptotically (as $n,m,τ\rightarrow \infty$, $r\rightarrow 0$) to the maximal possible value--the probability that a user is heard by at least one base station, and we find a lower bound on the threshold. Further, we give a heuristic evaluation of the decoding probability based on the and-or-tree analysis. Finally, we show that the peak throughput increases linearly in the number of base stations.
Submitted 27 January, 2014;
originally announced January 2014.
-
Slotted Aloha for Networked Base Stations
Authors:
Dragana Bajovic,
Dusan Jakovetic,
Dejan Vukobratovic,
Vladimir Crnojevic
Abstract:
We study multiple base station, multi-access systems in which the user-base station adjacency is induced by geographical proximity. At each slot, each user transmits (is active) with a certain probability, independently of other users, and is heard by all base stations within the distance $r$. Both the users and base stations are placed uniformly at random over the (unit) area. We first consider a non-cooperative decoding where base stations work in isolation, but a user is decoded as soon as one of its nearby base stations reads a clean signal from it. We find the decoding probability and quantify the gains introduced by multiple base stations. Specifically, the peak throughput increases linearly with the number of base stations $m$ and is roughly $m/4$ larger than the throughput of a single-base station that uses standard slotted Aloha. Next, we propose a cooperative decoding, where the mutually close base stations inform each other whenever they decode a user inside their coverage overlap. At each base station, the messages received from the nearby stations help resolve collisions by the interference cancellation mechanism. Building from our exact formulas for the non-cooperative case, we provide a heuristic formula for the cooperative decoding probability that reflects well the actual performance. Finally, we demonstrate by simulation significant gains of cooperation with respect to the non-cooperative decoding.
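The non-cooperative decoding rule in this abstract lends itself to a short Monte Carlo sketch. The code below is an illustration under stated assumptions, not the paper's analysis: the area is taken to be the unit square without boundary correction, a base station "reads a clean signal" when exactly one active user lies within distance $r$ of it, and the function name and parameters are hypothetical.

```python
import math
import random


def simulate_slot(n_users, n_bs, r, p, rng):
    """Simulate one slot of non-cooperative decoding.

    Returns (number of active users, number of decoded users).
    """
    # Users and base stations placed uniformly at random over the unit square.
    users = [(rng.random(), rng.random()) for _ in range(n_users)]
    stations = [(rng.random(), rng.random()) for _ in range(n_bs)]
    # Each user transmits independently with probability p.
    active = [u for u in range(n_users) if rng.random() < p]
    decoded = set()
    for s in stations:
        # Active users heard by this station (within distance r).
        heard = [u for u in active if math.dist(users[u], s) <= r]
        if len(heard) == 1:
            # Clean signal: exactly one user heard, so it is decoded.
            decoded.add(heard[0])
    return len(active), len(decoded)


if __name__ == "__main__":
    rng = random.Random(1)
    totals = [simulate_slot(200, 50, 0.1, 0.2, rng) for _ in range(200)]
    n_active = sum(a for a, _ in totals)
    n_decoded = sum(d for _, d in totals)
    print("empirical decoding probability:", n_decoded / max(n_active, 1))
```

Averaging the decoded count over many slots gives an empirical throughput that can be compared across values of `n_bs`; the values used in the usage example are arbitrary.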
Submitted 27 January, 2014;
originally announced January 2014.
-
Consensus and Products of Random Stochastic Matrices: Exact Rate for Convergence in Probability
Authors:
Dragana Bajovic,
Joao Xavier,
Jose M. F. Moura,
Bruno Sinopoli
Abstract:
Distributed consensus and other linear systems with stochastic system matrices $W_k$ emerge in various settings, like opinion formation in social networks, rendezvous of robots, and distributed inference in sensor networks. The matrices $W_k$ are often random, due to, e.g., random packet dropouts in wireless sensor networks. Key in analyzing the performance of such systems is studying convergence of matrix products $W_k W_{k-1} \cdots W_1$. In this paper, we find the exact exponential rate $I$ for the convergence in probability of the product of such matrices when time $k$ grows large, under the assumption that the $W_k$'s are symmetric and independent and identically distributed in time. Further, for commonly used random models such as gossip and link failure, we show that the rate $I$ is found by solving a min-cut problem and is, hence, easily computable. Finally, we apply our results to optimally allocate the sensors' transmission power in consensus+innovations distributed detection.
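To make the object of study concrete, the sketch below applies a product $W_k W_{k-1} \cdots W_1$ of random gossip matrices to an initial vector. It assumes the standard pairwise gossip model, in which one edge is drawn uniformly at random per step and its two endpoints replace their values with their average; each such $W_k$ is symmetric, stochastic, and i.i.d. in time. The function name is hypothetical, and the sketch only shows convergence to the average; it does not compute the rate $I$.

```python
import random


def gossip_product_apply(x0, edges, steps, seed=0):
    """Apply `steps` i.i.d. pairwise-gossip matrices to the vector x0."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        # Draw one edge uniformly at random; its endpoints average their values.
        i, j = rng.choice(edges)
        avg = 0.5 * (x[i] + x[j])
        x[i] = x[j] = avg
    return x


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3)]  # a connected 4-node path graph
    print(gossip_product_apply([4.0, 0.0, 0.0, 0.0], path, steps=500))
```

Because every gossip matrix preserves the sum of the entries, the entries approach the average of `x0` whenever the graph formed by `edges` is connected.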
Submitted 28 February, 2012;
originally announced February 2012.
-
Large Deviations Performance of Consensus+Innovations Distributed Detection with Non-Gaussian Observations
Authors:
Dragana Bajovic,
Dusan Jakovetic,
Jose M. F. Moura,
Joao Xavier,
Bruno Sinopoli
Abstract:
We establish the large deviations asymptotic performance (error exponent) of consensus+innovations distributed detection over random networks with generic (non-Gaussian) sensor observations. At each time instant, sensors 1) combine their decision variables with those of their neighbors (consensus) and 2) assimilate their new observations (innovations). This paper shows for general non-Gaussian distributions that consensus+innovations distributed detection exhibits a phase transition behavior with respect to the network degree of connectivity. Above a threshold, distributed detection is as good as centralized detection, with the same optimal asymptotic detection performance, but, below the threshold, distributed detection is suboptimal with respect to centralized detection. We determine this threshold and quantify the performance loss below the threshold. Finally, we show the dependence of the threshold and performance on the distribution of the observations: distributed detectors over the same random network, but with different observations' distributions, for example, Gaussian, Laplace, or quantized, may have different asymptotic performance, even when the corresponding centralized detectors have the same asymptotic performance.
Submitted 15 April, 2012; v1 submitted 19 November, 2011;
originally announced November 2011.
-
Distributed Detection over Random Networks: Large Deviations Performance Analysis
Authors:
Dragana Bajovic,
Dusan Jakovetic,
Joao Xavier,
Bruno Sinopoli,
Jose M. F. Moura
Abstract:
We study the large deviations performance, i.e., the exponential decay rate of the error probability, of distributed detection algorithms over random networks. At each time step $k$ each sensor: 1) averages its decision variable with the neighbors' decision variables; and 2) accounts on-the-fly for its new observation. We show that distributed detection exhibits a "phase change" behavior. When the rate of network information flow (the speed of averaging) is above a threshold, then distributed detection is asymptotically equivalent to the optimal centralized detection, i.e., the exponential decay rate of the error probability for distributed detection equals the Chernoff information. When the rate of information flow is below a threshold, distributed detection achieves only a fraction of the Chernoff information rate; we quantify this achievable rate as a function of the network rate of information flow. Simulation examples demonstrate our theoretical findings on the behavior of distributed detection over random networks.
Submitted 21 December, 2010;
originally announced December 2010.
-
Sensor Selection for Event Detection in Wireless Sensor Networks
Authors:
Dragana Bajovic,
Bruno Sinopoli,
Joao Xavier
Abstract:
We consider the problem of sensor selection for event detection in wireless sensor networks (WSNs). We want to choose a subset of p out of n sensors that yields the best detection performance. As the sensor selection optimality criteria, we propose the Kullback-Leibler and Chernoff distances between the distributions of the selected measurements under the two hypotheses. We formulate the maxmin robust sensor selection problem to cope with the uncertainties in distribution means. We prove that the sensor selection problem is NP-hard, for both Kullback-Leibler and Chernoff criteria. To (sub)optimally solve the sensor selection problem, we propose an algorithm of affordable complexity. Extensive numerical simulations on moderate size problem instances (when the optimum by exhaustive search is feasible to compute) demonstrate the algorithm's near optimality in a very large portion of problem instances. For larger problems, extensive simulations demonstrate that our algorithm outperforms random searches, once an upper bound on computational time is set. We corroborate numerically the validity of the Kullback-Leibler and Chernoff sensor selection criteria, by showing that they lead to sensor selections nearly optimal both in the Neyman-Pearson and Bayes sense.
Submitted 22 November, 2010;
originally announced November 2010.
-
Distributed Detection over Time Varying Networks: Large Deviations Analysis
Authors:
Dragana Bajovic,
Dusan Jakovetic,
Joao Xavier,
Bruno Sinopoli,
Jose M. F. Moura
Abstract:
We apply large deviations theory to study asymptotic performance of running consensus distributed detection in sensor networks. Running consensus is a recently proposed stochastic approximation type algorithm. At each time step k, the state at each sensor is updated by a local averaging of the sensor's own state and the states of its neighbors (consensus) and by accounting for the new observations (innovation). We assume Gaussian, spatially correlated observations. We allow the underlying network to be time varying, provided that the graph that collects the union of links that are online at least once over a finite time window is connected. This paper shows through large deviations that, under stated assumptions on the network connectivity and sensors' observations, the running consensus detection asymptotically approaches in performance the optimal centralized detection. That is, the Bayes probability of detection error (with the running consensus detector) decays exponentially to zero as k goes to infinity at the Chernoff information rate, the best achievable rate of the asymptotically optimal centralized detector.
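The consensus and innovation steps described in this abstract can be sketched as follows. The exact recursion is not given in the abstract, so the form used here is an assumption: $x_i(k) = \tfrac{k-1}{k}\sum_j W_{ij}\,x_j(k-1) + \tfrac{1}{k}\,y_i(k)$, with a fixed doubly stochastic weight matrix $W$ and with $y_i(k)$ standing in for the local observation (or local log-likelihood ratio). The paper allows time-varying networks and spatially correlated observations, which this sketch does not model.

```python
def running_consensus(W, observations):
    """Sketch of an assumed running-consensus recursion.

    W is an n x n weight matrix (assumed doubly stochastic).
    observations is a list over time; observations[k-1][i] is y_i(k).
    Returns the list of sensor states after the last time step.
    """
    n = len(W)
    x = [0.0] * n
    for k, y in enumerate(observations, start=1):
        # Consensus: each sensor averages its state with its neighbors' states.
        mixed = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Innovation: fold in the new observation with weight 1/k.
        x = [((k - 1) / k) * mixed[i] + y[i] / k for i in range(n)]
    return x


def decide(state, threshold=0.0):
    """Hypothetical local decision rule: declare the event if the state exceeds the threshold."""
    return state > threshold


if __name__ == "__main__":
    W = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
    obs = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [0.0, 0.0, 3.0]]
    states = running_consensus(W, obs)
    print(states, [decide(s, threshold=2.0) for s in states])
```

Under the doubly stochastic assumption, the network-wide average of the states equals the running average of all observations collected so far, which is the statistic a centralized detector would use in this simplified setting.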
Submitted 25 October, 2010;
originally announced October 2010.
-
Distributed Detection over Random Networks: Large Deviations Analysis
Authors:
Dragana Bajovic,
Dusan Jakovetic,
Joao Xavier,
Bruno Sinopoli,
Jose M. F. Moura
Abstract:
We show by large deviations theory that the performance of running consensus is asymptotically equivalent to the performance of the (asymptotically) optimal centralized detector. Running consensus is a recently proposed stochastic approximation type algorithm for distributed detection in sensor networks. At each time step, the state at each sensor is updated by a local averaging of its own state and the states of its neighbors (consensus) and by accounting for the new observations (innovation). We assume Gaussian, spatially correlated observations, and we allow for the underlying network to be randomly varying. This paper shows through large deviations that the Bayes probability of detection error, for the distributed detector, decays at the best achievable rate, namely, the Chernoff information rate. Numerical examples illustrate the behavior of the distributed detector for a finite number of observations.
Submitted 27 October, 2010; v1 submitted 22 July, 2010;
originally announced July 2010.