-
Fair Allocation in Crowd-Sourced Systems
Authors:
Mishal Assif P K,
William Kennedy,
Iraj Saniee
Abstract:
In this paper, we address the problem of fair sharing of the total value of a crowd-sourced network system between major participants (founders) and minor participants (crowd) using cooperative game theory. Shapley allocation is regarded as a fair way of computing the shares of all participants in a cooperative game when the values of all possible coalitions can be quantified. We define a class of value functions for crowd-sourced systems which capture the contributions of the founders and the crowd plausibly and derive closed-form expressions for Shapley allocations to both. These value functions are defined for different scenarios, such as presence of oligopolies or geographic spread of the crowd, taking network effects, including Metcalfe's law, into account. A key result we obtain is that under quite general conditions, the crowd participants are collectively owed a share between $\frac{1}{2}$ and $\frac{2}{3}$ of the total value of the crowd-sourced system. We close with an empirical analysis demonstrating consistency of our results with the compensation offered to the crowd participants in some public internet content sharing companies.
Submitted 22 May, 2023;
originally announced May 2023.
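For readers unfamiliar with Shapley allocation, the following is a minimal sketch of the general definition, not the closed-form expressions derived in the paper. It averages each player's marginal contribution over all join orders. The value function `toy_value` and the player names are hypothetical and chosen only for illustration: a coalition is worth nothing without the founder, and otherwise its worth grows as the square of the number of crowd members, a Metcalfe-style assumption.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values by enumerating every join order.

    `value` maps a frozenset of players to the worth of that coalition.
    The cost grows factorially, so this is only usable for tiny games.
    """
    players = list(players)
    totals = {p: Fraction(0) for p in players}
    n_orders = 0
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += Fraction(value(with_p)) - Fraction(value(coalition))
            coalition = with_p
        n_orders += 1
    return {p: total / n_orders for p, total in totals.items()}


def toy_value(coalition):
    """Hypothetical value function (an assumption, not taken from the paper)."""
    if "founder" not in coalition:
        return 0
    crowd_size = len(coalition) - 1
    return crowd_size ** 2


shares = shapley_values(["founder", "c1", "c2"], toy_value)
for player, share in shares.items():
    print(player, share)
```

The shares always sum to the value of the full coalition (the efficiency property), and the two crowd members receive equal shares because they are interchangeable in `toy_value`.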
-
Efficient Deep Learning of GMMs
Authors:
Shirin Jalali,
Carl Nuzman,
Iraj Saniee
Abstract:
We show that a collection of Gaussian mixture models (GMMs) in $R^{n}$ can be optimally classified using $O(n)$ neurons in a neural network with two hidden layers (deep neural network), whereas in contrast, a neural network with a single hidden layer (shallow neural network) would require at least $O(\exp(n))$ neurons or possibly exponentially large coefficients. Given the universality of the Gaussian distribution in the feature spaces of data, e.g., in speech, image and text, our result sheds light on the observed efficiency of deep neural networks in practical classification problems.
Submitted 15 February, 2019;
originally announced February 2019.
-
Linear Time Clustering for High Dimensional Mixtures of Gaussian Clouds
Authors:
Dan Kushnir,
Shirin Jalali,
Iraj Saniee
Abstract:
Clustering mixtures of Gaussian distributions is a fundamental and challenging problem that is ubiquitous in various high-dimensional data processing tasks. While state-of-the-art work on learning Gaussian mixture models has focused primarily on improving separation bounds and their generalization to arbitrary classes of mixture models, less attention has been paid to the practical computational efficiency of the proposed solutions. In this paper, we propose a novel and highly efficient clustering algorithm for $n$ points drawn from a mixture of two arbitrary Gaussian distributions in $\mathbb{R}^p$. The algorithm involves performing random 1-dimensional projections until a direction is found that yields a user-specified clustering error $e$. For a 1-dimensional separation parameter $γ$ satisfying $γ=Q^{-1}(e)$, the expected number of such projections is shown to be bounded by $o(\ln p)$, when $γ$ satisfies $γ\leq c\sqrt{\ln{\ln{p}}}$, with $c$ as the separability parameter of the two Gaussians in $\mathbb{R}^p$. Consequently, the expected overall running time of the algorithm is linear in $n$ and quasi-linear in $p$ at $o(\ln{p})O(np)$, and the sample complexity is independent of $p$. This result stands in contrast to prior works which provide polynomial, with at best quadratic, running time in $p$ and $n$. We show that our bound on the expected number of 1-dimensional projections extends to the case of three or more Gaussian components, and we present a generalization of our results to mixture distributions beyond the Gaussian model.
Submitted 1 March, 2018; v1 submitted 19 December, 2017;
originally announced December 2017.
-
A New Family of Near-metrics for Universal Similarity
Authors:
Chu Wang,
Iraj Saniee,
William S. Kennedy,
Chris A. White
Abstract:
We propose a family of near-metrics based on local graph diffusion to capture similarity for a wide class of data sets. These quasi-metametrics, as their names suggest, dispense with one or two standard axioms of metric spaces, specifically distinguishability and symmetry, so that similarity between data points of arbitrary type and form could be measured broadly and effectively. The proposed near-metric family includes the forward k-step diffusion and its reverse, typically on the graph consisting of data objects and their features. By construction, this family of near-metrics is particularly appropriate for categorical data, continuous data, and vector representations of images and text extracted via deep learning approaches. We conduct extensive experiments to evaluate the performance of this family of similarity measures and compare and contrast with traditional measures of similarity used for each specific application and with the ground truth when available. We show that for structured data including categorical and continuous data, the near-metrics corresponding to normalized forward k-step diffusion (k small) work as one of the best performing similarity measures; for vector representations of text and images including those extracted from deep learning, the near-metrics derived from normalized and reverse k-step graph diffusion (k very small) exhibit outstanding ability to distinguish data points from different classes.
Submitted 17 October, 2017; v1 submitted 21 July, 2017;
originally announced July 2017.
-
Quantifying the Benefits of Infrastructure Sharing
Authors:
Matthew Andrews,
Milan Bradonjic,
Iraj Saniee
Abstract:
We analyze the benefits of network sharing between telecommunications operators. Sharing is seen as one way to speed the roll out of expensive technologies such as 5G since it allows the service providers to divide the cost of providing ubiquitous coverage. Our theoretical analysis focuses on scenarios with two service providers and compares the system dynamics when they are competing with the dynamics when they are cooperating. We show that sharing can be beneficial to a service provider even when it has the power to drive the other service provider out of the market, a byproduct of a non-convex cost function. A key element of this study is an analysis of the competitive equilibria for both cooperative and non-cooperative 2-person games in the presence of (non-convex) cost functions that involve a fixed cost component.
Submitted 18 June, 2017;
originally announced June 2017.
-
Fast approximation algorithms for $p$-centres in large $δ$-hyperbolic graphs
Authors:
Katherine Edwards,
W. Sean Kennedy,
Iraj Saniee
Abstract:
We provide a quasilinear time algorithm for the $p$-center problem with an additive error less than or equal to 3 times the input graph's hyperbolic constant. Specifically, for the graph $G=(V,E)$ with $n$ vertices, $m$ edges and hyperbolic constant $δ$, we construct an algorithm for $p$-centers in time $O(p(δ+1)(n+m)\log(n))$ with radius not exceeding $r_p + δ$ when $p \leq 2$ and $r_p + 3δ$ when $p \geq 3$, where $r_p$ are the optimal radii. Prior work identified $p$-centers with accuracy $r_p+δ$ but with time complexity $O((n^3\log n + n^2m)\log(diam(G)))$ which is impractical for large graphs.
Submitted 25 April, 2016;
originally announced April 2016.
-
A Geometric Distance Oracle for Large Real-World Graphs
Authors:
Deepak Ajwani,
W. Sean Kennedy,
Alessandra Sala,
Iraj Saniee
Abstract:
Many graph processing algorithms require determination of shortest-path distances between arbitrary numbers of node pairs. Since computation of exact distances between all node-pairs of a large graph, e.g., 10M nodes and up, is prohibitively expensive both in computational time and storage space, distance approximation is often used in place of exact computation. In this paper, we present a novel and scalable distance oracle that leverages the hyperbolic core of real-world large graphs for fast and scalable distance approximation. We show empirically that the proposed oracle significantly outperforms prior oracles on a random set of test cases drawn from public domain graph libraries. There are two sets of prior work against which we benchmark our approach. The first set, which often outperforms other oracles, employs embedding of the graph into low dimensional Euclidean spaces with carefully constructed hyperbolic distances, but provides no guarantees on the distance estimation error. The second set leverages Gromov-type tree contraction of the graph with the additive error guaranteed not to exceed $2δ\log{n}$, where $δ$ is the hyperbolic constant of the graph. We show that our proposed oracle 1) is significantly faster than those oracles that use hyperbolic embedding (first set) with similar approximation error and, perhaps surprisingly, 2) exhibits substantially lower average estimation error compared to Gromov-like tree contractions (second set). We substantiate our claims through numerical computations on a collection of a dozen real-world networks and synthetic test cases from multiple domains, ranging in size from tens of thousands to tens of millions of nodes.
Submitted 19 April, 2014;
originally announced April 2014.
-
Bootstrap Percolation on Periodic Trees
Authors:
Milan Bradonjić,
Iraj Saniee
Abstract:
We study bootstrap percolation with the threshold parameter $θ\geq 2$ and the initial probability $p$ on infinite periodic trees that are defined as follows. Each node of a tree has degree selected from a finite predefined set of non-negative integers and, starting from any node, all nodes at the same graph distance from it have the same degree. We show the existence of the critical threshold $p_f(θ) \in (0,1)$ such that with high probability, (i) if $p > p_f(θ)$ then the periodic tree becomes fully active, while (ii) if $p < p_f(θ)$ then the periodic tree does not become fully active. We also derive a system of recurrence equations for the critical threshold $p_f(θ)$ and compute these numerically for a collection of periodic trees and various values of $θ$, thus extending previous results for regular (homogeneous) trees.
Submitted 28 November, 2013;
originally announced November 2013.
-
Congestion Due to Random Walk Routing
Authors:
Onuttom Narayan,
Iraj Saniee,
Vladimir Marbukh
Abstract:
In this paper we derive an analytical expression for the mean load at each node of an arbitrary undirected graph for the uniform multicommodity flow problem under random walk routing. We show the mean load is linearly dependent on the nodal degree with a common multiplier equal to the sum of the inverses of the non-zero eigenvalues of the graph Laplacian. Even though some aspects of the mean load value, such as linear dependence on the nodal degree, are intuitive and may be derived from the equilibrium distribution of the random walk on the undirected graph, the exact expression for the mean load in terms of the full spectrum of the graph has not been known before. Using the explicit expression for the mean load, we give asymptotic estimates for the load on a variety of graphs whose spectral densities are well known. We conclude with numerical computation of the mean load for other well-known graphs without known spectral densities.
Submitted 31 August, 2013;
originally announced September 2013.
-
On the Hyperbolicity of Large-Scale Networks
Authors:
W. Sean Kennedy,
Onuttom Narayan,
Iraj Saniee
Abstract:
Through detailed analysis of scores of publicly available data sets corresponding to a wide range of large-scale networks, from communication and road networks to various forms of social networks, we explore a little-studied geometric characteristic of real-life networks, namely their hyperbolicity. In smooth geometry, hyperbolicity captures the notion of negative curvature; within the more abstract context of metric spaces, it can be generalized as d-hyperbolicity. This generalized definition can be applied to graphs, which we explore in this report. We provide strong evidence that communication and social networks exhibit this fundamental property, and through extensive computations we quantify the degree of hyperbolicity of each network in comparison to its diameter. By contrast, and as evidence of the validity of the methodology, applying the same methods to the road networks shows that they are not hyperbolic, which is as expected. Finally, we present practical computational means for detection of hyperbolicity and show how the test itself may be scaled to much larger graphs than those we examined via renormalization group methodology. Using well-understood mechanisms, we provide evidence through synthetically generated graphs that hyperbolicity is preserved and indeed amplified by renormalization. This allows us to detect hyperbolicity in large networks efficiently, through much smaller renormalized versions. These observations indicate that d-hyperbolicity is a common feature of large-scale networks. We propose that d-hyperbolicity, in conjunction with other local characteristics of networks such as the degree distribution and clustering coefficients, provides a more complete unifying picture of networks and helps classify in a parsimonious way what is otherwise a bewildering and complex array of features and characteristics specific to each natural and man-made network.
Submitted 28 June, 2013;
originally announced July 2013.
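As an illustration of what hyperbolicity means for a graph, the sketch below computes the Gromov four-point constant by brute force over all node quadruples. This is one common formulation and is assumed here; the abstract does not state which definition or normalization the authors use, and this $O(n^4)$ enumeration is not the scalable, renormalization-based test they describe. The helper names and the two example graphs are hypothetical, and the graph is assumed to be connected and unweighted.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from `source` in an unweighted graph given as adjacency lists."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def four_point_delta(adj):
    """Largest four-point defect over all quadruples (assumes a connected graph).

    For nodes x, y, z, w form the three pairwise distance sums; the defect is
    half the gap between the largest and the second-largest sum.
    """
    dist = {u: bfs_distances(adj, u) for u in adj}
    delta = 0.0
    for x, y, z, w in combinations(adj, 4):
        sums = sorted([
            dist[x][y] + dist[z][w],
            dist[x][z] + dist[y][w],
            dist[x][w] + dist[y][z],
        ])
        delta = max(delta, (sums[2] - sums[1]) / 2)
    return delta


# Hypothetical examples: a 4-node path (a tree) and a 4-cycle.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}

print(four_point_delta(path))   # 0.0, as for any tree
print(four_point_delta(cycle))  # 1.0
```

Trees have a constant of zero under this definition, which is why a small value relative to the diameter is read as tree-like, negatively curved structure.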
-
Scaling of Congestion in Small World Networks
Authors:
Iraj Saniee,
Gabriel H. Tucci
Abstract:
In this report we show that in a planar exponentially growing network consisting of $N$ nodes, congestion scales as $O(N^2/\log(N))$ independently of how flows may be routed. This is in contrast to the $O(N^{3/2})$ scaling of congestion in a flat polynomially growing network. We also show that without the planarity condition, congestion in a small world network could scale as low as $O(N^{1+ε})$, for arbitrarily small $ε$. These extreme results demonstrate that the small world property by itself cannot provide guidance on the level of congestion in a network and other characteristics are needed for better resolution. Finally, we investigate scaling of congestion under the geodesic flow, that is, when flows are routed on shortest paths based on a link metric. Here we prove that if the link weights are scaled by arbitrarily small or large multipliers then considerable changes in congestion may occur. However, if we constrain the link-weight multipliers to be bounded away from both zero and infinity, then variations in congestion due to such remetrization are negligible.
Submitted 20 January, 2012;
originally announced January 2012.
-
Bootstrap Percolation on Random Geometric Graphs
Authors:
Milan Bradonjić,
Iraj Saniee
Abstract:
Bootstrap percolation has been used effectively to model phenomena as diverse as emergence of magnetism in materials, spread of infection, diffusion of software viruses in computer networks, adoption of new technologies, and emergence of collective action and cultural fads in human societies. It is defined on an (arbitrary) network of interacting agents whose state is determined by the state of their neighbors according to a threshold rule. In a typical setting, bootstrap percolation starts by random and independent "activation" of nodes with a fixed probability $p$, followed by a deterministic process for additional activations based on the density of active nodes in each neighborhood ($θ$ activated nodes). Here, we study bootstrap percolation on random geometric graphs in the regime when the latter are (almost surely) connected. Random geometric graphs provide an appropriate model in settings where the neighborhood structure of each node is determined by geographical distance, as in wireless ad hoc and sensor networks as well as in contagion. We derive bounds on the critical thresholds $p'_c, p''_c$ such that for all $p > p''_c(θ)$ full percolation takes place, whereas for $p < p'_c(θ)$ it does not. We conclude with simulations that compare numerical thresholds with those obtained analytically.
Submitted 14 June, 2012; v1 submitted 13 January, 2012;
originally announced January 2012.
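The threshold rule described in this abstract can be sketched on an arbitrary graph as follows. This is a minimal illustration of the generic process (random seeding with probability $p$, then deterministic activation of any node with at least $θ$ active neighbors), not the authors' simulation code; the function names and the small example graph are hypothetical, and no random geometric graph is constructed here.

```python
import random


def random_seed_set(nodes, p, rng):
    """Activate each node independently with probability p."""
    return {node for node in nodes if rng.random() < p}


def bootstrap_percolation(adj, initially_active, theta):
    """Run the threshold rule to its fixed point and return the final active set.

    An inactive node becomes active once at least `theta` of its neighbors
    are active; active nodes never deactivate.
    """
    active = set(initially_active)
    changed = True
    while changed:
        changed = False
        for node, neighbors in adj.items():
            if node in active:
                continue
            if sum(1 for nb in neighbors if nb in active) >= theta:
                active.add(node)
                changed = True
    return active


# Hypothetical 5-node graph given as adjacency lists.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}

final = bootstrap_percolation(adj, {0, 1}, theta=2)
print(sorted(final))  # [0, 1, 2, 3]; node 4 has a single neighbor and never activates

seeds = random_seed_set(adj, p=0.5, rng=random.Random(0))
print(sorted(bootstrap_percolation(adj, seeds, theta=2)))
```

Because activation is monotone, updating nodes in place during a sweep reaches the same final set as a synchronous round-by-round update; only the number of sweeps differs.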
-
Spectral analysis of communication networks using Dirichlet eigenvalues
Authors:
Alexander Tsiatas,
Iraj Saniee,
Onuttom Narayan,
Matthew Andrews
Abstract:
The spectral gap of the graph Laplacian with Dirichlet boundary conditions is computed for the graphs of several communication networks at the IP-layer, which are subgraphs of the much larger global IP-layer network. We show that the Dirichlet spectral gap of these networks is substantially larger than the standard spectral gap and is likely to remain non-zero in the infinite graph limit. We first prove this result for finite regular trees, and show that the Dirichlet spectral gap in the infinite tree limit converges to the spectral gap of the infinite tree. We also perform Dirichlet spectral clustering on the IP-layer networks and show that it often yields cuts near the network core that create genuine single-component clusters. This is much better than traditional spectral clustering where several disjoint fragments near the periphery are liable to be misleadingly classified as a single cluster. Spectral clustering is often used to identify bottlenecks or congestion; since congestion in these networks is known to peak at the core, our results suggest that Dirichlet spectral clustering may be better at finding bona-fide bottlenecks.
Submitted 7 May, 2012; v1 submitted 17 February, 2011;
originally announced February 2011.