-
Towards singular optimality in the presence of local initial knowledge
Authors:
Hongyan Ji,
Sriram V. Pemmaraju
Abstract:
The Knowledge Till rho CONGEST model is a variant of the classical CONGEST model of distributed computing in which each vertex v has initial knowledge of the radius-rho ball centered at v. The most commonly studied variants of the CONGEST model are KT0 CONGEST, in which nodes initially know nothing about their neighbors, and KT1 CONGEST, in which nodes initially know the IDs of all their neighbors. It has been shown that having access to neighbors' IDs (as in the KT1 CONGEST model) can substantially reduce the message complexity of algorithms for fundamental problems such as BROADCAST and MST. For example, King, Kutten, and Thorup (PODC 2015) show how to construct an MST using just Otilde(n) messages in the KT1 CONGEST model, whereas there is an Omega(m) message lower bound for MST in the KT0 CONGEST model. Building on this result, Gmyr and Pandurangan (DISC 2018) present a family of distributed randomized algorithms for various global problems that exhibit a trade-off between message and round complexity. These algorithms are based on constructing a sparse, spanning subgraph called a danner. Specifically, given a graph G and any delta in [0,1], their algorithm constructs (with high probability) a danner that has diameter Otilde(D + n^{1-delta}) and Otilde(min{m,n^{1+delta}}) edges in Otilde(n^{1-delta}) rounds while using Otilde(min{m,n^{1+delta}}) messages, where n, m, and D are the number of nodes, the number of edges, and the diameter of G, respectively. In the main result of this paper, we show that if we assume the KT2 CONGEST model, it is possible to substantially improve the time-message trade-off in constructing a danner. Specifically, we show how to construct, in the KT2 CONGEST model, a danner that has diameter Otilde(D + n^{1-2delta}) and Otilde(min{m,n^{1+delta}}) edges in Otilde(n^{1-2delta}) rounds while using Otilde(min{m,n^{1+delta}}) messages for any delta in [0,1/2].
Submitted 22 February, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
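To make the trade-off stated in the abstract above concrete, the following is a minimal sketch that tabulates the exponents of n in the quoted round and message bounds. The function name `danner_exponents` and the dictionary layout are illustrative assumptions, not anything from the paper; polylogarithmic factors, the additive D term in the diameter, and the min{m, .} cap are deliberately ignored.

```python
def danner_exponents(delta, model="KT1"):
    """Return the exponents of n in the danner bounds quoted in the abstract.

    Hypothetical helper for illustration only: it ignores polylog factors,
    the additive D term, and the min{m, n^(1+delta)} cap on messages.
    """
    if model == "KT1":
        # Gmyr and Pandurangan: delta in [0, 1], Otilde(n^(1 - delta)) rounds.
        if not 0 <= delta <= 1:
            raise ValueError("KT1 bound is stated for delta in [0, 1]")
        rounds_exp = 1 - delta
    elif model == "KT2":
        # This paper: delta in [0, 1/2], Otilde(n^(1 - 2*delta)) rounds.
        if not 0 <= delta <= 0.5:
            raise ValueError("KT2 bound is stated for delta in [0, 1/2]")
        rounds_exp = 1 - 2 * delta
    else:
        raise ValueError("model must be 'KT1' or 'KT2'")
    # Both results use Otilde(n^(1 + delta)) messages (before the min with m).
    return {"rounds": rounds_exp, "messages": 1 + delta}


if __name__ == "__main__":
    for delta in (0.0, 0.25, 0.5):
        print(delta, danner_exponents(delta, "KT1"), danner_exponents(delta, "KT2"))
```

For the same message budget n^{1+delta}, the KT2 round exponent is smaller by delta; at delta = 1/2 it reaches 0, which corresponds to a polylogarithmic number of rounds under the stated bound.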
-
The Message Complexity of Distributed Graph Optimization
Authors:
Fabien Dufoulon,
Shreyas Pai,
Gopal Pandurangan,
Sriram V. Pemmaraju,
Peter Robinson
Abstract:
The message complexity of a distributed algorithm is the total number of messages sent by all nodes over the course of the algorithm. This paper studies the message complexity of distributed algorithms for fundamental graph optimization problems. We focus on four classical graph optimization problems: Maximum Matching (MaxM), Minimum Vertex Cover (MVC), Minimum Dominating Set (MDS), and Maximum Independent Set (MaxIS). In the sequential setting, these problems are representative of a wide spectrum of hardness of approximation. While there has been some progress in understanding the round complexity of distributed algorithms (for both exact and approximate versions) for these problems, much less is known about their message complexity and its relation with the quality of approximation. We almost fully quantify the message complexity of distributed graph optimization by showing the following results...[see paper for full abstract]
Submitted 24 November, 2023;
originally announced November 2023.
-
Dynamic Healthcare Embeddings for Improving Patient Care
Authors:
Hankyu Jang,
Sulyun Lee,
D. M. Hasibul Hasan,
Philip M. Polgreen,
Sriram V. Pemmaraju,
Bijaya Adhikari
Abstract:
As hospitals move towards automating and integrating their computing systems, more fine-grained hospital operations data are becoming available. These data include hospital architectural drawings, logs of interactions between patients and healthcare professionals, prescription data, procedures data, and data on patient admission, discharge, and transfers. This has opened up many fascinating avenues for healthcare-related prediction tasks for improving patient care. However, in order to leverage off-the-shelf machine learning software for these tasks, one needs to learn structured representations of entities involved from heterogeneous, dynamic data streams. Here, we propose DECENT, an auto-encoding heterogeneous co-evolving dynamic neural network, for learning heterogeneous dynamic embeddings of patients, doctors, rooms, and medications from diverse data streams. These embeddings capture similarities among doctors, rooms, patients, and medications based on static attributes and dynamic interactions. DECENT enables several applications in healthcare prediction, such as predicting mortality risk and case severity of patients, adverse events (e.g., transfer back into an intensive care unit), and future healthcare-associated infections. The results of using the learned patient embeddings in predictive modeling show that DECENT has a gain of up to 48.1% on the mortality risk prediction task, 12.6% on the case severity prediction task, 6.4% on the medical intensive care unit transfer task, and 3.8% on the Clostridioides difficile (C.diff) Infection (CDI) prediction task over the state-of-the-art baselines. In addition, case studies on the learned doctor, medication, and room embeddings show that our approach learns meaningful and interpretable embeddings.
Submitted 20 March, 2023;
originally announced March 2023.
-
Exact Distributed Sampling
Authors:
Sriram V. Pemmaraju,
Joshua Z. Sobel
Abstract:
Fast distributed algorithms that output a feasible solution for constraint satisfaction problems, such as maximal independent sets, have been heavily studied. There has been much less research on distributed sampling problems, where one wants to sample from a distribution over all feasible solutions (e.g., uniformly sampling a feasible solution). Recent work (Feng, Sun, Yin PODC 2017; Fischer and Ghaffari DISC 2018; Feng, Hayes, and Yin arXiv 2018) has shown that for some constraint satisfaction problems there are distributed Markov chains that mix in $O(\log n)$ rounds in the classical LOCAL model of distributed computation. However, these methods return samples from a distribution close to the desired distribution, but with some small amount of error. In this paper, we focus on the problem of exact distributed sampling. Our main contribution is to show that these distributed Markov chains in tandem with techniques from the sequential setting, namely coupling from the past and bounding chains, can be used to design $O(\log n)$-round LOCAL model exact sampling algorithms for a class of weighted local constraint satisfaction problems. This general result leads to $O(\log n)$-round exact sampling algorithms that use small messages (i.e., run in the CONGEST model) and polynomial-time local computation for some important special cases, such as sampling weighted independent sets (aka the hardcore model) and weighted dominating sets.
Submitted 5 March, 2023;
originally announced March 2023.
-
Deterministic Massively Parallel Algorithms for Ruling Sets
Authors:
Shreyas Pai,
Sriram V. Pemmaraju
Abstract:
In this paper we present a deterministic $O(\log\log n)$-round algorithm for the 2-ruling set problem in the Massively Parallel Computation model with $\tilde{O}(n)$ memory; this algorithm also runs in $O(\log\log n)$ rounds in the Congested Clique model. This is exponentially faster than the fastest known deterministic 2-ruling set algorithm for these models, which is simply the $O(\log Δ)$-round deterministic Maximal Independent Set algorithm due to Czumaj, Davies, and Parter (SPAA 2020). Our result is obtained by derandomizing the 2-ruling set algorithm of Kothapalli and Pemmaraju (FSTTCS 2012).
Submitted 25 May, 2022;
originally announced May 2022.
-
Can We Break Symmetry with o(m) Communication?
Authors:
Shreyas Pai,
Gopal Pandurangan,
Sriram V. Pemmaraju,
Peter Robinson
Abstract:
We study the communication cost (or message complexity) of fundamental distributed symmetry breaking problems, namely, coloring and MIS. While significant progress has been made in understanding and improving the running time of such problems, much less is known about the message complexity of these problems. In fact, all known algorithms need at least $Ω(m)$ communication for these problems, where $m$ is the number of edges in the graph. We address the following question in this paper: can we solve problems such as coloring and MIS using sublinear, i.e., $o(m)$ communication, and if so under what conditions? [See full abstract in pdf]
Submitted 19 May, 2021;
originally announced May 2021.
-
Modeling and Evaluation of Clustering Patient Care into Bubbles
Authors:
D. M. Hasibul Hasan,
Alex Rohwer,
Hankyu Jang,
Ted Herman,
Philip M. Polgreen,
Daniel K. Sewell,
Bijaya Adhikari,
Sriram V. Pemmaraju
Abstract:
COVID-19 has caused an enormous burden on healthcare facilities around the world. Cohorting patients and healthcare professionals (HCPs) into "bubbles" has been proposed as an infection-control mechanism. In this paper, we present a novel and flexible model for clustering patient care in healthcare facilities into bubbles in order to minimize infection spread. Our model aims to control a variety of costs to patients/residents and HCPs so as to avoid hidden, downstream adverse effects of clustering patient care. This model leads to a discrete optimization problem that we call the BubbleClustering problem. This problem takes as input a temporal visit graph, representing HCP mobility, including visits by HCPs to patient/resident rooms. The output of the problem is a rewired visit graph, obtained by partitioning HCPs and patient rooms into bubbles and rewiring HCP visits to patient rooms so that patient-care is largely confined to the constructed bubbles. Even though the BubbleClustering problem is intractable in general, we present an integer linear programming (ILP) formulation of the problem that can be solved optimally for problem instances that arise from typical hospital units and long-term-care facilities. We call our overall solution approach Cost-aware Rewiring of Networks (CoRN). We evaluate CoRN using fine-grained-movement data from a hospital-medical-intensive-care unit as well as two long-term-care facilities. These data were obtained using sensor systems we built and deployed. The main takeaway from our experimental results is that it is possible to use CoRN to substantially reduce infection spread by cohorting patients and HCPs without sacrificing patient-care, and with minimal excess costs to HCPs in terms of time and distances traveled during a shift.
Submitted 13 May, 2021;
originally announced May 2021.
-
The Complexity of Symmetry Breaking in Massive Graphs
Authors:
Christian Konrad,
Sriram V. Pemmaraju,
Talal Riaz,
Peter Robinson
Abstract:
The goal of this paper is to understand the complexity of symmetry breaking problems, specifically maximal independent set (MIS) and the closely related $β$-ruling set problem, in two computational models suited for large-scale graph processing, namely the $k$-machine model and the graph streaming model. We present a number of results. For MIS in the $k$-machine model, we improve the $\tilde{O}(m/k^2 + Δ/k)$-round upper bound of Klauck et al. (SODA 2015) by presenting an $\tilde{O}(m/k^2)$-round algorithm. We also present an $\tildeΩ(n/k^2)$ round lower bound for MIS, the first lower bound for a symmetry breaking problem in the $k$-machine model. For $β$-ruling sets, we use hierarchical sampling to obtain more efficient algorithms in the $k$-machine model and also in the graph streaming model. More specifically, we obtain a $k$-machine algorithm that runs in $\tilde{O}(βnΔ^{1/β}/k^2)$ rounds and, by using a similar hierarchical sampling technique, we obtain one-pass algorithms for both insertion-only and insertion-deletion streams that use $O(β\cdot n^{1+1/2^{β-1}})$ space. The latter result establishes a clear separation between MIS, which is known to require $Ω(n^2)$ space (Cormode et al., ICALP 2019), and $β$-ruling sets, even for $β= 2$. Finally, we present an even faster 2-ruling set algorithm in the $k$-machine model, one that runs in $\tilde{O}(n/k^{2-ε} + k^{1-ε})$ rounds for any $ε$, $0 \le ε\le 1$.
Submitted 4 May, 2021;
originally announced May 2021.
-
Sample-and-Gather: Fast Ruling Set Algorithms in the Low-Memory MPC Model
Authors:
Kishore Kothapalli,
Shreyas Pai,
Sriram V. Pemmaraju
Abstract:
Motivated by recent progress on symmetry breaking problems such as maximal independent set (MIS) and maximal matching in the low-memory Massively Parallel Computation (MPC) model (e.g., Behnezhad et al., PODC 2019; Ghaffari-Uitto, SODA 2019), we investigate the complexity of ruling set problems in this model. The MPC model has become very popular as a model for large-scale distributed computing and it comes with the constraint that the memory-per-machine is strongly sublinear in the input size. For graph problems, extremely fast MPC algorithms have been designed assuming $\tildeΩ(n)$ memory-per-machine, where $n$ is the number of nodes in the graph (e.g., the $O(\log\log n)$ MIS algorithm of Ghaffari et al., PODC 2018). However, it has proven much more difficult to design fast MPC algorithms for graph problems in the low-memory MPC model, where the memory-per-machine is restricted to being strongly sublinear in the number of nodes, i.e., $O(n^{\varepsilon})$ for $0 < \varepsilon < 1$.
In this paper, we present an algorithm for the 2-ruling set problem, running in $\tilde{O}(\log^{1/6} Δ)$ rounds whp, in the low-memory MPC model. We then extend this result to $β$-ruling sets for any integer $β> 1$. Specifically, we show that a $β$-ruling set can be computed in the low-memory MPC model with $O(n^{\varepsilon})$ memory-per-machine in $\tilde{O}(β\cdot \log^{1/(2^{β+1}-2)} Δ)$ rounds, whp. From this it immediately follows that a $β$-ruling set for $β= Ω(\log\log\log Δ)$ can be computed in just $O(β\log\log n)$ rounds whp. The above results assume a total memory of $\tilde{O}(m + n^{1+\varepsilon})$. We also present algorithms for $β$-ruling sets in the low-memory MPC model assuming that the total memory over all machines is restricted to $\tilde{O}(m)$.
Submitted 25 September, 2020;
originally announced September 2020.
-
Distributed Approximation on Power Graphs
Authors:
Reuven Bar-Yehuda,
Keren Censor-Hillel,
Yannic Maus,
Shreyas Pai,
Sriram V. Pemmaraju
Abstract:
We investigate graph problems in the following setting: we are given a graph $G$ and we are required to solve a problem on $G^2$. While we focus mostly on exploring this theme in the distributed CONGEST model, we show new results and surprising connections to the centralized model of computation. In the CONGEST model, it is natural to expect that problems on $G^2$ would be quite difficult to solve efficiently on $G$, due to congestion. However, we show that the picture is both more complicated and more interesting.
Specifically, we encounter two phenomena acting in opposing directions: (i) slowdown due to congestion and (ii) speedup due to structural properties of $G^2$.
We demonstrate these two phenomena via two fundamental graph problems, namely, Minimum Vertex Cover (MVC) and Minimum Dominating Set (MDS). Among our many contributions, the highlights are the following.
- In the CONGEST model, we show an $O(n/ε)$-round $(1+ε)$-approximation algorithm for MVC on $G^2$, while no $o(n^2)$-round algorithm is known for any better-than-2 approximation for MVC on $G$.
- We show a centralized polynomial time $5/3$-approximation algorithm for MVC on $G^2$, whereas a better-than-2 approximation is UGC-hard for $G$.
- In contrast, for MDS, in the CONGEST model, we show an $\tildeΩ(n^2)$ lower bound for a constant approximation factor for MDS on $G^2$, whereas an $Ω(n^2)$ lower bound for MDS on $G$ is known only for exact computation.
In addition to these highlighted results, we prove a number of other results in the distributed CONGEST model including an $\tildeΩ(n^2)$ lower bound for computing an exact solution to MVC on $G^2$, a conditional hardness result for obtaining a $(1+ε)$-approximation to MVC on $G^2$, and an $O(\log Δ)$-approximation to the MDS problem on $G^2$ in $\mbox{poly}\log n$ rounds.
Submitted 5 June, 2020;
originally announced June 2020.
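The abstract above works with $G^2$ without spelling it out. Under the standard definition (an assumption here, since the listing does not define it), $G^2$ has the same vertices as $G$ and joins two distinct vertices whenever they are at distance at most 2 in $G$. A minimal sketch, with the hypothetical helper name `square_graph` and an adjacency-dictionary representation chosen for illustration:

```python
def square_graph(adj):
    """Return the adjacency sets of G^2 given adjacency lists of G.

    Assumes the standard definition: u and v are adjacent in G^2 when
    they are distinct and at distance 1 or 2 in G.
    """
    squared = {}
    for u in adj:
        reach = set(adj[u])            # distance-1 neighbors
        for v in adj[u]:
            reach.update(adj[v])       # neighbors of neighbors (distance <= 2)
        reach.discard(u)               # no self-loops
        squared[u] = reach
    return squared


if __name__ == "__main__":
    # A star with center 0: every pair of leaves is at distance 2 in G,
    # so G^2 is a clique on all four vertices.
    star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
    print(square_graph(star))
```

The star example hints at the kind of structure the abstract alludes to: the neighborhood of every vertex of $G$ becomes a clique in $G^2$.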
-
Connectivity Lower Bounds in Broadcast Congested Clique
Authors:
Shreyas Pai,
Sriram V. Pemmaraju
Abstract:
We prove three new lower bounds for graph connectivity in the $1$-bit broadcast congested clique model, BCC$(1)$. First, in the KT-$0$ version of BCC$(1)$, in which nodes are aware of neighbors only through port numbers, we show an $Ω(\log n)$ round lower bound for CONNECTIVITY even for constant-error randomized Monte Carlo algorithms. The deterministic version of this result can be obtained via the well-known "edge-crossing" argument, but the randomized version of this result requires establishing new combinatorial results regarding the indistinguishability graph induced by inputs. In our second result, we show that the $Ω(\log n)$ lower bound result extends to the KT-$1$ version of the BCC$(1)$ model, in which nodes are aware of IDs of all neighbors, though our proof works only for deterministic algorithms. Since nodes know IDs of their neighbors in the KT-$1$ model, it is no longer possible to play "edge-crossing" tricks; instead we present a reduction from the 2-party communication complexity problem PARTITION in which Alice and Bob are given two set partitions on $[n]$ and are required to determine if the join of these two set partitions equals the trivial one-part set partition. While our KT-$1$ CONNECTIVITY lower bound holds only for deterministic algorithms, in our third result we extend this $Ω(\log n)$ KT-$1$ lower bound to constant-error Monte Carlo algorithms for the closely related CONNECTED COMPONENTS problem. We use information-theoretic techniques to obtain this result. All our results hold for the seemingly easy special case of CONNECTIVITY in which an algorithm has to distinguish an instance with one cycle from an instance with multiple cycles. Our results showcase three rather different lower bound techniques and lay the groundwork for further improvements in lower bounds for CONNECTIVITY in the BCC$(1)$ model.
Submitted 22 May, 2019;
originally announced May 2019.
-
Large-Scale Distributed Algorithms for Facility Location with Outliers
Authors:
Tanmay Inamdar,
Shreyas Pai,
Sriram V. Pemmaraju
Abstract:
This paper presents fast, distributed, $O(1)$-approximation algorithms for metric facility location problems with outliers in the Congested Clique model, Massively Parallel Computation (MPC) model, and in the $k$-machine model. The paper considers Robust Facility Location and Facility Location with Penalties, two versions of the facility location problem with outliers proposed by Charikar et al. (SODA 2001). The paper also considers two alternatives for specifying the input: the input metric can be provided explicitly (as an $n \times n$ matrix distributed among the machines) or implicitly as the shortest path metric of a given edge-weighted graph. The results in the paper are:
- Implicit metric: For both problems, $O(1)$-approximation algorithms running in $O(\mbox{poly}(\log n))$ rounds in the Congested Clique and the MPC model and $O(1)$-approximation algorithms running in $\tilde{O}(n/k)$ rounds in the $k$-machine model.
- Explicit metric: For both problems, $O(1)$-approximation algorithms running in $O(\log\log\log n)$ rounds in the Congested Clique and the MPC model and $O(1)$-approximation algorithms running in $\tilde{O}(n/k)$ rounds in the $k$-machine model.
Our main contribution is to show the existence of Mettu-Plaxton-style $O(1)$-approximation algorithms for both Facility Location with outlier problems. As shown in our previous work (Berns et al., ICALP 2012; Bandyapadhyay et al., ICDCN 2018), Mettu-Plaxton-style algorithms are more easily amenable to being implemented efficiently in distributed and large-scale models of computation.
Submitted 15 November, 2018;
originally announced November 2018.
-
Near-Optimal Clustering in the $k$-machine model
Authors:
Sayan Bandyapadhyay,
Tanmay Inamdar,
Shreyas Pai,
Sriram V. Pemmaraju
Abstract:
The clustering problem, in its many variants, has numerous applications in operations research and computer science (e.g., in applications in bioinformatics, image processing, social network analysis, etc.). As sizes of data sets have grown rapidly, researchers have focused on designing algorithms for clustering problems in models of computation suited for large-scale computation such as MapReduce, Pregel, and streaming models. The $k$-machine model (Klauck et al., SODA 2015) is a simple, message-passing model for large-scale distributed graph processing. This paper considers three of the most prominent examples of clustering problems: the uncapacitated facility location problem, the $p$-median problem, and the $p$-center problem and presents $O(1)$-factor approximation algorithms for these problems running in $\tilde{O}(n/k)$ rounds in the $k$-machine model. These algorithms are optimal up to polylogarithmic factors because this paper also shows $\tildeΩ(n/k)$ lower bounds for obtaining polynomial-factor approximation algorithms for these problems. These are the first results for clustering problems in the $k$-machine model.
We assume that the metric provided as input for these clustering problems is only implicitly provided, as an edge-weighted graph. In a nutshell, our main technical contribution is to show that constant-factor approximation algorithms for all three clustering problems can be obtained by learning only a small portion of the input metric.
Submitted 23 October, 2017;
originally announced October 2017.
-
Benchmark Results and Theoretical Treatments for Valence-to-Core X-ray Emission Spectroscopy in Transition Metal Compounds
Authors:
D. R. Mortensen,
G. T. Seidler,
J. J. Kas,
Niranjan Govind,
C. P. Schwartz,
Sri Pemmaraju,
D. G. Prendergrast
Abstract:
We report measurement of the valence-to-core (VTC) region of the K-shell x-ray emission spectra from several Zn and Fe inorganic compounds, and their critical comparison with several existing theoretical treatments. We find generally good agreement between the respective theories and experiment, and in particular find an important admixture of dipole and quadrupole character for Zn materials that is much weaker in Fe-based systems. These results on materials whose simple crystal structures should not, a priori, pose deep challenges to theory, will prove useful in guiding the further development of DFT and time-dependent DFT methods for VTC-XES predictions and their comparison to experiment.
Submitted 27 June, 2017;
originally announced June 2017.
-
Symmetry Breaking in the Congest Model: Time- and Message-Efficient Algorithms for Ruling Sets
Authors:
Shreyas Pai,
Gopal Pandurangan,
Sriram V. Pemmaraju,
Talal Riaz,
Peter Robinson
Abstract:
We study local symmetry breaking problems in the CONGEST model, focusing on ruling set problems, which generalize the fundamental Maximal Independent Set (MIS) problem. A $β$-ruling set is an independent set such that every node in the graph is at most $β$ hops from a node in the independent set. Our work is motivated by the following central question: can we break the $Θ(\log n)$ time complexity barrier and the $Θ(m)$ message complexity barrier in the CONGEST model for MIS or closely-related symmetry breaking problems? We present the following results:
- Time Complexity: We show that we can break the $O(\log n)$ "barrier" for 2- and 3-ruling sets. We compute 3-ruling sets in $O\left(\frac{\log n}{\log \log n}\right)$ rounds with high probability (whp). More generally we show that 2-ruling sets can be computed in $O\left(\log Δ\cdot (\log n)^{1/2 + \varepsilon} + \frac{\log n}{\log\log n}\right)$ rounds for any $\varepsilon > 0$, which is $o(\log n)$ for a wide range of $Δ$ values (e.g., $Δ= 2^{(\log n)^{1/2-\varepsilon}}$). These are the first 2- and 3-ruling set algorithms to improve over the $O(\log n)$-round complexity of Luby's algorithm in the CONGEST model.
- Message Complexity: We show an $Ω(n^2)$ lower bound on the message complexity of computing an MIS (i.e., 1-ruling set) which holds also for randomized algorithms and present a contrast to this by showing a randomized algorithm for 2-ruling sets that, whp, uses only $O(n \log^2 n)$ messages and runs in $O(Δ\log n)$ rounds. This is the first message-efficient algorithm known for ruling sets, which has message complexity nearly linear in $n$ (which is optimal up to a polylogarithmic factor).
Submitted 22 May, 2017;
originally announced May 2017.
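The definition of a $β$-ruling set quoted in the abstract above (an independent set such that every node is at most $β$ hops from a node in the set) can be checked directly. The following is a minimal centralized sketch, not a distributed algorithm and not the paper's method; the function name `is_beta_ruling_set` and the adjacency-dictionary input are assumptions made for illustration.

```python
from collections import deque


def is_beta_ruling_set(adj, candidate, beta):
    """Check the beta-ruling set definition on an undirected graph.

    adj maps each node to an iterable of its neighbors; candidate is the
    proposed set of nodes. Hypothetical checker for illustration only.
    """
    chosen = set(candidate)

    # Independence: no two chosen nodes may be adjacent.
    for u in chosen:
        if any(v in chosen for v in adj[u]):
            return False

    # Coverage: multi-source BFS from the chosen nodes, truncated at depth beta.
    dist = {u: 0 for u in chosen}
    queue = deque(chosen)
    while queue:
        u = queue.popleft()
        if dist[u] == beta:
            continue  # do not explore beyond beta hops
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)

    # Every node must have been reached within beta hops.
    return len(dist) == len(adj)


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(is_beta_ruling_set(path, {2}, 2))     # middle node rules a 5-node path
    print(is_beta_ruling_set(path, {0, 3}, 1))  # a 1-ruling set is an MIS
```

With `beta = 1` the check coincides with testing for a maximal independent set, matching the abstract's remark that an MIS is a 1-ruling set.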
-
Super-fast MST Algorithms in the Congested Clique using $o(m)$ Messages
Authors:
Sriram V. Pemmaraju,
Vivek B. Sardeshmukh
Abstract:
In a sequence of recent results (PODC 2015 and PODC 2016), the running time of the fastest algorithm for the minimum spanning tree (MST) problem in the Congested Clique model was first improved to $O(\log \log \log n)$ from $O(\log \log n)$ (Hegeman et al., PODC 2015) and then to $O(\log^* n)$ (Ghaffari and Parter, PODC 2016). All of these algorithms use $Θ(n^2)$ messages independent of the number of edges in the input graph.
This paper positively answers a question raised in Hegeman et al., and presents the first "super-fast" MST algorithm with $o(m)$ message complexity for input graphs with $m$ edges. Specifically, we present an algorithm running in $O(\log^* n)$ rounds, with message complexity $\tilde{O}(\sqrt{m \cdot n})$ and then build on this algorithm to derive a family of algorithms, containing for any $\varepsilon$, $0 < \varepsilon \le 1$, an algorithm running in $O(\log^* n/\varepsilon)$ rounds, using $\tilde{O}(n^{1 + \varepsilon}/\varepsilon)$ messages. Setting $\varepsilon = \log\log n/\log n$ leads to the first sub-logarithmic round Congested Clique MST algorithm that uses only $\tilde{O}(n)$ messages.
Our primary tools in achieving these results are (i) a component-wise bound on the number of candidates for MST edges, extending the sampling lemma of Karger, Klein, and Tarjan (JACM 1995), and (ii) $Θ(\log n)$-wise-independent linear graph sketches (Cormode and Firmani, Dist.~Par.~Databases, 2014) for generating MST candidate edges.
Submitted 17 October, 2016; v1 submitted 12 October, 2016;
originally announced October 2016.
-
Using Read-$k$ Inequalities to Analyze a Distributed MIS Algorithm
Authors:
Sriram Pemmaraju,
Talal Riaz
Abstract:
Until recently, the fastest distributed MIS algorithm, even for simple graphs such as unoriented trees, was the simple randomized algorithm discovered in the 1980s. This algorithm (commonly called Luby's algorithm) computes an MIS in $O(\log n)$ rounds (with high probability). This situation changed when Lenzen and Wattenhofer (PODC 2011) presented a randomized $O(\sqrt{\log n}\cdot \log\log n)$-round MIS algorithm for unoriented trees. This algorithm was improved by Barenboim et al. (FOCS 2012), resulting in an $O(\sqrt{\log n \cdot \log\log n})$-round MIS algorithm.
The analyses of these tree MIS algorithms depend on "near independence" of probabilistic events, a feature of the tree structure of the network. In their paper, Lenzen and Wattenhofer hope that their algorithm and analysis could be extended to graphs with bounded arboricity. We show how to do this. By using a new tail inequality for read-$k$ families of random variables due to Gavinsky et al. (Random Struct Algorithms, 2015), we show how to deal with dependencies induced by the recent tree MIS algorithms when they are executed on bounded arboricity graphs. Specifically, we analyze a version of the tree MIS algorithm of Barenboim et al. and show that it runs in $O(\mbox{poly}(α) \cdot \sqrt{\log n \cdot \log\log n})$ rounds in the $\mathcal{CONGEST}$ model for graphs with arboricity $α$.
While the main thrust of this paper is the new probabilistic analysis via read-$k$ inequalities, for small values of $α$, this algorithm is faster than the bounded arboricity MIS algorithm of Barenboim et al. We also note that recently (SODA 2016), Ghaffari presented a novel MIS algorithm for general graphs that runs in $O(\log Δ) + 2^{O(\sqrt{\log\log n})}$ rounds; a corollary of this algorithm is an $O(\log α + \sqrt{\log n})$-round MIS algorithm on arboricity-$α$ graphs.
Submitted 20 May, 2016;
originally announced May 2016.
-
Minimum-weight Spanning Tree Construction in $O(\log \log \log n)$ Rounds on the Congested Clique
Authors:
Sriram V. Pemmaraju,
Vivek B. Sardeshmukh
Abstract:
This paper considers the \textit{minimum spanning tree (MST)} problem in the Congested Clique model and presents an algorithm that runs in $O(\log \log \log n)$ rounds, with high probability. Prior to this, the fastest MST algorithm in this model was a deterministic algorithm due to Lotker et al.~(SIAM J on Comp, 2005) from about a decade ago. A key step along the way to designing this MST algorithm is a \textit{connectivity verification} algorithm that not only runs in $O(\log \log \log n)$ rounds with high probability, but also has low message complexity. This allows the fast computation of an MST by running multiple instances of the connectivity verification algorithm in parallel. These results depend on a new edge-sampling theorem, developed in the paper, which says that if each edge $e = \{u, v\}$ is sampled independently with probability $c \log^2 n/\min\{\mbox{degree}(u), \mbox{degree}(v)\}$ (for a large enough constant $c$), then all cuts of size at least $n$ are approximated in the sampled graph. This sampling theorem is inspired by a series of papers on graph sparsification via random edge sampling due to Karger~(STOC 1994), Benczúr and Karger~(STOC 1996, arXiv 2002), and Fung et al.~(STOC 2011). The edge-sampling techniques in these papers use probabilities that are functions of edge-connectivity or a related measure called edge-strength. For the purposes of this paper, these edge-connectivity measures seem too costly to compute, and the main technical contribution of this paper is to show that degree-based edge-sampling suffices to approximate large cuts.
Submitted 7 December, 2014;
originally announced December 2014.
-
Near-Constant-Time Distributed Algorithms on a Congested Clique
Authors:
James W. Hegeman,
Sriram V. Pemmaraju,
Vivek B. Sardeshmukh
Abstract:
This paper presents constant-time and near-constant-time distributed algorithms for a variety of problems in the congested clique model. We show how to compute a 3-ruling set in expected $O(\log \log \log n)$ rounds and using this, we obtain a constant-approximation to metric facility location, also in expected $O(\log \log \log n)$ rounds. In addition, assuming an input metric space of constant doubling dimension, we obtain constant-round algorithms to compute constant-factor approximations to the minimum spanning tree and the metric facility location problems. These results significantly improve on the running time of the fastest known algorithms for these problems in the congested clique setting.
Submitted 10 September, 2018; v1 submitted 9 August, 2014;
originally announced August 2014.
-
Lessons from the Congested Clique Applied to MapReduce
Authors:
James W. Hegeman,
Sriram V. Pemmaraju
Abstract:
The main results of this paper are (I) a simulation algorithm which, under quite general constraints, transforms algorithms running on the Congested Clique into algorithms running in the MapReduce model, and (II) a distributed $O(Δ)$-coloring algorithm running on the Congested Clique which has an expected running time of (i) $O(1)$ rounds, if $Δ\geq Θ(\log^4 n)$; and (ii) $O(\log \log n)$ rounds otherwise. Applying the simulation theorem to the Congested-Clique $O(Δ)$-coloring algorithm yields an $O(1)$-round $O(Δ)$-coloring algorithm in the MapReduce model.
Our simulation algorithm illustrates a natural correspondence between per-node bandwidth in the Congested Clique model and memory per machine in the MapReduce model. In the Congested Clique (and more generally, any network in the $\mathcal{CONGEST}$ model), the major impediment to constructing fast algorithms is the $O(\log n)$ restriction on message sizes. Similarly, in the MapReduce model, the combined restrictions on memory per machine and total system memory have a dominant effect on algorithm design. In showing a fairly general simulation algorithm, we highlight the similarities and differences between these models.
Submitted 19 June, 2014; v1 submitted 17 May, 2014;
originally announced May 2014.
-
A Super-Fast Distributed Algorithm for Bipartite Metric Facility Location
Authors:
James Hegeman,
Sriram V. Pemmaraju
Abstract:
The \textit{facility location} problem consists of a set of \textit{facilities} $\mathcal{F}$, a set of \textit{clients} $\mathcal{C}$, an \textit{opening cost} $f_i$ associated with each facility $x_i$, and a \textit{connection cost} $D(x_i,y_j)$ between each facility $x_i$ and client $y_j$. The goal is to find a subset of facilities to \textit{open}, and to connect each client to an open facility, so as to minimize the total facility opening costs plus connection costs. This paper presents the first expected-sub-logarithmic-round distributed O(1)-approximation algorithm in the $\mathcal{CONGEST}$ model for the \textit{metric} facility location problem on the complete bipartite network with parts $\mathcal{F}$ and $\mathcal{C}$. Our algorithm has an expected running time of $O((\log \log n)^3)$ rounds, where $n = |\mathcal{F}| + |\mathcal{C}|$. This result can be viewed as a continuation of our recent work (ICALP 2012) in which we presented the first sub-logarithmic-round distributed O(1)-approximation algorithm for metric facility location on a \textit{clique} network. The bipartite setting presents several new challenges not present in the problem on a clique network. We present two new techniques to overcome these challenges. (i) In order to deal with the problem of not being able to choose appropriate probabilities (due to lack of adequate knowledge), we design an algorithm that performs a random walk over a probability space and analyze the progress our algorithm makes as the random walk proceeds. (ii) In order to deal with a problem of quickly disseminating a collection of messages, possibly containing many duplicates, over the bipartite network, we design a probabilistic hashing scheme that delivers all of the messages in expected-$O(\log \log n)$ rounds.
Submitted 12 August, 2013;
originally announced August 2013.
-
Super-Fast Distributed Algorithms for Metric Facility Location
Authors:
Andrew Berns,
James Hegeman,
Sriram V. Pemmaraju
Abstract:
This paper presents a distributed O(1)-approximation algorithm, with expected-$O(\log \log n)$ running time, in the $\mathcal{CONGEST}$ model for the metric facility location problem on a size-$n$ clique network. Though metric facility location has been considered by a number of researchers in low-diameter settings, this is the first sub-logarithmic-round algorithm for the problem that yields an O(1)-approximation in the setting of non-uniform facility opening costs. In order to obtain this result, our paper makes three main technical contributions. First, we show a new lower bound for metric facility location, extending the lower bound of Bădoiu et al. (ICALP 2005) that applies only to the special case of uniform facility opening costs. Next, we demonstrate a reduction of the distributed metric facility location problem to the problem of computing an O(1)-ruling set of an appropriate spanning subgraph. Finally, we present a sub-logarithmic-round (in expectation) algorithm for computing a 2-ruling set in a spanning subgraph of a clique. Our algorithm accomplishes this by using a combination of randomized and deterministic sparsification.
Submitted 12 August, 2013;
originally announced August 2013.
-
On the Analysis of a Label Propagation Algorithm for Community Detection
Authors:
Kishore Kothapalli,
Sriram V. Pemmaraju,
Vivek Sardeshmukh
Abstract:
This paper initiates formal analysis of a simple, distributed algorithm for community detection on networks. We analyze an algorithm that we call \textsc{Max-LPA}, both in terms of its convergence time and in terms of the "quality" of the communities detected. \textsc{Max-LPA} is an instance of a class of community detection algorithms called \textit{label propagation} algorithms. As far as we know, most analysis of label propagation algorithms thus far has been empirical in nature, and in this paper we seek a theoretical understanding of label propagation algorithms. In our main result, we define a clustered version of Erdős–Rényi random graphs with clusters $V_1, V_2,..., V_k$ where the probability $p$ of an edge connecting nodes within a cluster $V_i$ is higher than $p'$, the probability of an edge connecting nodes in distinct clusters. We show that even with fairly general restrictions on $p$ and $p'$ ($p = Ω(\frac{1}{n^{1/4-ε}})$ for any $ε > 0$, $p' = O(p^2)$, where $n$ is the number of nodes), \textsc{Max-LPA} detects the clusters $V_1, V_2,..., V_k$ in just two rounds. Based on this and on empirical results, we conjecture that \textsc{Max-LPA} can correctly and quickly identify communities on clustered Erdős–Rényi graphs even when the clusters are much sparser, i.e., with $p = \frac{c\log n}{n}$ for some $c > 1$.
Submitted 13 October, 2012;
originally announced October 2012.
-
Super-Fast 3-Ruling Sets
Authors:
Kishore Kothapalli,
Sriram Pemmaraju
Abstract:
A $t$-ruling set of a graph $G = (V, E)$ is a vertex-subset $S \subseteq V$ that is independent and satisfies the property that every vertex $v \in V$ is at a distance of at most $t$ from some vertex in $S$. A \textit{maximal independent set (MIS)} is a 1-ruling set. The problem of computing an MIS on a network is a fundamental problem in distributed algorithms and the fastest algorithm for this problem is the $O(\log n)$-round algorithm due to Luby (SICOMP 1986) and Alon et al. (J. Algorithms 1986) from more than 25 years ago. Since then the problem has resisted all efforts to yield to a sub-logarithmic algorithm. There has been recent progress on this problem, most importantly an $O(\log Δ\cdot \sqrt{\log n})$-round algorithm on graphs with $n$ vertices and maximum degree $Δ$, due to Barenboim et al. (Barenboim, Elkin, Pettie, and Schneider, April 2012, arxiv 1202.1983; to appear FOCS 2012).
We approach the MIS problem from a different angle and ask whether O(1)-ruling sets can be computed much more efficiently than an MIS. As an answer to this question, we show how to compute a 2-ruling set of an $n$-vertex graph in $O((\log n)^{3/4})$ rounds. We also show that the above result can be improved for special classes of graphs such as graphs with high girth, trees, and graphs of bounded arboricity.
Our main technique involves randomized sparsification that rapidly reduces the graph degree while ensuring that every deleted vertex is close to some vertex that remains. This technique may have further applications in other contexts, e.g., in designing sub-logarithmic distributed approximation algorithms. Our results raise intriguing questions about how quickly an MIS (or 1-ruling sets) can be computed, given that 2-ruling sets can be computed in sub-logarithmic rounds.
Submitted 12 July, 2012;
originally announced July 2012.
-
Localized Spanners for Wireless Networks
Authors:
Mirela Damian,
Sriram V. Pemmaraju
Abstract:
We present a new efficient localized algorithm to construct, for any given quasi-unit disk graph G=(V,E) and any e > 0, a (1+e)-spanner for G of maximum degree O(1) and total weight O(w(MST)), where w(MST) denotes the weight of a minimum spanning tree for V. We further show that similar localized techniques can be used to construct, for a given unit disk graph G = (V, E), a planar Cdel(1+e)(1+pi/2)-spanner for G of maximum degree O(1) and total weight O(w(MST)). Here Cdel denotes the stretch factor of the unit Delaunay triangulation for V. Both constructions can be completed in O(1) communication rounds, and require each node to know its own coordinates.
Submitted 25 June, 2008;
originally announced June 2008.
-
Local Approximation Schemes for Topology Control
Authors:
Mirela Damian,
Saurav Pandit,
Sriram Pemmaraju
Abstract:
This paper presents a distributed algorithm on wireless ad-hoc networks that runs in a polylogarithmic number of rounds in the size of the network and constructs a linear size, lightweight, (1+ε)-spanner for any given ε > 0. A wireless network is modeled by a d-dimensional α-quasi unit ball graph (α-UBG), which is a higher dimensional generalization of the standard unit disk graph (UDG) model. The d-dimensional α-UBG model goes beyond the unrealistic ``flat world'' assumption of UDGs and also takes into account transmission errors, fading signal strength, and physical obstructions. The main result in the paper is this: for any fixed ε > 0, 0 < α \le 1, and d \ge 2, there is a distributed algorithm running in O(\log n \log^* n) communication rounds on an n-node, d-dimensional α-UBG G that computes a (1+ε)-spanner G' of G with maximum degree Δ(G') = O(1) and total weight w(G') = O(w(MST(G))). This result is motivated by the topology control problem in wireless ad-hoc networks and improves on existing topology control algorithms along several dimensions. The technical contributions of the paper include a new, sequential, greedy algorithm with relaxed edge ordering and lazy updating, and clustering techniques for filtering out unnecessary edges.
Submitted 14 March, 2008;
originally announced March 2008.
-
On the existence of the self map v_2^9 on the Smith-Toda complex V(1) at the prime 3
Authors:
Mark Behrens,
Satya Pemmaraju
Abstract:
Let V(1) be the Smith-Toda complex at the prime 3. We prove that there exists a map v_2^9: Σ^{144}V(1) \to V(1) that is a K(2) equivalence. This map is used to construct various v_2-periodic infinite families in the 3-primary stable homotopy groups of spheres.
Submitted 18 March, 2003;
originally announced March 2003.