
Contrastive explainable clustering with differential privacy

Authors: Dung Nguyen, Ariel Vetzler, Sarit Kraus, Anil Vullikanti

Abstract: This paper presents a novel approach in Explainable AI (XAI), integrating contrastive explanations with differential privacy in clustering methods. For several basic clustering problems, including $k$-median and $k$-means, we give efficient differentially private contrastive explanations that achieve essentially the same explanations as those that non-private clustering explanations can obtain. We define contrastive explanations as the utility difference between the original clustering utility and the utility of a clustering with a specifically fixed centroid. In each contrastive scenario, we designate a specific data point as the fixed centroid position, enabling us to measure the impact of this constraint on clustering utility under differential privacy. Extensive experiments across various datasets show our method's effectiveness in providing meaningful explanations without significantly compromising data privacy or clustering utility. This underscores our contribution to privacy-aware machine learning, demonstrating the feasibility of achieving a balance between privacy and utility in the explanation of clustering tasks.

Submitted 6 June, 2024; originally announced June 2024.
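The contrastive definition above can be made concrete with a small non-private sketch. It assumes the $k$-means cost as the (negated) utility, Lloyd's algorithm with deterministic farthest-first seeding, and a contrastive scenario in which one of the $k$ centers is pinned to a chosen data point. The differentially private mechanism the paper adds is deliberately omitted, and all function names are hypothetical.

```python
from typing import List, Optional, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_cost(points: List[Point], centers: List[Point]) -> float:
    """k-means objective: each point pays its squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def farthest_first(points: List[Point], k: int, first: Point) -> List[List[float]]:
    """Deterministic seeding: start at `first`, then repeatedly add the farthest point."""
    centers = [list(first)]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def lloyd(points: List[Point], k: int, fixed: Optional[Point] = None,
          iters: int = 50) -> List[List[float]]:
    """Lloyd's algorithm; if `fixed` is given, center 0 is pinned there and never updated."""
    centers = farthest_first(points, k, fixed if fixed is not None else points[0])
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        for i, members in enumerate(clusters):
            if (fixed is not None and i == 0) or not members:
                continue  # pinned center, or empty cluster: keep the old center
            dim = len(members[0])
            centers[i] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centers


def contrastive_gap(points: List[Point], k: int, anchor: Point) -> float:
    """Extra k-means cost incurred by forcing one centroid onto `anchor`."""
    base = kmeans_cost(points, lloyd(points, k))
    pinned = kmeans_cost(points, lloyd(points, k, fixed=anchor))
    return pinned - base
```

For two well-separated triangles of points, `(0, 0), (0, 1), (1, 0)` and `(10, 10), (10, 11), (11, 10)` with $k = 2$, the unconstrained cost is 8/3; pinning one centroid at `(10, 10)` raises it to 10/3, so the contrastive gap is 2/3. Lloyd's algorithm is only a heuristic, so on harder inputs this gap reflects local optima as well as the constraint itself.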

arXiv:2406.02644 [pdf, ps, other]

Differentially private exact recovery for stochastic block models

Authors: Dung Nguyen, Anil Vullikanti

Abstract: Stochastic block models (SBMs) are a very commonly studied network model for community detection algorithms. In the standard form of an SBM, the $n$ vertices (or nodes) of a graph are generally divided into multiple pre-determined communities (or clusters). Connections between pairs of vertices are generated randomly and independently with pre-defined probabilities, which depend on the communities containing the two nodes. A fundamental problem in SBMs is the recovery of the community structure, and sharp information-theoretic bounds are known for recoverability for many versions of SBMs. Our focus here is the recoverability problem in SBMs when the network is private. Under the edge differential privacy model, we derive conditions for exact recoverability in three different versions of SBMs, namely Asymmetric SBM (when communities have non-uniform sizes), General Structure SBM (with outliers), and Censored SBM (with edge features). Our private algorithms have polynomial running time w.r.t. the input graph's size, and match the recovery thresholds of the non-private setting when $ε\rightarrow\infty$. In contrast, the previous best results for recoverability in SBMs only hold for the symmetric case (equal size communities), and run in quasi-polynomial time, or in polynomial time with recovery thresholds being tight up to some constants from the non-private settings.

Submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted by ICML 2024
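The SBM generative process described above can be sketched in a few lines. This illustrates only the model (each vertex pair is connected independently with a probability set by the two communities); it contains no privacy mechanism and no recovery algorithm, and `sample_sbm` is a hypothetical name.

```python
import random
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def sample_sbm(labels: Sequence[int], probs: Sequence[Sequence[float]],
               seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Sample an undirected SBM graph.

    labels[v] is the community of vertex v; probs[a][b] (assumed symmetric) is the
    probability of an edge between a vertex in community a and one in community b.
    Each vertex pair is included independently of all others.
    """
    rng = random.Random(seed)
    edges = []
    for u, v in combinations(range(len(labels)), 2):
        if rng.random() < probs[labels[u]][labels[v]]:
            edges.append((u, v))
    return edges
```

An asymmetric instance in the abstract's sense only needs unequal community sizes, e.g. `labels = [0] * 30 + [1] * 20` with `probs = [[0.5, 0.05], [0.05, 0.5]]`; passing a fixed `seed` makes the sample reproducible.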

arXiv:2405.06884 [pdf, other]

Efficient PAC Learnability of Dynamical Systems Over Multilayer Networks

Authors: Zirou Qiu, Abhijin Adiga, Madhav V. Marathe, S. S. Ravi, Daniel J. Rosenkrantz, Richard E. Stearns, Anil Vullikanti

Abstract: Networked dynamical systems are widely used as formal models of real-world cascading phenomena, such as the spread of diseases and information. Prior research has addressed the problem of learning the behavior of an unknown dynamical system when the underlying network has a single layer. In this work, we study the learnability of dynamical systems over multilayer networks, which are more realistic and challenging. First, we present an efficient PAC learning algorithm with provable guarantees to show that the learner only requires a small number of training examples to infer an unknown system. We further provide a tight analysis of the Natarajan dimension, which measures the model complexity. Asymptotically, our bound on the Natarajan dimension is tight for almost all multilayer graphs. The techniques and insights from our work provide the theoretical foundations for future investigations of learning problems for multilayer dynamical systems.

Submitted 10 May, 2024; originally announced May 2024.

Comments: Accepted at ICML 2024

arXiv:2404.01101 [pdf, other]

UFID: A Unified Framework for Input-level Backdoor Detection on Diffusion Models

Authors: Zihan Guan, Mengxuan Hu, Sheng Li, Anil Vullikanti

Abstract: Diffusion Models are vulnerable to backdoor attacks, where malicious attackers inject backdoors by poisoning some parts of the training samples during the training stage. This poses a serious threat to downstream users, who query the diffusion models through an API or directly download them from the internet. To mitigate the threat of backdoor attacks, there has been a plethora of work on backdoor detection. However, none of it designs a specialized backdoor detection method for diffusion models, leaving the area largely under-explored. Moreover, these prior methods mainly focus on traditional neural networks in the classification task, and cannot easily be adapted to backdoor detection in the generative task. Additionally, most of the prior methods require white-box access to model weights and architectures, or the probability logits as additional information, which is not always practical. In this paper, we propose a Unified Framework for Input-level backdoor Detection (UFID) on diffusion models, which is motivated by observations in diffusion models and further validated with a theoretical causality analysis. Extensive experiments across different datasets on both conditional and unconditional diffusion models show that our method achieves superb performance in detection effectiveness and run-time efficiency. The code is available at https://github.com/GuanZihan/official_UFID.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 20 pages, 18 figures

arXiv:2402.11686 [pdf, other]

Learning the Topology and Behavior of Discrete Dynamical Systems

Authors: Zirou Qiu, Abhijin Adiga, Madhav V. Marathe, S. S. Ravi, Daniel J. Rosenkrantz, Richard E. Stearns, Anil Vullikanti

Abstract: Discrete dynamical systems are commonly used to model the spread of contagions on real-world networks. Under the PAC framework, existing research has studied the problem of learning the behavior of a system, assuming that the underlying network is known. In this work, we focus on a more challenging setting: to learn both the behavior and the underlying topology of a black-box system. We show that, in general, this learning problem is computationally intractable. On the positive side, we present efficient learning methods under the PAC model when the underlying graph of the dynamical system belongs to some classes. Further, we examine a relaxed setting where the topology of an unknown system is partially observed. For this case, we develop an efficient PAC learner to infer the system and establish the sample complexity. Lastly, we present a formal analysis of the expressive power of the hypothesis class of dynamical systems where both the topology and behavior are unknown, using the well-known formalism of the Natarajan dimension. Our results provide a theoretical foundation for learning both the behavior and topology of discrete dynamical systems.

Submitted 29 March, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

Comments: Accepted at AAAI-24

arXiv:2402.06576 [pdf, other]

Value-based Resource Matching with Fairness Criteria: Application to Agricultural Water Trading

Authors: Abhijin Adiga, Yohai Trabelsi, Tanvir Ferdousi, Madhav Marathe, S. S. Ravi, Samarth Swarup, Anil Kumar Vullikanti, Mandy L. Wilson, Sarit Kraus, Reetwika Basu, Supriya Savalkar, Matthew Yourek, Michael Brady, Kirti Rajagopalan, Jonathan Yoder

Abstract: Optimal allocation of agricultural water in the event of droughts is an important global problem. In addressing this problem, many aspects, including the welfare of farmers, the economy, and the environment, must be considered. Against this backdrop, our work focuses on several resource-matching problems accounting for agents with multi-crop portfolios, geographic constraints, and fairness. First, we address a matching problem where the goal is to maximize a welfare function in two-sided markets where buyers' requirements and sellers' supplies are represented by value functions that assign prices (or costs) to specified volumes of water. For the setting where the value functions satisfy certain monotonicity properties, we present an efficient algorithm that maximizes a social welfare function. When there are minimum water requirement constraints, we present a randomized algorithm which ensures that the constraints are satisfied in expectation. For a single-seller, multiple-buyer setting with fairness constraints, we design an efficient algorithm that maximizes the minimum level of satisfaction of any buyer. We also present computational complexity results that highlight the limits on the generalizability of our results. We evaluate the algorithms developed in our work with experiments on both real-world and synthetic data sets with respect to drought severity, value functions, and seniority of agents.

Submitted 11 February, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

arXiv:2401.04371 [pdf, other]

Strategic Routing and Scheduling for Evacuations

Authors: Kazi Ashik Islam, Da Qi Chen, Madhav Marathe, Henning Mortveit, Samarth Swarup, Anil Vullikanti

Abstract: Evacuation planning is an essential part of disaster management where the goal is to relocate people under imminent danger to safety. Although government authorities may prescribe routes and a schedule, evacuees generally behave as self-interested agents and may choose their actions according to their own selfish interests. It is crucial to understand the degree of inefficiency this can cause to the evacuation process. However, existing research has mainly focused on selfish routing, i.e., it considers route selection as the only strategic action. In this paper, we present a strategic routing and scheduling game, named the Evacuation Planning Game (EPG), where evacuees choose both their route and their time of departure. We focus on confluent evacuation plans, where, if two routes meet at a node, then their remaining portions are identical. We also use dynamic flows to model the time-varying traffic on roads during evacuation. We show that every instance of EPG has at least one pure strategy Nash equilibrium. We then present a polynomial time algorithm, the Sequential Action Algorithm (SAA), for finding equilibria in a given instance. Additionally, we provide bounds on how bad an equilibrium state can be compared to a socially optimal state. Finally, we use Harris County of Houston, Texas as our study area and construct a game instance for it. Our results show that, by utilizing SAA, we can efficiently find equilibria in this instance that have social objective close to the optimal value.

Submitted 9 January, 2024; originally announced January 2024.
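The confluence condition above (two routes that meet at a node share the rest of their path) can be checked directly on a set of routes. This is a minimal sketch that assumes each route is given as a list of node names ending at a safe destination; it ignores departure times and dynamic flows entirely, and `is_confluent` is a hypothetical helper, not part of the paper's SAA.

```python
from typing import Dict, Hashable, Sequence, Tuple


def is_confluent(routes: Sequence[Sequence[Hashable]]) -> bool:
    """Return True if any two routes that share a node agree from that node onward.

    Each route is a node sequence from an origin to a safe destination. Confluence
    means every node has a single well-defined remaining path (suffix).
    """
    suffix_at: Dict[Hashable, Tuple[Hashable, ...]] = {}
    for route in routes:
        for i, node in enumerate(route):
            suffix = tuple(route[i:])
            # The first route through `node` fixes its suffix; any later mismatch fails.
            if suffix_at.setdefault(node, suffix) != suffix:
                return False
    return True
```

Storing full suffixes keeps the check obviously faithful to the definition; for simple paths, an equivalent and cheaper test is that every node has at most one next hop, counting "the route ends here" as a next hop of its own.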

arXiv:2311.02349 [pdf, other]

Sample Complexity of Opinion Formation on Networks

Authors: Haolin Liu, Rajmohan Rajaraman, Ravi Sundaram, Anil Vullikanti, Omer Wasim, Haifeng Xu

Abstract: Consider public health officials aiming to spread awareness about a new vaccine in a community interconnected by a social network. How can they distribute information with minimal resources, ensuring community-wide understanding that aligns with the actual facts? This concern mirrors numerous real-world situations. In this paper, we initiate the study of sample complexity in opinion formation to solve this problem. Our model is built on the recognized opinion formation game, where we regard each agent's opinion as a data-derived model parameter, not just a real number as in prior studies. Such an extension offers a wider understanding of opinion formation and ties closely with federated learning. Through this formulation, we characterize the sample complexity bounds for any network and also show asymptotically tight bounds for specific network structures. Intriguingly, we discover that optimal strategies often allocate samples inversely to the degree, hinting at vital policy implications. Our findings are empirically validated on both synthesized and real-world networks.

Submitted 4 November, 2023; originally announced November 2023.

arXiv:2307.08237 [pdf, other]

doi 10.1145/3580305.3599763

A Look into Causal Effects under Entangled Treatment in Graphs: Investigating the Impact of Contact on MRSA Infection

Authors: Jing Ma, Chen Chen, Anil Vullikanti, Ritwick Mishra, Gregory Madden, Daniel Borrajo, Jundong Li

Abstract: Methicillin-resistant Staphylococcus aureus (MRSA) is a type of bacteria resistant to certain antibiotics, making it difficult to prevent MRSA infections. Among decades of efforts to conquer infectious diseases caused by MRSA, many studies have been proposed to estimate the causal effects of close contact (treatment) on MRSA infection (outcome) from observational data. In this problem, the treatment assignment mechanism plays a key role as it determines the patterns of missing counterfactuals, which are the fundamental challenge of causal effect estimation. Most existing observational studies for causal effect learning assume that the treatment is assigned individually for each unit. However, on many occasions, the treatments are assigned pairwise to units that are connected in graphs, i.e., the treatments of different units are entangled. Neglecting the entangled treatments can impede the causal effect estimation. In this paper, we study the problem of causal effect estimation with treatment entangled in a graph. Despite a few prior explorations of entangled treatments, this problem remains challenging for three reasons: (1) the entanglement brings difficulties in modeling and leveraging the unknown treatment assignment mechanism; (2) there may exist hidden confounders which lead to confounding biases in causal effect estimation; (3) the observational data is often time-varying.
To tackle these challenges, we propose a novel method NEAT, which explicitly leverages the graph structure to model the treatment assignment mechanism, and mitigates confounding biases based on the treatment assignment modeling. We also extend our method to a dynamic setting to handle time-varying observational data. Experiments on both synthetic datasets and a real-world MRSA dataset validate the effectiveness of the proposed method, and provide insights for future applications.

Submitted 17 July, 2023; originally announced July 2023.

arXiv:2305.01761 [pdf, other]

Spatial-Temporal Networks for Antibiogram Pattern Prediction

Authors: Xingbo Fu, Chen Chen, Yushun Dong, Anil Vullikanti, Eili Klein, Gregory Madden, Jundong Li

Abstract: An antibiogram is a periodic summary of antibiotic resistance results of organisms from infected patients to selected antimicrobial drugs. Antibiograms help clinicians to understand regional resistance rates and select appropriate antibiotics in prescriptions. In practice, significant combinations of antibiotic resistance may appear in different antibiograms, forming antibiogram patterns. Such patterns may imply the prevalence of some infectious diseases in certain regions. Thus it is of crucial importance to monitor antibiotic resistance trends and track the spread of multi-drug resistant organisms. In this paper, we propose a novel problem of antibiogram pattern prediction that aims to predict which patterns will appear in the future. Despite its importance, tackling this problem encounters a series of challenges and has not yet been explored in the literature. First of all, antibiogram patterns are not i.i.d. as they may have strong relations with each other due to genomic similarities of the underlying organisms. Second, antibiogram patterns are often temporally dependent on the ones that are previously detected. Furthermore, the spread of antibiotic resistance can be significantly influenced by nearby or similar regions. To address the above challenges, we propose a novel Spatial-Temporal Antibiogram Pattern Prediction framework, STAPP, that can effectively leverage the pattern correlations and exploit the temporal and spatial information.
We conduct extensive experiments on a real-world dataset with antibiogram reports of patients from 1999 to 2012 for 203 cities in the United States. The experimental results show the superiority of STAPP against several competitive baselines.

Submitted 2 May, 2023; originally announced May 2023.

Comments: Accepted by the 11th IEEE International Conference on Healthcare Informatics (IEEE ICHI 2023)

arXiv:2301.04090 [pdf, other]

Finding Nontrivial Minimum Fixed Points in Discrete Dynamical Systems

Authors: Zirou Qiu, Chen Chen, Madhav V. Marathe, S. S. Ravi, Daniel J. Rosenkrantz, Richard E. Stearns, Anil Vullikanti

Abstract: Networked discrete dynamical systems are often used to model the spread of contagions and decision-making by agents in coordination games. Fixed points of such dynamical systems represent configurations to which the system converges. In the dissemination of undesirable contagions (such as rumors and misinformation), convergence to fixed points with a small number of affected nodes is a desirable goal. Motivated by such considerations, we formulate a novel optimization problem of finding a nontrivial fixed point of the system with the minimum number of affected nodes. We establish that, unless P = NP, there is no polynomial time algorithm for approximating a solution to this problem to within the factor $n^{1-ε}$ for any constant $ε > 0$. To cope with this computational intractability, we identify several special cases for which the problem can be solved efficiently. Further, we introduce an integer linear program to address the problem for networks of reasonable sizes. For solving the problem on larger networks, we propose a general heuristic framework along with greedy selection methods. Extensive experimental results on real-world networks demonstrate the effectiveness of the proposed heuristics.

Submitted 29 March, 2024; v1 submitted 6 January, 2023; originally announced January 2023.

Comments: Accepted at AAAI-22
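To make "nontrivial fixed point with the minimum number of affected nodes" concrete, here is a brute-force sketch for tiny graphs. It assumes Boolean threshold local functions over closed neighborhoods with synchronous updates (a common choice for such systems, but an assumption here, since the abstract does not fix the local functions), and takes "nontrivial" to mean "not all-zero". It is exponential by design, consistent with the hardness result above, and is neither the paper's integer linear program nor its heuristics; the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set

Graph = Dict[int, List[int]]


def step(adj: Graph, thresholds: Dict[int, int], state: Dict[int, int]) -> Dict[int, int]:
    """One synchronous update: node v becomes 1 iff the number of 1s in its
    closed neighborhood (v plus its neighbors) is at least thresholds[v]."""
    return {
        v: int(state[v] + sum(state[u] for u in adj[v]) >= thresholds[v])
        for v in adj
    }


def min_nontrivial_fixed_point(adj: Graph, thresholds: Dict[int, int]) -> Optional[Set[int]]:
    """Try states in order of increasing number of 1s and return the first
    nontrivial (non-all-zero) fixed point, or None if there is none.
    Exponential time: only suitable for tiny illustrative graphs."""
    nodes = sorted(adj)
    for size in range(1, len(nodes) + 1):
        for active in combinations(nodes, size):
            chosen = set(active)
            state = {v: int(v in chosen) for v in nodes}
            if step(adj, thresholds, state) == state:
                return chosen
    return None
```

On a triangle `0-1-2` with a pendant node `3` attached to `2` and every threshold equal to 2, no single node is self-sustaining, and `{0, 1}` is not fixed because node 2 then also turns on; the search returns `{2, 3}` as the smallest nontrivial fixed point.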

arXiv:2301.02889 [pdf, other]

Networked Anti-Coordination Games Meet Graphical Dynamical Systems: Equilibria and Convergence

Authors: Zirou Qiu, Chen Chen, Madhav V. Marathe, S. S. Ravi, Daniel J. Rosenkrantz, Richard E. Stearns, Anil Vullikanti

Abstract: Evolutionary anti-coordination games on networks capture real-world strategic situations such as traffic routing and market competition. In such games, agents maximize their utility by choosing actions that differ from their neighbors' actions. Two important problems concerning evolutionary games are the existence of a pure Nash equilibrium (NE) and the convergence time of the dynamics. In this work, we study these two problems for anti-coordination games under sequential and synchronous update schemes. For each update scheme, we examine two decision modes based on whether an agent considers its own previous action (self essential) or not (self non-essential) in choosing its next action. Using a relationship between games and dynamical systems, we show that for both update schemes, finding an NE can be done efficiently under the self non-essential mode but is computationally intractable under the self essential mode. To cope with this hardness, we identify special cases for which an NE can be obtained efficiently. For convergence time, we show that the best-response dynamics converges in a polynomial number of steps in the synchronous scheme for both modes; for the sequential scheme, the convergence time is polynomial only under the self non-essential mode. Through experiments, we empirically examine the convergence time and the equilibria for both synthetic and real-world networks.

Submitted 29 March, 2024; v1 submitted 7 January, 2023; originally announced January 2023.

Comments: Accepted at AAAI-23

arXiv:2301.02876 [pdf, other]

Assigning Agents to Increase Network-Based Neighborhood Diversity

Authors: Zirou Qiu, Andrew Yuan, Chen Chen, Madhav V. Marathe, S. S. Ravi, Daniel J. Rosenkrantz, Richard E. Stearns, Anil Vullikanti

Abstract: Motivated by real-world applications such as the allocation of public housing, we examine the problem of assigning a group of agents to vertices (e.g., spatial locations) of a network so that the diversity level is maximized. Specifically, agents are of two types (characterized by features), and we measure diversity by the number of agents who have at least one neighbor of a different type. This problem is known to be NP-hard, and we focus on developing approximation algorithms with provable performance guarantees. We first present a local-improvement algorithm for general graphs that provides an approximation factor of 1/2. For the special case where the sizes of agent subgroups are similar, we present a randomized approach based on semidefinite programming that yields an approximation factor better than 1/2. Further, we show that the problem can be solved efficiently when the underlying graph is treewidth-bounded and obtain a polynomial time approximation scheme (PTAS) for the problem on planar graphs. Lastly, we conduct experiments to evaluate the performance of the proposed algorithms on synthetic and real-world networks.

Submitted 29 March, 2024; v1 submitted 7 January, 2023; originally announced January 2023.

Comments: Accepted at AAMAS-23
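The diversity objective defined above (the number of agents with at least one neighbor of a different type) is easy to evaluate for a given assignment. The sketch below computes it and pairs it with a generic swap-based local search. The local search is a hypothetical illustration only: it is not claimed to be the paper's local-improvement algorithm, and no approximation guarantee is claimed for it. All names are assumptions.

```python
from typing import Dict, Hashable, Iterable, Mapping, Tuple


def diversity(adj: Mapping[Hashable, Iterable[Hashable]],
              assignment: Mapping[Hashable, str]) -> int:
    """Count agents with at least one neighbor of a different type.

    adj maps each vertex to its neighbors; assignment maps each occupied vertex
    to its agent's type. Unoccupied vertices are simply absent from assignment.
    """
    count = 0
    for v, t in assignment.items():
        if any(u in assignment and assignment[u] != t for u in adj.get(v, ())):
            count += 1
    return count


def swap_local_search(adj: Mapping[Hashable, Iterable[Hashable]],
                      assignment: Mapping[Hashable, str],
                      max_rounds: int = 100) -> Tuple[Dict[Hashable, str], int]:
    """Hypothetical local search: swap two agents of different types whenever
    the swap strictly increases diversity; stop when no swap helps."""
    current = dict(assignment)
    best = diversity(adj, current)
    verts = list(current)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(verts)):
            for j in range(i + 1, len(verts)):
                a, b = verts[i], verts[j]
                if current[a] == current[b]:
                    continue  # swapping same-type agents changes nothing
                current[a], current[b] = current[b], current[a]
                score = diversity(adj, current)
                if score > best:
                    best, improved = score, True
                else:
                    current[a], current[b] = current[b], current[a]  # undo the swap
        if not improved:
            break
    return current, best
```

On the path `0-1-2-3` with types `A, A, B, B`, only the two middle agents count, so diversity is 2; one swap of the agents at vertices 0 and 2 gives `B, A, A, B`, where every agent has a neighbor of the other type and diversity is 4.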

arXiv:2210.08103 [pdf, other]

High-resolution synthetic residential energy use profiles for the United States

Authors: Swapna Thorve, Young Yun Baek, Samarth Swarup, Henning Mortveit, Achla Marathe, Anil Vullikanti, Madhav Marathe

Abstract: Efficient energy consumption is crucial for achieving sustainable energy goals in the era of climate change and grid modernization. Thus, it is vital to understand how energy is consumed at finer resolutions, such as the household level, in order to plan demand-response events or analyze the impacts of weather, electricity prices, electric vehicles, solar, and occupancy schedules on energy consumption. However, detailed energy-use data, which would enable such studies, has rarely been available or accessible. In this paper, we release a unique, large-scale, synthetic energy-use dataset for the residential sector across the contiguous United States, covering millions of households. The data comprise hourly energy use profiles for synthetic households, disaggregated into Thermostatically Controlled Loads (TCL) and appliance use. The underlying framework is constructed using a bottom-up approach. Diverse open-source surveys and first principles models are used for end-use modeling. Extensive validation of the synthetic dataset has been conducted through comparisons with reported energy-use data. We present a detailed, open, high-resolution, residential energy-use dataset for the United States.

Submitted 15 December, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: The paper has been accepted for publication in Nature Scientific Data

arXiv:2209.01535 [pdf, other]

Simulation-Assisted Optimization for Large-Scale Evacuation Planning with Congestion-Dependent Delays

Authors: Kazi Ashik Islam, Da Qi Chen, Madhav Marathe, Henning Mortveit, Samarth Swarup, Anil Vullikanti

Abstract: Evacuation planning is a crucial part of disaster management. However, joint optimization of its two essential components, routing and scheduling, with objectives such as minimizing average evacuation time or evacuation completion time, is a computationally hard problem. To approach it, we present MIP-LNS, a scalable optimization method that utilizes heuristic search with mathematical optimization and can optimize a variety of objective functions. We also present the method MIP-LNS-SIM, where we combine agent-based simulation with MIP-LNS to estimate delays due to congestion, as well as to find optimized plans considering such delays. We use Harris County in Houston, Texas, as our study area. We show that, within a given time limit, MIP-LNS finds better solutions than existing methods in terms of three different metrics. However, when congestion-dependent delay is considered, MIP-LNS-SIM outperforms MIP-LNS in multiple performance metrics. In addition, MIP-LNS-SIM has a significantly lower percent error in estimated evacuation completion time compared to MIP-LNS.

Submitted 7 June, 2023; v1 submitted 4 September, 2022; originally announced September 2022.

arXiv:2208.06384 [pdf, other]

doi 10.1073/pnas.2205772119

Ensembles of Realistic Power Distribution Networks

Authors: Rounak Meyur, Anil Vullikanti, Samarth Swarup, Henning Mortveit, Virgilio Centeno, Arun Phadke, H. Vincent Poor, Madhav Marathe

Abstract: The power grid is going through significant changes with the introduction of renewable energy sources and incorporation of smart grid technologies. These rapid advancements necessitate new models and analyses to keep up with the various emergent phenomena they induce. A major prerequisite of such work is the acquisition of well-constructed and accurate network datasets for the power grid infrastructure. In this paper, we propose a robust, scalable framework to synthesize power distribution networks which resemble their physical counterparts for a given region. We use openly available information about interdependent road and building infrastructures to construct the networks. In contrast to prior work based on network statistics, we incorporate engineering and economic constraints to create the networks. Additionally, we provide a framework to create ensembles of power distribution networks to generate multiple possible instances of the network for a given region. The comprehensive dataset consists of nodes with attributes such as geo-coordinates, type of node (residence, transformer, or substation), and edges with attributes such as geometry, type of line (feeder lines, primary or secondary) and line parameters. For validation, we provide detailed comparisons of the generated networks with actual distribution networks.
The generated datasets represent realistic test systems (as compared to standard IEEE test cases) that can be used by network scientists to analyze complex events in power grids and to perform detailed sensitivity and statistical analyses over ensembles of networks.

Submitted 24 September, 2022; v1 submitted 12 July, 2022; originally announced August 2022.

arXiv:2207.10240 [pdf, other]

Differentially Private Partial Set Cover with Applications to Facility Location

Authors: George Z. Li, Dung Nguyen, Anil Vullikanti

Abstract: It was observed by Gupta et al. (2009) that the Set Cover problem has strong impossibility results under differential privacy. In our work, we observe that these hardness results dissolve when we turn to the Partial Set Cover problem, where we only need to cover a $ρ$-fraction of the elements in the universe, for some $ρ\in(0,1)$. We show that this relaxation enables us to avoid the impossibility results: under loose conditions on the input set system, we give differentially private algorithms which output an explicit set cover with non-trivial approximation guarantees. In particular, this is the first differentially private algorithm which outputs an explicit set cover. Using our algorithm for Partial Set Cover as a subroutine, we give a differentially private (bicriteria) approximation algorithm for a facility location problem which generalizes $k$-center/$k$-supplier with outliers. As with the Set Cover problem, no algorithm has been able to give non-trivial guarantees for $k$-center/$k$-supplier-type facility location problems due to the high sensitivity and impossibility results. Our algorithm shows that relaxing the covering requirement to serving only a $ρ$-fraction of the population, for $ρ\in(0,1)$, enables us to circumvent the inherent hardness. Overall, our work is an important step in tackling and understanding impossibility results in private combinatorial optimization.

Submitted 21 August, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

Comments: 11 pages, 2 figures. Full version of IJCAI 2023 publication
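The relaxation above only asks for a $ρ$-fraction of the universe to be covered. As a non-private point of reference (not the paper's differentially private algorithm), here is a minimal sketch of the classic greedy heuristic for Partial Set Cover; the function name and the dictionary-of-sets input format are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List, Set


def greedy_partial_set_cover(
    universe: Set[int], sets: Dict[str, FrozenSet[int]], rho: float
) -> List[str]:
    """Greedily pick sets until at least a rho-fraction of `universe` is covered.

    Illustrative, non-private baseline (assumption): it gives no differential
    privacy guarantee. Every set is assumed to be a subset of `universe`.
    """
    target = rho * len(universe)
    covered: Set[int] = set()
    chosen: List[str] = []
    remaining = dict(sets)
    while len(covered) < target and remaining:
        # Take the set covering the most still-uncovered elements
        # (ties go to the earliest set in insertion order).
        name, best = max(remaining.items(), key=lambda kv: len(kv[1] - covered))
        if not best - covered:
            break  # no remaining set makes progress; the target is unreachable
        chosen.append(name)
        covered |= best
        del remaining[name]
    return chosen


universe = set(range(10))
sets = {"A": frozenset(range(6)), "B": frozenset({5, 6, 7}), "C": frozenset({8, 9})}
print(greedy_partial_set_cover(universe, sets, rho=0.7))  # ['A', 'B']
```

With $ρ = 0.7$ the greedy stops after two sets, whereas full coverage ($ρ = 1$) also needs `C`; this gap between partial and full coverage is the kind of slack the relaxation exploits.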

arXiv:2202.08296 [pdf, ps, other]

Controlling Epidemic Spread using Probabilistic Diffusion Models on Networks

Authors: Amy Babay, Michael Dinitz, Aravind Srinivasan, Leonidas Tsepenekas, Anil Vullikanti

Abstract: The spread of an epidemic is often modeled by an SIR random process on a social network graph. The MinINF problem for optimal social distancing involves minimizing the expected number of infections when we are allowed to break at most $B$ edges; similarly, the MinINFNode problem involves removing at most $B$ vertices. These are fundamental problems in epidemiology and network science. While a number of heuristics have been considered, the complexity of these problems remains generally open. In this paper, we present two bicriteria approximation algorithms for MinINF, which give the first non-trivial approximations for this problem. The first is based on the cut sparsification result of Karger (Mathematics of Operations Research, 1999), and works when the transmission probabilities are not too small. The second is a Sample Average Approximation (SAA) based algorithm, which we analyze for the Chung-Lu random graph model. We also extend some of our results to tackle the MinINFNode problem.

Submitted 16 February, 2022; originally announced February 2022.

Comments: To appear at AISTATS 2022
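The MinINF objective is the expected number of infections of an SIR process after at most $B$ edges are broken. A minimal Monte Carlo sketch of evaluating that objective for one candidate set of removed edges follows, assuming a single transmission probability `p` on every edge and one transmission attempt per edge. It is not either of the paper's bicriteria algorithms, and the function and parameter names are hypothetical.

```python
import random
from collections import deque
from typing import FrozenSet, Iterable, List, Tuple

Edge = Tuple[int, int]


def expected_infections(
    n: int,
    edges: Iterable[Edge],
    p: float,
    seeds: List[int],
    removed: FrozenSet[Edge] = frozenset(),
    trials: int = 1000,
    rng_seed: int = 0,
) -> float:
    """Monte Carlo estimate of the expected final outbreak size of an SIR process.

    Assumption for illustration: each infected node gets one chance to infect
    each susceptible neighbour, succeeding independently with probability p.
    Edges in `removed` (either orientation) are treated as broken.
    """
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        if (u, v) in removed or (v, u) in removed:
            continue
        adj[u].append(v)
        adj[v].append(u)

    rng = random.Random(rng_seed)
    total = 0
    for _ in range(trials):
        infected = set(seeds)
        frontier = deque(seeds)
        while frontier:
            u = frontier.popleft()
            for w in adj[u]:
                # One transmission attempt along edge (u, w), succeeding with probability p.
                if w not in infected and rng.random() < p:
                    infected.add(w)
                    frontier.append(w)
        total += len(infected)
    return total / trials


path = [(0, 1), (1, 2)]
print(expected_infections(3, path, p=1.0, seeds=[0]))  # 3.0
print(expected_infections(3, path, p=1.0, seeds=[0], removed=frozenset({(1, 2)})))  # 2.0
```

Comparing candidate edge sets of size at most $B$ by this estimate is a brute-force view of the objective; the approximation algorithms in the paper are designed to avoid that enumeration.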

arXiv:2202.07092 [pdf, other]

doi 10.24963/ijcai.2022/710

A Reliability-aware Distributed Framework to Schedule Residential Charging of Electric Vehicles

Authors: Rounak Meyur, Swapna Thorve, Madhav Marathe, Anil Vullikanti, Samarth Swarup, Henning Mortveit

Abstract: Residential consumers have become active participants in the power distribution network after being equipped with residential EV charging provisions. This creates a challenge for the network operator tasked with dispatching electric power to the residential consumers through the existing distribution network infrastructure in a reliable manner. In this paper, we address the problem of scheduling residential EV charging for multiple consumers while maintaining network reliability. An additional challenge is the restricted exchange of information: the consumers do not have access to network information, and the network operator does not have access to consumer load parameters. We propose a distributed framework which generates an optimal EV charging schedule for individual residential consumers based on their preferences and iteratively updates it until the network reliability constraints set by the operator are satisfied. We validate the proposed approach for different EV adoption levels in a synthetically created digital twin of an actual power distribution network. The results demonstrate that the new approach can achieve a higher level of network reliability compared to the case where residential consumers charge EVs based solely on their individual preferences, thus providing a solution for the existing grid to keep up with increased adoption rates without significant investments in increasing grid capacity.

Submitted 14 February, 2022; originally announced February 2022.

Comments: 6 pages main conference paper, 3 pages appendix

arXiv:2202.04705 [pdf, other]

Deploying Vaccine Distribution Sites for Improved Accessibility and Equity to Support Pandemic Response

Authors: George Li, Ann Li, Madhav Marathe, Aravind Srinivasan, Leonidas Tsepenekas, Anil Vullikanti

Abstract: In response to COVID-19, many countries have mandated social distancing and banned large group gatherings in order to slow down the spread of SARS-CoV-2. These social interventions, along with vaccines, remain the best way forward to reduce the spread of SARS-CoV-2. In order to increase vaccine accessibility, states such as Virginia have deployed mobile vaccination centers to distribute vaccines across the state. When choosing where to place these sites, there are two important factors to take into account: accessibility and equity. We formulate a combinatorial problem that captures these factors and then develop efficient algorithms with theoretical guarantees on both of these aspects. Furthermore, we study the inherent hardness of the problem, and demonstrate strong impossibility results. Finally, we run computational experiments on real-world data to show the efficacy of our methods.

Submitted 9 February, 2022; originally announced February 2022.

Comments: 14 pages, 4 figures, to appear at AAMAS 2022

arXiv:2202.00636 [pdf, other]

Differentially Private Community Detection for Stochastic Block Models

Authors: Mohamed Seif, Dung Nguyen, Anil Vullikanti, Ravi Tandon

Abstract: The goal of community detection over graphs is to recover underlying labels/attributes of users (e.g., political affiliation) given the connectivity between users (represented by the adjacency matrix of a graph). There has been significant recent progress on understanding the fundamental limits of community detection when the graph is generated from a stochastic block model (SBM). Specifically, sharp information-theoretic limits and efficient algorithms have been obtained for SBMs as a function of $p$ and $q$, which represent the intra-community and inter-community connection probabilities. In this paper, we study the community detection problem while preserving the privacy of the individual connections (edges) between the vertices. Focusing on the notion of $(ε, δ)$-edge differential privacy (DP), we seek to understand the fundamental tradeoffs between $(p, q)$, DP budget $(ε, δ)$, and computational efficiency for exact recovery of the community labels. To this end, we present and analyze the associated information-theoretic tradeoffs for three broad classes of differentially private community recovery mechanisms: (a) stability-based mechanisms; (b) sampling-based mechanisms; and (c) graph perturbation mechanisms. Our main findings are that stability-based and sampling-based mechanisms lead to a superior tradeoff between $(p,q)$ and the privacy budget $(ε, δ)$; however, this comes at the expense of higher computational complexity.
On the other hand, although they have low complexity, graph perturbation mechanisms require the privacy budget $ε$ to scale as $Ω(\log(n))$ for exact recovery. To the best of our knowledge, this is the first work to study the impact of privacy constraints on the fundamental limits for community detection.

Submitted 17 August, 2023; v1 submitted 31 January, 2022; originally announced February 2022.

Comments: ICML 2022. https://proceedings.mlr.press/v162/mohamed22a.html
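To make the "graph perturbation" class concrete, here is a minimal sketch of one standard $ε$-edge-DP perturbation, randomized response on every vertex pair, applied to a two-parameter SBM. This is an illustrative assumption, not necessarily the exact mechanism analyzed in the paper, and the function names are hypothetical. Flipping each pair independently with probability $f = 1/(1+e^{ε})$ bounds the likelihood ratio for any single edge by $e^{ε}$.

```python
import math
import random
from itertools import combinations
from typing import List, Set, Tuple

Pair = Tuple[int, int]  # stored as (u, v) with u < v


def sample_sbm(labels: List[int], p: float, q: float, rng: random.Random) -> Set[Pair]:
    """Sample an undirected SBM: edge probability p within a community, q across."""
    edges: Set[Pair] = set()
    for u, v in combinations(range(len(labels)), 2):
        prob = p if labels[u] == labels[v] else q
        if rng.random() < prob:
            edges.add((u, v))
    return edges


def randomized_response(n: int, edges: Set[Pair], eps: float, rng: random.Random) -> Set[Pair]:
    """Illustrative eps-edge-DP release (assumption): flip each pair w.p. 1 / (1 + e^eps).

    Changing one edge changes the output distribution of that pair by a factor
    of at most e^eps; all other pairs are treated identically and independently.
    """
    flip = 1.0 / (1.0 + math.exp(eps))
    noisy: Set[Pair] = set()
    for pair in combinations(range(n), 2):
        present = pair in edges
        if rng.random() < flip:
            present = not present
        if present:
            noisy.add(pair)
    return noisy


rng = random.Random(0)
labels = [0] * 50 + [1] * 50
g = sample_sbm(labels, p=0.5, q=0.05, rng=rng)
g_private = randomized_response(len(labels), g, eps=1.0, rng=rng)
```

The perturbed graph is itself an SBM with probabilities $p' = p(1-f) + (1-p)f$ and $q' = q(1-f) + (1-q)f$. Both are pulled toward $1/2$ unless $ε$ is large, which gives some intuition for why perturbation-based mechanisms need a growing privacy budget.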

arXiv:2112.15547 [pdf, other]

A Markov Decision Process Framework for Efficient and Implementable Contact Tracing and Isolation

Authors: George Li, Arash Haddadan, Ann Li, Madhav Marathe, Aravind Srinivasan, Anil Vullikanti, Zeyu Zhao

Abstract: Efficient contact tracing and isolation is an effective strategy to control epidemics. It was used effectively during the Ebola epidemic and successfully implemented in several parts of the world during the ongoing COVID-19 pandemic. An important consideration in contact tracing is the budget on the number of individuals asked to quarantine -- the budget is limited for socioeconomic reasons. In this paper, we present a Markov Decision Process (MDP) framework to formulate the problem of using contact tracing to reduce the size of an outbreak while asking a limited number of people to quarantine. We formulate each step of the MDP as a combinatorial problem, MinExposed, which we demonstrate is NP-Hard; as a result, we develop an LP-based approximation algorithm. Though this algorithm directly solves MinExposed, it is often impractical in the real world due to information constraints. To this end, we develop a greedy approach based on insights from the analysis of the previous algorithm, which we show is more interpretable. A key feature of the greedy algorithm is that it does not need complete information of the underlying social contact network. This makes the heuristic implementable in practice and is an important consideration. Finally, we carry out experiments on simulations of the MDP run on real-world networks, and show how the algorithms can help in bending the epidemic curve while limiting the number of isolated individuals.
Our experimental results demonstrate that the greedy algorithm and its variants are especially effective, robust, and practical in a variety of realistic scenarios, such as when the contact graph and specific transmission probabilities are not known. All code can be found in our GitHub repository: https://github.com/gzli929/ContactTracing.

Submitted 13 January, 2022; v1 submitted 31 December, 2021; originally announced December 2021.

Comments: 24 pages, 11 figures, to be published in AAMAS-22 as a 2 page extended abstract. This version has additional fairness guarantees, and fixes some typos

arXiv:2106.05424 [pdf, ps, other]

Fair Disaster Containment via Graph-Cut Problems

Authors: Michael Dinitz, Aravind Srinivasan, Leonidas Tsepenekas, Anil Vullikanti

Abstract: Graph cut problems are fundamental in Combinatorial Optimization, and are a central object of study in both theory and practice. Furthermore, the study of \emph{fairness} in Algorithmic Design and Machine Learning has recently received significant attention, with many different notions proposed and analyzed for a variety of contexts. In this paper we initiate the study of fairness for graph cut problems by giving the first fair definitions for them, and subsequently we demonstrate appropriate algorithmic techniques that yield a rigorous theoretical analysis. Specifically, we incorporate two different notions of fairness, namely \emph{demographic} and \emph{probabilistic individual} fairness, in a particular cut problem that models disaster containment scenarios. Our results include a variety of approximation algorithms with provable theoretical guarantees.

Submitted 16 February, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

Comments: To appear at AISTATS 2022

arXiv:2105.13287 [pdf, other]

Differentially Private Densest Subgraph Detection

Authors: Dung Nguyen, Anil Vullikanti

Abstract: Densest subgraph detection is a fundamental graph mining problem, with a large number of applications. There has been a lot of work on efficient algorithms for finding the densest subgraph in massive networks. However, in many domains, the network is private, and returning a densest subgraph can reveal information about the network. Differential privacy is a powerful framework to handle such settings. We study the densest subgraph problem in the edge privacy model, in which the edges of the graph are private. We present the first sequential and parallel differentially private algorithms for this problem. We show that our algorithms have an additive approximation guarantee. We evaluate our algorithms on a large number of real-world networks, and observe a good privacy-accuracy tradeoff when the network has high density.

Submitted 18 June, 2021; v1 submitted 27 May, 2021; originally announced May 2021.

Comments: Accepted by ICML 2021
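For context, the standard non-private baseline for this problem is greedy peeling: repeatedly delete a minimum-degree vertex and keep the densest intermediate subgraph, which gives a 2-approximation for maximizing $|E(S)|/|S|$. A minimal sketch of that baseline follows; it provides no privacy, it is not the paper's private algorithm, and the function name is hypothetical.

```python
import heapq
from typing import Iterable, Set, Tuple


def densest_subgraph_peeling(n: int, edges: Iterable[Tuple[int, int]]) -> Tuple[Set[int], float]:
    """Greedy peeling 2-approximation for max |E(S)| / |S| (non-private baseline)."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    degree = {v: len(adj[v]) for v in range(n)}
    alive = set(range(n))
    m = sum(degree.values()) // 2  # number of edges inside the current vertex set
    best_set, best_density = set(alive), m / n
    heap = [(d, v) for v, d in degree.items()]
    heapq.heapify(heap)
    while len(alive) > 1:
        d, v = heapq.heappop(heap)
        if v not in alive or d != degree[v]:
            continue  # stale heap entry left over from an earlier degree value
        alive.remove(v)  # peel the current minimum-degree vertex
        m -= degree[v]
        for w in adj[v]:
            if w in alive:
                degree[w] -= 1
                heapq.heappush(heap, (degree[w], w))
        if m / len(alive) > best_density:
            best_set, best_density = set(alive), m / len(alive)
    return best_set, best_density


k4_plus_pendant = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4)]
print(densest_subgraph_peeling(5, k4_plus_pendant))  # ({0, 1, 2, 3}, 1.5)
```

Because each step depends on exact degrees, releasing this output directly can leak individual edges, which is the issue a private variant has to address.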

arXiv:2103.00154 [pdf, other]

Parallel Algorithms for Densest Subgraph Discovery Using Shared Memory Model

Authors: B. D. M. De Zoysa, Y. A. M. M. A. Ali, M. D. I. Maduranga, Indika Perera, Saliya Ekanayake, Anil Vullikanti

Abstract: The problem of finding dense components of a graph is a widely explored area in data analysis, with diverse applications including community mining, spam detection, computer security and bioinformatics. This research project studies previously available algorithms to identify modifications that could yield a considerable leap in performance and efficiency. Furthermore, efforts were also directed towards devising a novel algorithm for the problem of densest subgraph discovery. This paper presents an improved implementation of a widely used densest subgraph discovery algorithm and a novel parallel algorithm which produces better results than a 2-approximation.

Submitted 27 February, 2021; originally announced March 2021.

arXiv:2012.03413 [pdf, other]

Mapping Network States Using Connectivity Queries

Authors: Alexander Rodríguez, Bijaya Adhikari, Andrés D. González, Charles Nicholson, Anil Vullikanti, B. Aditya Prakash

Abstract: Can we infer all the failed components of an infrastructure network, given a sample of reachable nodes from supply nodes? One of the most critical post-disruption processes after a natural disaster is to quickly determine the damage or failure states of critical infrastructure components. However, this is non-trivial, considering that often only a fraction of components may be accessible or observable after a disruptive event. Past work has looked into inferring failed components given point probes, i.e., with a direct sample of failed components. In contrast, we study the harder problem of inferring failed components given partial information of some 'serviceable' reachable nodes and a small sample of point probes, the former often being more practical to obtain. We formulate this novel problem using the Minimum Description Length (MDL) principle, and then present a greedy algorithm that minimizes MDL cost effectively. We evaluate our algorithm on domain-expert simulations of real networks in the aftermath of an earthquake. Our algorithm successfully identifies failed components, especially the critical ones affecting the overall system performance.

Submitted 23 December, 2020; v1 submitted 6 December, 2020; originally announced December 2020.

Comments: Appears in IEEE BigData 2020. Preliminary version in NeurIPS 2020 AI + HADR

arXiv:2012.03310 [pdf, other]

PAC-Learning for Strategic Classification

Authors: Ravi Sundaram, Anil Vullikanti, Haifeng Xu, Fan Yao

Abstract: The study of strategic or adversarial manipulation of testing data to fool a classifier has attracted much recent attention. Most previous works have focused on two extreme situations where any testing data point either is completely adversarial or always equally prefers the positive label. In this paper, we generalize both of these through a unified framework for strategic classification, and introduce the notion of strategic VC-dimension (SVC) to capture the PAC-learnability in our general strategic setup. SVC provably generalizes the recent concept of adversarial VC-dimension (AVC) introduced by Cullina et al. arXiv:1806.01471. We instantiate our framework for the fundamental strategic linear classification problem. We fully characterize: (1) the statistical learnability of linear classifiers by pinning down its SVC; (2) its computational tractability by pinning down the complexity of the empirical risk minimization problem. Interestingly, the SVC of linear classifiers is always upper bounded by its standard VC-dimension. This characterization also strictly generalizes the AVC bound for linear classifiers in arXiv:1806.01471.

Submitted 11 June, 2021; v1 submitted 6 December, 2020; originally announced December 2020.

arXiv:2009.11665 [pdf, other]

doi 10.1109/BigData47090.2019.9006037

SubGraph2Vec: Highly-Vectorized Tree-like Subgraph Counting

Authors: Langshi Chen, Jiayu Li, Ariful Azad, Cenk Sahinalp, Madhav Marathe, Anil Vullikanti, Andrey Nikolaev, Egor Smirnov, Ruslan Israfilov, Judy Qiu

Abstract: Subgraph counting aims to count occurrences of a template T in a given network G(V, E). It is a powerful graph analysis tool and has found real-world applications in diverse domains. Scaling subgraph counting problems is known to be memory bounded and computationally challenging with exponential complexity. Although scalable parallel algorithms are known for several graph problems such as Triangle Counting and PageRank, this is not common for counting complex subgraphs. Here we address this challenge and study connected acyclic graphs, or trees. We propose a novel vectorized subgraph counting algorithm, named Subgraph2Vec, with both shared-memory and distributed implementations that 1) reduce algorithmic complexity by minimizing neighbor traversal; 2) achieve a highly vectorized implementation upon linear algebra kernels to significantly improve performance and hardware utilization; and 3) are portable to both CPU and GPU. Subgraph2Vec improves the overall performance over the state-of-the-art work by orders of magnitude, up to 660x on a single node, and in distributed mode it can scale the template size up to 20 while maintaining good strong scalability.

Submitted 4 October, 2020; v1 submitted 23 September, 2020; originally announced September 2020.

Comments: arXiv admin note: text overlap with arXiv:1903.04395

Journal ref: 2019 IEEE International Conference on Big Data (Big Data)
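The abstract cites triangle counting as a problem for which scalable linear-algebra formulations already exist. As an illustration of that style (not of Subgraph2Vec itself), here is a minimal sketch: with $A$ the symmetric 0/1 adjacency matrix, the number of triangles is $\frac{1}{6}\sum_{i,j} A_{ij}(A^2)_{ij}$, i.e., a matrix product masked by $A$. Dense lists are used only for self-containment; the function name is hypothetical, and a practical implementation would use sparse kernels.

```python
from typing import Iterable, List, Tuple


def triangle_count(n: int, edges: Iterable[Tuple[int, int]]) -> int:
    """Count triangles as sum(A .* (A @ A)) / 6 on a dense 0/1 adjacency matrix (illustrative)."""
    A: List[List[int]] = [[0] * n for _ in range(n)]
    for u, v in edges:
        if u != v:
            A[u][v] = A[v][u] = 1
    total = 0
    for i in range(n):
        for j in range(n):
            if A[i][j]:  # the mask: only evaluate (A @ A)[i][j] where edge (i, j) exists
                # (A @ A)[i][j] is the number of common neighbours of i and j.
                total += sum(A[i][k] * A[k][j] for k in range(n))
    # Each triangle is counted once per ordered pair of its vertices: 3 * 2 = 6 times.
    return total // 6


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(triangle_count(4, k4))  # 4
```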

arXiv:2009.10018 [pdf, other]

Data-driven modeling for different stages of pandemic response

Authors: Aniruddha Adiga, Jiangzhuo Chen, Madhav Marathe, Henning Mortveit, Srinivasan Venkatramanan, Anil Vullikanti

Abstract: Some of the key questions of interest during the COVID-19 pandemic (and all outbreaks) include: where did the disease start, how is it spreading, who is at risk, and how to control the spread. There are a large number of complex factors driving the spread of pandemics, and, as a result, multiple modeling techniques play an increasingly important role in shaping public policy and decision making. As different countries and regions go through phases of the pandemic, the questions and data availability also change. Of particular interest is aligning model development and data collection to support response efforts at each stage of the pandemic. The COVID-19 pandemic has been unprecedented in terms of real-time collection and dissemination of a number of diverse datasets, ranging from disease outcomes, to mobility, behaviors, and socio-economic factors. The datasets have been critical from the perspective of disease modeling and analytics to support policymakers in real time. In this overview article, we survey the data landscape around COVID-19, with a focus on how such datasets have aided modeling and response through different stages so far in the pandemic. We also discuss some of the current challenges and the needs that will arise as we plan our way out of the pandemic.

Submitted 21 September, 2020; originally announced September 2020.

arXiv:2009.10014 [pdf, other]

Models for COVID-19 Pandemic: A Comparative Analysis

Authors: Aniruddha Adiga, Devdatt Dubhashi, Bryan Lewis, Madhav Marathe, Srinivasan Venkatramanan, Anil Vullikanti

Abstract: The COVID-19 pandemic represents a global health crisis unprecedented in the last 100 years. Its economic, social and health impact continues to grow and is likely to end up as one of the worst global disasters since the 1918 pandemic and the World Wars. Mathematical models have played an important role in the ongoing crisis; they have been used to inform public policies and have been instrumental in many of the social distancing measures that were instituted worldwide. In this article we review some of the important mathematical models used to support the ongoing planning and response efforts. These models differ in their use, their mathematical form and their scope.

Submitted 21 September, 2020; originally announced September 2020.

arXiv:2008.03325 [pdf, ps, other]

Stochastic Optimization and Learning for Two-Stage Supplier Problems

Authors: Brian Brubach, Nathaniel Grammel, David G. Harris, Aravind Srinivasan, Leonidas Tsepenekas, Anil Vullikanti

Abstract: The main focus of this paper is radius-based (supplier) clustering in the two-stage stochastic setting with recourse, where the inherent stochasticity of the model comes in the form of a budget constraint. In addition to the standard (homogeneous) setting where all clients must be within a distance $R$ of the nearest facility, we provide results for the more general problem where the radius demands may be inhomogeneous (i.e., different for each client). We also explore a number of variants where additional constraints are imposed on the first-stage decisions, specifically matroid and multi-knapsack constraints, and provide results for these settings. We derive results for the most general distributional setting, where there is only black-box access to the underlying distribution. To accomplish this, we first develop algorithms for the polynomial scenarios setting; we then employ a novel scenario-discarding variant of the standard Sample Average Approximation (SAA) method, which crucially exploits properties of the restricted-case algorithms. We note that the scenario-discarding modification to the SAA method is necessary in order to optimize over the radius.

Submitted 7 April, 2024; v1 submitted 7 August, 2020; originally announced August 2020.

arXiv:2002.02487 [pdf, other]

Efficient Algorithms for Generating Provably Near-Optimal Cluster Descriptors for Explainability

Authors: Prathyush Sambaturu, Aparna Gupta, Ian Davidson, S. S. Ravi, Anil Vullikanti, Andrew Warren

Abstract: Improving the explainability of the results from machine learning methods has become an important research goal. Here, we study the problem of making clusters more interpretable by extending a recent approach of [Davidson et al., NeurIPS 2018] for constructing succinct representations for clusters. Given a set of objects $S$, a partition $π$ of $S$ (into clusters), and a universe $T$ of tags such that each element in $S$ is associated with a subset of tags, the goal is to find a representative set of tags for each cluster such that those sets are pairwise-disjoint and the total size of all the representatives is minimized. Since this problem is NP-hard in general, we develop approximation algorithms with provable performance guarantees for the problem. We also show applications to explain clusters from datasets, including clusters of genomic sequences that represent different threat levels.

Submitted 6 February, 2020; originally announced February 2020.

MSC Class: 68W25; 68T01; 68R05 ACM Class: G.2; I.2; F.2

arXiv:2001.09130 [pdf, ps, other]

Creating Realistic Power Distribution Networks using Interdependent Road Infrastructure

Authors: Rounak Meyur, Madhav Marathe, Anil Vullikanti, Henning Mortveit, Virgilio Centeno, Arun Phadke

Abstract: It is well known that physical interdependencies exist between networked civil infrastructures such as transportation and power system networks. In order to analyze complex nonlinear correlations between such networks, datasets pertaining to such real infrastructures are required. However, such data are not readily available due to their proprietary nature. This work proposes a methodology to generate realistic synthetic power distribution networks for a given geographical region. A network generated in this manner is not the actual distribution system, but its functionality is very similar to the real distribution network. The synthetic network connects high voltage substations to individual residential consumers through primary and secondary distribution networks. Here, the distribution network is generated by solving an optimization problem which minimizes the overall length of the network subject to structural and power flow constraints. This work also incorporates identification of long high voltage feeders originating from substations and connecting remotely situated customers in rural geographic locations while maintaining voltage regulation within acceptable limits. The proposed methodology is applied to the state of Virginia and creates synthetic distribution networks which are validated by comparing them to actual power distribution networks at the same location.

Submitted 11 November, 2020; v1 submitted 24 January, 2020; originally announced January 2020.

Comments: Accepted for presentation at the IEEE Big Data Conference 2020
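The power distribution abstract above generates networks by minimizing overall length subject to structural and power flow constraints. As a hedged illustration only: if those constraints are dropped and any two points may be joined by a straight line, the shortest tree that connects a substation to its consumers using only the given points as junctions is a minimum spanning tree. The sketch below computes one with Prim's algorithm. The coordinates, the choice of index 0 as the substation, and the name `prim_mst` are assumptions; the paper's road-constrained, power-flow-aware formulation is not reproduced here.

```python
import heapq
import math
from typing import List, Tuple

Point = Tuple[float, float]


def prim_mst(points: List[Point]) -> Tuple[float, List[Tuple[int, int]]]:
    """Minimum spanning tree over straight-line distances (Prim's algorithm).

    Index 0 is treated as a hypothetical substation and every other point as a
    consumer. Power-flow and structural constraints are deliberately ignored.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection cost per point
    parent = [-1] * n          # tree point that achieves `best`
    best[0] = 0.0
    heap = [(0.0, 0)]
    total, edges = 0.0, []
    while heap:
        cost, u = heapq.heappop(heap)
        if in_tree[u]:
            continue           # stale heap entry, already connected more cheaply
        in_tree[u] = True
        total += cost
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                w = math.dist(points[u], points[v])
                if w < best[v]:
                    best[v], parent[v] = w, u
                    heapq.heappush(heap, (w, v))
    return total, edges


length, lines = prim_mst([(0, 0), (1, 0), (2, 0), (2, 1)])
print(length, lines)  # 3.0 [(0, 1), (1, 2), (2, 3)]
```

The dense inner loop keeps the sketch short; a realistic version would restrict candidate edges to a road graph, which is where the paper's interdependence with road infrastructure comes in.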

arXiv:1903.04395 [pdf, other]

A GraphBLAS Approach for Subgraph Counting

Authors: Langshi Chen, Jiayu Li, Ariful Azad, Lei Jiang, Madhav Marathe, Anil Vullikanti, Andrey Nikolaev, Egor Smirnov, Ruslan Israfilov, Judy Qiu

Abstract: Subgraph counting aims to count the occurrences of a subgraph template T in a given network G. The basic problem of computing structural properties such as counting triangles and other subgraphs has found applications in diverse domains. Recent biological, social, cybersecurity and sensor network applications have motivated solving such problems on massive networks with billions of vertices. Counting larger subgraphs is known to be memory-bound and computationally challenging to scale; the complexity grows as a function of both T and G. In this paper, we study the non-induced tree subgraph counting problem, propose a novel layered software-hardware co-design approach, and implement a shared-memory multi-threaded algorithm that 1) reduces the complexity of the parallel color-coding algorithm by identifying and pruning redundant graph traversal; and 2) achieves a fully vectorized implementation upon linear algebra kernels inspired by GraphBLAS, which significantly improves cache usage and maximizes memory bandwidth utilization. Experiments show that our implementation improves the overall performance over the state-of-the-art work by orders of magnitude, up to 660x for subgraph templates with size over 12 on a dual-socket Intel(R) Xeon(R) Platinum 8160 server. We believe our approach using GraphBLAS with optimized sparse linear algebra can be applied to other massive subgraph counting problems and emerging high-memory-bandwidth hardware architectures.

Submitted 11 March, 2019; originally announced March 2019.

Comments: 12 pages
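The GraphBLAS abstract above mentions triangle counting and sparse linear algebra kernels. A standard identity (not specific to this paper) shows how counting maps onto linear algebra: for a simple undirected graph with adjacency matrix $A$, the number of triangles is $\frac{1}{6}\sum_{i,j} A_{ij}(A^2)_{ij}$, where masking by $A$ keeps only entries that are edges and $(A^2)_{ij}$ is the number of common neighbours of $i$ and $j$. The sketch evaluates that masked product with adjacency sets; `count_triangles` is a hypothetical name, and this is not the paper's color-coding tree counter.

```python
from typing import Dict, Set


def count_triangles(adj: Dict[int, Set[int]]) -> int:
    """Triangles = (sum over ordered edges (i, j) of |N(i) & N(j)|) / 6.

    Iterating only over stored edges plays the role of the mask A in
    A .* (A @ A); |N(i) & N(j)| equals (A @ A)[i][j] for a simple graph.
    Each triangle is seen once per ordered edge inside it, i.e. 6 times.
    """
    total = 0
    for i, nbrs in adj.items():
        for j in nbrs:
            total += len(nbrs & adj[j])
    return total // 6


k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
print(count_triangles(k4))  # 4
```

GraphBLAS-style implementations express the same computation as a masked sparse matrix product, which is what makes vectorization and cache-friendly memory access possible.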

arXiv:1804.09764 [pdf, other]

High-Performance Massive Subgraph Counting using Pipelined Adaptive-Group Communication

Authors: Langshi Chen, Bo Peng, Sabra Ossen, Anil Vullikanti, Madhav Marathe, Lei Jiang, Judy Qiu

Abstract: Subgraph counting aims to count the number of occurrences of a subgraph T (a.k.a. a template) in a given graph G. The basic problem has found applications in diverse domains. The problem is known to be computationally challenging: the complexity grows as a function of both T and G. Recent applications have motivated solving such problems on massive networks with billions of vertices. In this chapter, we study the subgraph counting problem from a parallel computing perspective. We discuss efficient parallel algorithms for approximately resolving subgraph counting problems by using the color-coding technique. We then present several system-level strategies to substantially improve the overall performance of the algorithm in massive subgraph counting problems. We propose: 1) a novel pipelined Adaptive-Group communication pattern to improve inter-node scalability, 2) a fine-grained pipeline design to effectively reduce the memory space of intermediate results, and 3) partitioning neighbor lists of subgraph vertices to achieve better thread concurrency and workload balance. Experimentation on an Intel Xeon E5 cluster shows that our implementation achieves a 5x speedup compared to the state-of-the-art work while reducing the peak memory utilization by a factor of 2 on large templates of 12 to 15 vertices and input graphs of 2 to 5 billion edges.

Submitted 25 April, 2018; originally announced April 2018.
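The chapter above relies on the color-coding technique for approximate subgraph counting. As a single-machine illustration only (none of the pipelined Adaptive-Group communication or memory optimizations it describes), the sketch counts simple paths on $k$ vertices, a path being the simplest tree template: color each vertex uniformly at random with one of $k$ colors, count "colorful" paths (all colors distinct) by dynamic programming over color sets, and divide by $k!/k^k$, the probability that a fixed path is colorful. The function names, default trial count, and seed are assumptions.

```python
import math
import random
from typing import Dict, List, Tuple


def count_colorful_paths(adj: Dict[int, List[int]], colors: Dict[int, int], k: int) -> int:
    """Count simple paths on k vertices whose vertices all have distinct colors.

    dp[(v, mask)] = number of colorful paths ending at v that use color set `mask`.
    Distinct colors force distinct vertices, so no visited set is needed.
    """
    dp: Dict[Tuple[int, int], int] = {(v, 1 << colors[v]): 1 for v in adj}
    for _ in range(k - 1):                      # extend every path by one vertex
        nxt: Dict[Tuple[int, int], int] = {}
        for (u, mask), cnt in dp.items():
            for v in adj[u]:
                bit = 1 << colors[v]
                if not mask & bit:              # v's color not used yet
                    key = (v, mask | bit)
                    nxt[key] = nxt.get(key, 0) + cnt
        dp = nxt
    total = sum(dp.values())
    return total // 2 if k > 1 else total       # each path is found from both ends


def estimate_path_count(adj: Dict[int, List[int]], k: int,
                        trials: int = 200, seed: int = 0) -> float:
    """Color-coding estimate: mean colorful count / P(a fixed path is colorful)."""
    rng = random.Random(seed)
    p_colorful = math.factorial(k) / k ** k
    total = 0
    for _ in range(trials):
        colors = {v: rng.randrange(k) for v in adj}
        total += count_colorful_paths(adj, colors, k)
    return total / trials / p_colorful


triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(estimate_path_count(triangle, 3, trials=4000))
```

On the triangle the exact number of 3-vertex paths is 3, so the printed estimate should land close to 3; more trials reduce the variance, which is the quantity parallel implementations spend their hardware on.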

arXiv:1604.00033 [pdf, other]

EMBERS at 4 years: Experiences operating an Open Source Indicators Forecasting System

Authors: Sathappan Muthiah, Patrick Butler, Rupinder Paul Khandpur, Parang Saraf, Nathan Self, Alla Rozovskaya, Liang Zhao, Jose Cadena, Chang-Tien Lu, Anil Vullikanti, Achla Marathe, Kristen Summers, Graham Katz, Andy Doyle, Jaime Arredondo, Dipak K. Gupta, David Mares, Naren Ramakrishnan

Abstract: EMBERS is an anticipatory intelligence system forecasting population-level events in multiple countries of Latin America. Deployed since 2012, EMBERS has been generating alerts 24x7 by ingesting a broad range of data sources including news, blogs, tweets, machine-coded events, currency rates, and food prices. In this paper, we describe our experiences operating EMBERS continuously for nearly 4 years, with specific attention to the discoveries it has enabled, correct as well as missed forecasts, and lessons learnt from participating in a forecasting tournament, including our perspectives on the limits of forecasting and ethical considerations.

Submitted 31 March, 2016; originally announced April 2016.

Comments: Submitted to a conference

arXiv:1602.06866 [pdf, other]

Forecasting the Flu: Designing Social Network Sensors for Epidemics

Authors: Huijuan Shao, K. S. M. Tozammel Hossain, Hao Wu, Maleq Khan, Anil Vullikanti, B. Aditya Prakash, Madhav Marathe, Naren Ramakrishnan

Abstract: Early detection and modeling of a contagious epidemic can provide important guidance about quelling the contagion, controlling its spread, or the effective design of countermeasures. A topic of recent interest has been to design social network sensors, i.e., identifying a small set of people who can be monitored to provide insight into the emergence of an epidemic in a larger population. We formally pose the problem of designing social network sensors for flu epidemics and identify two different objectives that could be targeted in such sensor design problems. Using the graph-theoretic notion of dominators, we develop an efficient and effective heuristic for forecasting epidemics with lead time. Using six city-scale datasets generated by extensive microscopic epidemiological simulations involving millions of individuals, we illustrate the practical applicability of our methods and show significant benefits (up to twenty-two days more lead time) compared to competing methods. Most importantly, we demonstrate the use of surrogates or proxies that policy makers can use to design social network sensors, requiring anything from nonintrusive knowledge of people to more information on the relationships among people. The results show that the more intrusive the information we obtain, the longer the lead time for predicting the flu outbreak (up to nine days).

Submitted 8 March, 2016; v1 submitted 22 February, 2016; originally announced February 2016.

Comments: The conference version of the paper is submitted for publication
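The flu-sensor abstract above builds its heuristic on dominators: in a directed graph with root $r$, node $d$ dominates node $v$ if every path from $r$ to $v$ passes through $d$. The sketch computes dominator sets with the classic iterative data-flow fixpoint, then counts how many other nodes each node dominates. Ranking candidate sensors by that count is a hypothetical illustration of why dominators suit early detection (a node that dominates many others lies upstream of all of them), not the paper's exact heuristic; the example contact graph and all function names are assumptions.

```python
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Iterable[Hashable]]  # node -> successors


def reachable_from(succ: Graph, root: Hashable) -> Set[Hashable]:
    seen, stack = {root}, [root]
    while stack:
        u = stack.pop()
        for v in succ.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def dominators(succ: Graph, root: Hashable) -> Dict[Hashable, Set[Hashable]]:
    """Dom(root) = {root}; Dom(v) = {v} | intersection of Dom(p) over preds p."""
    nodes = reachable_from(succ, root)      # unreachable nodes have no dominators
    pred: Dict[Hashable, Set[Hashable]] = {v: set() for v in nodes}
    for u in nodes:
        for v in succ.get(u, ()):
            pred[v].add(u)
    dom = {v: set(nodes) for v in nodes}    # start from "everything dominates"
    dom[root] = {root}
    changed = True
    while changed:                          # shrink sets until the fixpoint
        changed = False
        for v in nodes - {root}:
            new = {v} | set.intersection(*(dom[p] for p in pred[v]))
            if new != dom[v]:
                dom[v], changed = new, True
    return dom


def domination_counts(dom: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, int]:
    """How many other nodes each node dominates (hypothetical sensor score)."""
    counts = {v: 0 for v in dom}
    for v, ds in dom.items():
        for d in ds - {v}:
            counts[d] += 1
    return counts


contacts = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
dom = dominators(contacts, 0)
print(sorted(dom[4]))  # [0, 3, 4]
print(domination_counts(dom))
```

Here node 3 dominates node 4 even though node 4 can be reached through either 1 or 2, which is the kind of structural bottleneck a sensor placement would want to watch.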

arXiv:1501.06614 [pdf, ps, other]

Approximation Algorithms for Reducing the Spectral Radius to Control Epidemic Spread

Authors: Sudip Saha, Abhijin Adiga, B. Aditya Prakash, Anil Kumar S. Vullikanti

Abstract: The largest eigenvalue of the adjacency matrix of a network (referred to as the spectral radius) is an important metric in its own right. Further, for several models of epidemic spread on networks (e.g., the 'flu-like' SIS model), it has been shown that an epidemic dies out quickly if the spectral radius of the graph is below a certain threshold that depends on the model parameters. This motivates a strategy to control epidemic spread by reducing the spectral radius of the underlying network. In this paper, we develop a suite of provable approximation algorithms for reducing the spectral radius by removing the minimum-cost set of edges (modeling quarantining) or nodes (modeling vaccinations), with different time and quality tradeoffs. Our main algorithm, GreedyWalk, is based on the idea of hitting closed walks of a given length, and gives an $O(\log^2{n})$-approximation, where $n$ denotes the number of nodes; it also performs much better in practice than all prior heuristics proposed for this problem. We further present a novel sparsification method to improve its running time. In addition, we give a new primal-dual based algorithm with an even better approximation guarantee ($O(\log n)$), albeit with slower running time. We also give lower bounds on the worst-case performance of some of the popular heuristics. Finally, we demonstrate the applicability of our algorithms and the properties of our solutions via extensive experiments on multiple synthetic and real networks.

Submitted 26 January, 2015; originally announced January 2015.

Comments: Part of it will be published in the SDM '15 proceedings
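The spectral radius abstract above says GreedyWalk is based on hitting closed walks of a given length. The underlying inequality is standard: for a symmetric adjacency matrix $A$ with eigenvalues $\lambda_i$ and spectral radius $\rho(A)$, the number of closed walks of length $k$ is $\operatorname{tr}(A^k) = \sum_i \lambda_i^k$, and for even $k$ every term is non-negative, so $\rho(A)^k \le \operatorname{tr}(A^k)$. Driving $\operatorname{tr}(A^k)$ down to at most $T^k$ therefore guarantees $\rho(A) \le T$. The sketch is a naive greedy in that spirit, repeatedly deleting the edge whose removal leaves the fewest closed $k$-walks, with dense pure-Python matrix products on an unweighted 0/1 adjacency matrix. It is not GreedyWalk, carries none of its $O(\log^2 n)$ guarantee, and all names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def closed_walks(adj: Matrix, k: int) -> int:
    """trace(A^k): the number of closed walks of length k."""
    n = len(adj)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(k):
        power = matmul(power, adj)
    return sum(power[i][i] for i in range(n))


def greedy_closed_walk_cut(adj: Matrix, k: int, threshold: float) -> List[Tuple[int, int]]:
    """Delete edges until trace(A^k) <= threshold**k, which forces rho(A) <= threshold."""
    if k < 2 or k % 2:
        raise ValueError("k must be even and >= 2 for rho(A)^k <= trace(A^k)")
    adj = [row[:] for row in adj]           # work on a copy of the 0/1 matrix
    n = len(adj)
    removed: List[Tuple[int, int]] = []
    while closed_walks(adj, k) > threshold ** k:
        best_edge, best_val = None, None
        for u in range(n):
            for v in range(u + 1, n):
                if adj[u][v]:
                    adj[u][v] = adj[v][u] = 0          # tentatively delete
                    val = closed_walks(adj, k)
                    adj[u][v] = adj[v][u] = 1          # restore
                    if best_val is None or val < best_val:
                        best_edge, best_val = (u, v), val
        u, v = best_edge                    # an edge exists: an edgeless graph has trace 0
        adj[u][v] = adj[v][u] = 0
        removed.append(best_edge)
    return removed


k3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(closed_walks(k3, 4))                  # 18
print(greedy_closed_walk_cut(k3, 4, 1.5))   # [(0, 1), (0, 2)]
```

For the triangle, $\operatorname{tr}(A^4) = 2^4 + (-1)^4 + (-1)^4 = 18$; one deletion leaves a path with $\operatorname{tr}(A^4) = 8 > 1.5^4$, and a second leaves a single edge with $\operatorname{tr}(A^4) = 2$, so the spectral radius drops from 2 to 1.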

arXiv:1402.7035 [pdf, ps, other]

'Beating the news' with EMBERS: Forecasting Civil Unrest using Open Source Indicators

Authors: Naren Ramakrishnan, Patrick Butler, Sathappan Muthiah, Nathan Self, Rupinder Khandpur, Parang Saraf, Wei Wang, Jose Cadena, Anil Vullikanti, Gizem Korkmaz, Chris Kuhlman, Achla Marathe, Liang Zhao, Ting Hua, Feng Chen, Chang-Tien Lu, Bert Huang, Aravind Srinivasan, Khoa Trinh, Lise Getoor, Graham Katz, Andy Doyle, Chris Ackermann, Ilya Zavorin, Jim Ford, et al. (5 additional authors not shown)

Abstract: We describe the design, implementation, and evaluation of EMBERS, an automated, 24x7 continuous system for forecasting civil unrest across 10 countries of Latin America using open source indicators such as tweets, news sources, blogs, economic indicators, and other data sources. Unlike retrospective studies, EMBERS has been making forecasts into the future since Nov 2012, which have been (and continue to be) evaluated by an independent T&E team (MITRE). Of note, EMBERS has successfully forecast the uptick and downtick of incidents during the June 2013 protests in Brazil. We outline the system architecture of EMBERS, individual models that leverage specific data sources, and a fusion and suppression engine that supports trading off specific evaluation criteria. EMBERS also provides an audit trail interface that enables the investigation of why specific predictions were made along with the data utilized for forecasting. Through numerous evaluations, we demonstrate the superiority of EMBERS over base-rate methods and its capability to forecast significant societal happenings.

Submitted 27 February, 2014; v1 submitted 27 February, 2014; originally announced February 2014.

ACM Class: K.4.1; J.4; I.2.7

arXiv:1208.0811 [pdf, ps, other]

Efficient Algorithms for Maximum Link Scheduling in Distributed Computing Models with SINR Constraints

Authors: Guanhong Pei, Anil Kumar S. Vullikanti

Abstract: A fundamental problem in wireless networks is the maximum link scheduling problem: given a set $L$ of links, compute the largest possible subset $L'\subseteq L$ of links that can be scheduled simultaneously without interference. This problem is particularly challenging in the physical interference model based on SINR constraints (referred to as the SINR model), which has gained a lot of interest in recent years. Constant-factor approximation algorithms have been developed for this problem, but low-complexity distributed algorithms that give the same approximation guarantee in the SINR model are not known. Distributed algorithms are especially challenging in this model because of its non-locality. In this paper, we develop a set of fast distributed algorithms in the SINR model, providing a constant-factor approximation for the maximum link scheduling problem under uniform power assignment. We find that different aspects of available technology, such as full/half-duplex communication and non-adaptive/adaptive power control, have a significant impact on the performance of the algorithm; these issues have not been explored in the context of distributed algorithms in the SINR model before. Our algorithms' running time is $O(g(L) \log^c m)$, where $c=1,2,3$ for different problem instances, and $g(L)$ is the "link diversity" determined by the logarithmic scale of a communication link length. Since $g(L)$ is small and remains in a constant range in most cases, our algorithms serve as the first set of "sublinear"-time distributed solutions.
The algorithms are randomized and crucially use physical carrier sensing in distributed communication steps.

Submitted 16 November, 2012; v1 submitted 3 August, 2012; originally announced August 2012.
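The SINR model in the link scheduling abstract above can be stated concretely. Under uniform power $P$, path-loss exponent $\alpha$, noise $N$ and threshold $\beta$, a set $S$ of links (sender $s_i$, receiver $r_i$) can transmit simultaneously if every $i \in S$ satisfies $\frac{P / d(s_i, r_i)^{\alpha}}{N + \sum_{j \in S, j \ne i} P / d(s_j, r_i)^{\alpha}} \ge \beta$. The sketch checks that condition and runs a simple centralized greedy that considers links from shortest to longest and keeps each one that leaves the schedule feasible. This greedy is a common heuristic, not the paper's randomized distributed algorithm; the defaults ($P = 1$, $\alpha = 3$, $\beta = 2$, $N = 10^{-9}$), the function names, and the assumption that no sender coincides with another link's receiver are all illustrative. `link_diversity` uses one plausible reading of $g(L)$, the number of distinct $\lfloor \log_2 \text{length} \rfloor$ classes, which is also an assumption.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Link = Tuple[Point, Point]  # (sender, receiver)


def sinr(i: int, active: List[int], links: List[Link],
         power: float = 1.0, alpha: float = 3.0, noise: float = 1e-9) -> float:
    """SINR at link i's receiver when every link in `active` transmits."""
    sender, receiver = links[i]
    signal = power / math.dist(sender, receiver) ** alpha
    interference = sum(power / math.dist(links[j][0], receiver) ** alpha
                       for j in active if j != i)
    return signal / (noise + interference)


def is_feasible(active: List[int], links: List[Link], beta: float = 2.0, **params) -> bool:
    return all(sinr(i, active, links, **params) >= beta for i in active)


def greedy_schedule(links: List[Link], beta: float = 2.0, **params) -> List[int]:
    """Centralized greedy: shortest links first, keep a link if still feasible."""
    order = sorted(range(len(links)), key=lambda i: math.dist(*links[i]))
    chosen: List[int] = []
    for i in order:
        if is_feasible(chosen + [i], links, beta, **params):
            chosen.append(i)
    return sorted(chosen)


def link_diversity(links: List[Link]) -> int:
    """Assumed reading of g(L): number of distinct floor(log2(length)) classes."""
    return len({math.floor(math.log2(math.dist(*link))) for link in links})


links = [((0, 0), (1, 0)),       # link 0
         ((0, 0.5), (1, 0.5)),   # link 1: parallel and close to link 0
         ((100, 0), (101, 0))]   # link 2: far away
print(greedy_schedule(links))    # [0, 2]
print(link_diversity(links))     # 1
```

Links 0 and 1 cannot share a slot because link 1's sender sits about 1.12 units from link 0's receiver, pushing link 0's SINR to roughly 1.4, below $\beta = 2$. The global sum over all active senders is also what makes the model non-local, which is the difficulty the abstract points to for distributed algorithms.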

Showing 1–40 of 40 results for author: Vullikanti, A