-
A Survey of Privacy Threats and Defense in Vertical Federated Learning: From Model Life Cycle Perspective
Authors:
Lei Yu,
Meng Han,
Yiming Li,
Changting Lin,
Yao Zhang,
Mingyang Zhang,
Yan Liu,
Haiqin Weng,
Yuseok Jeon,
Ka-Ho Chow,
Stacy Patterson
Abstract:
Vertical Federated Learning (VFL) is a federated learning paradigm where multiple participants, who share the same set of samples but hold different features, jointly train machine learning models. Although VFL enables collaborative machine learning without sharing raw data, it is still susceptible to various privacy threats. In this paper, we conduct the first comprehensive survey of the state-of-the-art in privacy attacks and defenses in VFL. We provide taxonomies for both attacks and defenses, based on their characterizations, and discuss open challenges and future research directions. Specifically, our discussion is structured around the model's life cycle, by delving into the privacy threats encountered during different stages of machine learning and their corresponding countermeasures. This survey not only serves as a resource for the research community but also offers clear guidance and actionable insights for practitioners to safeguard data privacy throughout the model's life cycle.
Submitted 5 February, 2024;
originally announced February 2024.
-
Towards Automatic Design of Factorio Blueprints
Authors:
Sean Patterson,
Joan Espasa,
Mun See Chang,
Ruth Hoffmann
Abstract:
Factorio is a 2D construction and management simulation video game about building automated factories to produce items of increasing complexity. A core feature of the game is its blueprint system, which allows players to easily save and replicate parts of their designs. Blueprints can reproduce any layout of objects in the game, but are typically used to encapsulate a complex behaviour, such as the production of a non-basic object. Once created, these blueprints are then used as basic building blocks, allowing the player to create a layer of abstraction. The usage of blueprints not only eases the expansion of the factory but also allows the sharing of designs with the game's community. The layout in a blueprint can be optimised using various criteria, such as the total space used or the final production throughput. The design of an optimal blueprint is a hard combinatorial problem, interleaving elements of many well-studied problems such as bin-packing, routing or network design. This work presents a new challenging problem and explores the feasibility of a constraint model to optimise Factorio blueprints, balancing correctness, optimality, and performance.
Submitted 2 October, 2023;
originally announced October 2023.
-
LESS-VFL: Communication-Efficient Feature Selection for Vertical Federated Learning
Authors:
Timothy Castiglia,
Yi Zhou,
Shiqiang Wang,
Swanand Kadhe,
Nathalie Baracaldo,
Stacy Patterson
Abstract:
We propose LESS-VFL, a communication-efficient feature selection method for distributed systems with vertically partitioned data. We consider a system of a server and several parties with local datasets that share a sample ID space but have different feature sets. The parties wish to collaboratively train a model for a prediction task. As part of the training, the parties wish to remove unimportant features in the system to improve generalization, efficiency, and explainability. In LESS-VFL, after a short pre-training period, the server optimizes its part of the global model to determine the relevant outputs from party models. This information is shared with the parties to then allow local feature selection without communication. We analytically prove that LESS-VFL removes spurious features from model training. We provide extensive empirical evidence that LESS-VFL can achieve high accuracy and remove spurious features at a fraction of the communication cost of other feature selection approaches.
Submitted 3 May, 2023;
originally announced May 2023.
-
Enhancing Early Lung Cancer Detection on Chest Radiographs with AI-assistance: A Multi-Reader Study
Authors:
Gaetan Dissez,
Nicole Tay,
Tom Dyer,
Matthew Tam,
Richard Dittrich,
David Doyne,
James Hoare,
Jackson J. Pat,
Stephanie Patterson,
Amanda Stockham,
Qaiser Malik,
Tom Naunton Morgan,
Paul Williams,
Liliana Garcia-Mondragon,
Jordan Smith,
George Pearse,
Simon Rasalingham
Abstract:
Objectives: The present study evaluated the impact of a commercially available explainable AI algorithm in augmenting the ability of clinicians to identify lung cancer on chest X-rays (CXR).
Design: This retrospective study evaluated the performance of 11 clinicians for detecting lung cancer from chest radiographs, with and without assistance from a commercially available AI algorithm (red dot, Behold.ai) that predicts suspected lung cancer from CXRs. Clinician performance was evaluated against clinically confirmed diagnoses.
Setting: The study analysed anonymised patient data from an NHS hospital; the dataset consisted of 400 chest radiographs from adult patients (18 years and above) who had a CXR performed in 2020, with corresponding clinical text reports.
Participants: A panel of readers consisting of 11 clinicians (consultant radiologists, radiologist trainees and reporting radiographers) participated in this study.
Main outcome measures: Overall accuracy, sensitivity, specificity and precision for detecting lung cancer on CXRs by clinicians, with and without AI input. Agreement rates between clinicians and performance standard deviation were also evaluated, with and without AI input.
Results: The use of the AI algorithm by clinicians led to an improved overall performance for lung tumour detection, achieving an overall increase of 17.4% of lung cancers being identified on CXRs which would have otherwise been missed, an overall increase in detection of smaller tumours, a 24% and 13% increased detection of stage 1 and stage 2 lung cancers respectively, and standardisation of clinician performance.
Conclusions: This study showed great promise in the clinical utility of AI algorithms in improving early lung cancer diagnosis and promoting health equity through overall improvement in reader performances, without impacting downstream imaging resources.
Submitted 31 August, 2022;
originally announced August 2022.
-
Flexible Vertical Federated Learning with Heterogeneous Parties
Authors:
Timothy Castiglia,
Shiqiang Wang,
Stacy Patterson
Abstract:
We propose Flexible Vertical Federated Learning (Flex-VFL), a distributed machine learning algorithm that trains a smooth, non-convex function in a distributed system with vertically partitioned data. We consider a system with several parties that wish to collaboratively learn a global function. Each party holds a local dataset; the datasets have different features but share the same sample ID space. The parties are heterogeneous in nature: the parties' operating speeds, local model architectures, and optimizers may be different from one another and, further, they may change over time. To train a global model in such a system, Flex-VFL utilizes a form of parallel block coordinate descent, where parties train a partition of the global model via stochastic coordinate descent. We provide theoretical convergence analysis for Flex-VFL and show that the convergence rate is constrained by the party speeds and local optimizer parameters. We apply this analysis and extend our algorithm to adapt party learning rates in response to changing speeds and local optimizer parameters. Finally, we compare the convergence time of Flex-VFL against synchronous and asynchronous VFL algorithms, as well as illustrate the effectiveness of our adaptive extension.
Submitted 30 August, 2023; v1 submitted 26 August, 2022;
originally announced August 2022.
-
A Sample-Based Algorithm for Approximately Testing $r$-Robustness of a Digraph
Authors:
Yuhao Yi,
Yuan Wang,
Xingkang He,
Stacy Patterson,
Karl H. Johansson
Abstract:
One of the intensely studied concepts of network robustness is $r$-robustness, which is a network topology property quantified by an integer $r$. It is required by mean subsequence reduced (MSR) algorithms and their variants to achieve resilient consensus. However, determining $r$-robustness is intractable for large networks. In this paper, we propose a sample-based algorithm to approximately test $r$-robustness of a digraph with $n$ vertices and $m$ edges. For a digraph with a moderate assumption on the minimum in-degree, and an error parameter $0<\epsilon\leq 1$, the proposed algorithm distinguishes $(r+\epsilon n)$-robust graphs from graphs which are not $r$-robust with probability $(1-\delta)$. Our algorithm runs in $\exp(O((\ln{\frac{1}{\epsilon\delta}})/\epsilon^2))\cdot m$ time. The running time is linear in the number of edges if $\epsilon$ is a constant.
Submitted 25 July, 2022;
originally announced July 2022.
-
Compressed-VFL: Communication-Efficient Learning with Vertically Partitioned Data
Authors:
Timothy Castiglia,
Anirban Das,
Shiqiang Wang,
Stacy Patterson
Abstract:
We propose Compressed Vertical Federated Learning (C-VFL) for communication-efficient training on vertically partitioned data. In C-VFL, a server and multiple parties collaboratively train a model on their respective features utilizing several local iterations and sharing compressed intermediate results periodically. Our work provides the first theoretical analysis of the effect message compression has on distributed training over vertically partitioned data. We prove convergence of non-convex objectives at a rate of $O(\frac{1}{\sqrt{T}})$ when the compression error is bounded over the course of training. We provide specific requirements for convergence with common compression techniques, such as quantization and top-$k$ sparsification. Finally, we experimentally show compression can reduce communication by over $90\%$ without a significant decrease in accuracy over VFL without compression.
Submitted 28 March, 2023; v1 submitted 16 June, 2022;
originally announced June 2022.
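Illustrative note on the Compressed-VFL entry above: the abstract names top-$k$ sparsification as one of the common compression techniques. The sketch below shows the generic technique only, keeping the $k$ largest-magnitude entries of a message and zeroing the rest. The function name `top_k_sparsify` and the list-based representation are assumptions made for illustration and are not taken from the paper.

```python
def top_k_sparsify(values, k):
    """Keep the k largest-magnitude entries of `values`; set the others to 0.0.

    Hypothetical helper for illustration; not the C-VFL implementation.
    """
    if not 0 <= k <= len(values):
        raise ValueError("k must be between 0 and len(values)")
    # Indices ordered by decreasing absolute value (ties keep original order).
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


if __name__ == "__main__":
    message = [0.1, -3.0, 2.0, 0.5]
    print(top_k_sparsify(message, 2))
```

Only the kept entries and their indices would need to be transmitted, which is where the communication saving would come from under this scheme.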
-
A Continuum Approach for Collaborative Task Processing in UAV MEC Networks
Authors:
Lorson Blair,
Carlos A. Varela,
Stacy Patterson
Abstract:
Unmanned aerial vehicles (UAVs) are becoming a viable platform for sensing and estimation in a wide variety of applications including disaster response, search and rescue, and security monitoring. These sensing UAVs have limited battery and computational capabilities, and thus must offload their data so it can be processed to provide actionable intelligence. We consider a compute platform consisting of a limited number of highly-resourced UAVs that act as mobile edge computing (MEC) servers to process the workload on premises. We propose a novel distributed solution to the collaborative processing problem that adaptively positions the MEC UAVs in response to the changing workload that arises both from the sensing UAVs' mobility and the task generation. Our solution consists of two key building blocks: (1) an efficient workload estimation process by which the UAVs estimate the task field - a continuous approximation of the number of tasks to be processed at each location in the airspace, and (2) a distributed optimization method by which the UAVs partition the task field so as to maximize the system throughput. We evaluate our proposed solution using realistic models of surveillance UAV mobility and show that our method achieves up to 28% improvement in throughput over a non-adaptive baseline approach.
Submitted 6 June, 2022;
originally announced June 2022.
-
Formal Guarantees of Timely Progress for Distributed Knowledge Propagation
Authors:
Saswata Paul,
Stacy Patterson,
Carlos Varela
Abstract:
Autonomous air traffic management (ATM) operations for urban air mobility (UAM) will necessitate the use of distributed protocols for decentralized coordination between aircraft. As UAM operations are time-critical, it will be imperative to have formal guarantees of progress for the distributed protocols used in ATM. Under asynchronous settings, message transmission and processing delays are unbounded, making it impossible to provide deterministic bounds on the time required to make progress. We present an approach for formally guaranteeing timely progress in a Two-Phase Acknowledge distributed knowledge propagation protocol by probabilistically modeling the delays using theories of the Multicopy Two-Hop Relay protocol and the M/M/1 queue system. The guarantee states a probabilistic upper bound on the time for progress as a function of the probabilities of the total transmission and processing delays being less than two given values. We also showcase the development, in the Athena proof assistant, of a library of formal theories tailored towards reasoning about timely progress in distributed protocols deployed in airborne networks.
Submitted 24 October, 2021;
originally announced October 2021.
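Illustrative note on the entry above: the abstract models processing delays with an M/M/1 queue and reasons about the probability that a delay is below a given value. As a rough sketch of that kind of quantity, the standard result for a stable first-come-first-served M/M/1 queue is that the total time in the system is exponentially distributed with rate (service rate minus arrival rate). The function name and parameters below are assumptions for illustration; the paper's formal theories are written in Athena and are not reproduced here.

```python
import math


def mm1_sojourn_cdf(arrival_rate, service_rate, t):
    """P(time in system <= t) for a stable FCFS M/M/1 queue.

    Hypothetical helper: uses the textbook exponential sojourn-time
    distribution with rate (service_rate - arrival_rate).
    """
    if not 0 < arrival_rate < service_rate:
        raise ValueError("need 0 < arrival_rate < service_rate for stability")
    if t < 0:
        return 0.0
    return 1.0 - math.exp(-(service_rate - arrival_rate) * t)


if __name__ == "__main__":
    # Probability that a message is processed within 1 time unit.
    print(mm1_sojourn_cdf(arrival_rate=1.0, service_rate=2.0, t=1.0))
```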
-
Cross-Silo Federated Learning for Multi-Tier Networks with Vertical and Horizontal Data Partitioning
Authors:
Anirban Das,
Timothy Castiglia,
Shiqiang Wang,
Stacy Patterson
Abstract:
We consider federated learning in tiered communication networks. Our network model consists of a set of silos, each holding a vertical partition of the data. Each silo contains a hub and a set of clients, with the silo's vertical data shard partitioned horizontally across its clients. We propose Tiered Decentralized Coordinate Descent (TDCD), a communication-efficient decentralized training algorithm for such two-tiered networks. The clients in each silo perform multiple local gradient steps before sharing updates with their hub to reduce communication overhead. Each hub adjusts its coordinates by averaging its workers' updates, and then hubs exchange intermediate updates with one another. We present a theoretical analysis of our algorithm and show the dependence of the convergence rate on the number of vertical partitions and the number of local updates. We further validate our approach empirically via simulation-based experiments using a variety of datasets and objectives.
Submitted 25 April, 2024; v1 submitted 19 August, 2021;
originally announced August 2021.
-
Verification of Eventual Consensus in Synod Using a Failure-Aware Actor Model
Authors:
Saswata Paul,
Gul A. Agha,
Stacy Patterson,
Carlos A. Varela
Abstract:
Successfully attaining consensus in the absence of a centralized coordinator is a fundamental problem in distributed multi-agent systems. We analyze progress in the Synod consensus protocol -- which does not assume a unique leader -- under the assumptions of asynchronous communication and potential agent failures. We identify a set of sufficient conditions under which it is possible to guarantee that a set of agents will eventually attain consensus. First, a subset of the agents must behave correctly and not permanently fail until consensus is reached, and second, at least one proposal must be eventually uninterrupted by higher-numbered proposals. To formally reason about agent failures, we introduce a failure-aware actor model (FAM). Using FAM, we model the identified conditions and provide a formal proof of eventual progress in Synod. Our proof has been mechanically verified using the Athena proof assistant and, to the best of our knowledge, it is the first machine-checked proof of eventual progress in Synod.
Submitted 26 March, 2021;
originally announced March 2021.
-
Multi-Tier Federated Learning for Vertically Partitioned Data
Authors:
Anirban Das,
Stacy Patterson
Abstract:
We consider decentralized model training in tiered communication networks. Our network model consists of a set of silos, each holding a vertical partition of the data. Each silo contains a hub and a set of clients, with the silo's vertical data shard partitioned horizontally across its clients. We propose Tiered Decentralized Coordinate Descent (TDCD), a communication-efficient decentralized training algorithm for such two-tiered networks. To reduce communication overhead, the clients in each silo perform multiple local gradient steps before sharing updates with their hub. Each hub adjusts its coordinates by averaging its workers' updates, and then hubs exchange intermediate updates with one another. We present a theoretical analysis of our algorithm and show the dependence of the convergence rate on the number of vertical partitions, the number of local updates, and the number of clients in each hub. We further validate our approach empirically via simulation-based experiments using a variety of datasets and both convex and non-convex objectives.
Submitted 6 February, 2021;
originally announced February 2021.
-
Multi-Level Local SGD for Heterogeneous Hierarchical Networks
Authors:
Timothy Castiglia,
Anirban Das,
Stacy Patterson
Abstract:
We propose Multi-Level Local SGD, a distributed gradient method for learning a smooth, non-convex objective in a heterogeneous multi-level network. Our network model consists of a set of disjoint sub-networks, with a single hub and multiple worker nodes; further, worker nodes may have different operating rates. The hubs exchange information with one another via a connected, but not necessarily complete communication network. In our algorithm, sub-networks execute a distributed SGD algorithm, using a hub-and-spoke paradigm, and the hubs periodically average their models with neighboring hubs. We first provide a unified mathematical framework that describes the Multi-Level Local SGD algorithm. We then present a theoretical analysis of the algorithm; our analysis shows the dependence of the convergence error on the worker node heterogeneity, hub network topology, and the number of local, sub-network, and global iterations. We back up our theoretical results via simulation-based experiments using both convex and non-convex objectives.
Submitted 18 February, 2022; v1 submitted 27 July, 2020;
originally announced July 2020.
-
Skedulix: Hybrid Cloud Scheduling for Cost-Efficient Execution of Serverless Applications
Authors:
Anirban Das,
Andrew Leaf,
Carlos A. Varela,
Stacy Patterson
Abstract:
We present a framework for scheduling multifunction serverless applications over a hybrid public-private cloud. A set of serverless jobs is input as a batch, and the objective is to schedule function executions over the hybrid platform to minimize the cost of public cloud use, while completing all jobs by a specified deadline. As this scheduling problem is NP-Hard, we propose a greedy algorithm that dynamically determines both the order and placement of each function execution using predictive models of function execution time and network latencies. We present a prototype implementation of our framework that uses AWS Lambda and OpenFaaS, for the public and private cloud, respectively. We evaluate our prototype in live experiments using a mixture of compute and I/O heavy serverless applications. Our results show that our framework can achieve a speedup in batch processing of up to 1.92 times that of an approach that uses only the private cloud, at 40.5% the cost of an approach that uses only the public cloud.
Submitted 5 June, 2020;
originally announced June 2020.
-
A Hierarchical Model for Fast Distributed Consensus in Dynamic Networks
Authors:
Timothy Castiglia,
Colin Goldberg,
Stacy Patterson
Abstract:
We present two new consensus algorithms for dynamic networks. The first, Fast Raft, is a variation on the Raft consensus algorithm that reduces the number of message rounds in typical operation. Fast Raft is ideal for fast-paced distributed systems where membership changes over time and where sites must reach consensus quickly. The second, C-Raft, is targeted for distributed systems where sites are grouped into clusters, with fast communication within clusters and slower communication between clusters. C-Raft uses Fast Raft as a building block and defines a hierarchical model of consensus to improve upon throughput in globally distributed systems. We prove the safety and liveness properties of each algorithm. Finally, we present an experimental evaluation of both algorithms in AWS.
Submitted 27 July, 2020; v1 submitted 13 April, 2020;
originally announced April 2020.
-
Performance Optimization for Edge-Cloud Serverless Platforms via Dynamic Task Placement
Authors:
Anirban Das,
Shigeru Imai,
Mike P. Wittie,
Stacy Patterson
Abstract:
We present a framework for performance optimization in serverless edge-cloud platforms using dynamic task placement. We focus on applications for smart edge devices, for example, smart cameras or speakers, that need to perform processing tasks on input data in real to near-real time. Our framework allows the user to specify cost and latency requirements for each application task, and for each input, it determines whether to execute the task on the edge device or in the cloud. Further, for cloud executions, the framework identifies the container resource configuration needed to satisfy the performance goals. We have evaluated our framework in simulation using measurements collected from serverless applications in AWS Lambda and AWS Greengrass. In addition, we have implemented a prototype of our framework that runs in these same platforms. In experiments with our prototype, our models can predict average end-to-end latency with less than 6% error, and we obtain almost three orders of magnitude reduction in end-to-end latency compared to edge-only execution.
Submitted 19 May, 2020; v1 submitted 2 March, 2020;
originally announced March 2020.
-
Diffusion and Consensus in a Weakly Coupled Network of Networks
Authors:
Yuhao Yi,
Anirban Das,
Stacy Patterson,
Bassam Bamieh,
Zhongzhi Zhang
Abstract:
We study diffusion and consensus dynamics in a Network of Networks model. In this model, there is a collection of sub-networks, connected to one another using a small number of links. We consider a setting where the links between networks have small weights, or are used less frequently than links within each sub-network. Using spectral perturbation theory, we analyze the diffusion rate and convergence rate of the investigated systems. Our analysis shows that the first order approximation of the diffusion and convergence rates is independent of the topologies of the individual graphs; the rates depend only on the number of nodes in each graph and the topology of the connecting edges. The second order analysis shows a relationship between the diffusion and convergence rates and the information centrality of the connecting nodes within each sub-network. We further highlight these theoretical results through numerical examples.
Submitted 14 February, 2020;
originally announced February 2020.
-
Formalizing Event-Driven Behavior of Serverless Applications
Authors:
Matthew Obetz,
Stacy Patterson,
Ana Milanova
Abstract:
We present new operational semantics for serverless computing that model the event-driven relationships between serverless functions, as well as their interaction with platform services such as databases and object stores. These semantics precisely encapsulate how control transfers between functions, both directly and through reads and writes to platform services. We use these semantics to define the notion of the service call graph for serverless applications that captures program flows through functions and services. Finally, we construct service call graphs for twelve serverless JavaScript applications, using a prototype of our call graph construction algorithm, and we evaluate their accuracy.
Submitted 7 December, 2019;
originally announced December 2019.
-
Disagreement and Polarization in Two-Party Social Networks
Authors:
Yuhao Yi,
Stacy Patterson
Abstract:
We investigate disagreement and polarization in a social network with two polarizing sources of information. First, we define disagreement and polarization indices in two-party leader-follower models of opinion dynamics. We then give expressions for the indices in terms of a graph Laplacian. The expressions show a relationship between these quantities and the concepts of resistance distance and biharmonic distance. We next study the problem of designing the network so as to minimize disagreement and polarization. We give conditions for optimal disagreement and polarization, and further, we show that a linear combination of disagreement and polarization of the follower nodes is a convex function of the edge weights between followers. We propose algorithms to address some related continuous and discrete optimization problems and also present analytic results for some interesting examples.
Submitted 15 May, 2020; v1 submitted 25 November, 2019;
originally announced November 2019.
-
Shifting Opinions in a Social Network Through Leader Selection
Authors:
Yuhao Yi,
Timothy Castiglia,
Stacy Patterson
Abstract:
We study the French-DeGroot opinion dynamics in a social network with two polarizing parties. We consider a network in which the leaders of one party are given, and we pose the problem of selecting the leader set of the opposing party so as to shift the average opinion to a desired value. When each party has only one leader, we express the average opinion in terms of the transition matrix and the stationary distribution of random walks in the network. The analysis shows balance of influence between the two leader nodes. We show that the problem of selecting at most $k$ absolute leaders to shift the average opinion is $\mathbf{NP}$-hard. Then, we reduce the problem to a problem of submodular maximization with a submodular knapsack constraint and an additional cardinality constraint and propose a greedy algorithm with upper bound search to approximate the optimum solution. We also conduct experiments in random networks and real-world networks to show the effectiveness of the algorithm.
Submitted 15 May, 2020; v1 submitted 28 October, 2019;
originally announced October 2019.
-
On the Computational Complexity of Finding a Sparse Wasserstein Barycenter
Authors:
Steffen Borgwardt,
Stephan Patterson
Abstract:
The discrete Wasserstein barycenter problem is a minimum-cost mass transport problem for a set of probability measures with finite support. In this paper, we show that finding a barycenter of sparse support is hard, even in dimension 2 and for only 3 measures. We prove this claim by showing that a special case of an intimately related decision problem SCMP -- does there exist a measure with a non-mass-splitting transport cost and support size below prescribed bounds? -- is NP-hard for all rational data. Our proof is based on a reduction from planar 3-dimensional matching and follows a strategy laid out by Spieksma and Woeginger (1996) for a reduction to planar, minimum circumference 3-dimensional matching. While we closely mirror the actual steps of their proof, the arguments themselves differ fundamentally due to the complex nature of the discrete barycenter problem. Containment of SCMP in NP will remain open. We prove that, for a given measure, sparsity and cost of an optimal transport to a set of measures can be verified in polynomial time in the size of a bit encoding of the measure. However, the encoding size of a barycenter may be exponential in the encoding size of the underlying measures.
Submitted 8 February, 2022; v1 submitted 16 October, 2019;
originally announced October 2019.
-
EdgeBench: Benchmarking Edge Computing Platforms
Authors:
Anirban Das,
Stacy Patterson,
Mike P. Wittie
Abstract:
The emerging trend of edge computing has led several cloud providers to release their own platforms for performing computation at the 'edge' of the network. We compare two such platforms, Amazon AWS Greengrass and Microsoft Azure IoT Edge, using a new benchmark comprising a suite of performance metrics. We also compare the performance of the edge frameworks to cloud-only implementations available in their respective cloud ecosystems. Amazon AWS Greengrass and Azure IoT Edge use different underlying technologies, edge Lambda functions vs. containers, and so we also elaborate on platform features available to developers. Our study shows that both of these edge platforms provide comparable performance, which nevertheless differs in important ways for key types of workloads used in edge applications. Finally, we discuss several current issues and challenges we faced in deploying these platforms.
Submitted 14 November, 2018;
originally announced November 2018.
-
Maximizing Diversity of Opinion in Social Networks
Authors:
Erika Mackin,
Stacy Patterson
Abstract:
We study the problem of maximizing opinion diversity in a social network that includes opinion leaders with binary opposing opinions. The members of the network who are not leaders form their opinions using the French-DeGroot model of opinion dynamics. To quantify the diversity of such a system, we adapt two diversity measures from ecology to our setting, the Simpson Diversity Index and the Shannon Index. Using these two measures, we formalize the problem of how to place a single leader with opinion 1, given a network with a leader with opinion 0, so as to maximize the opinion diversity. We give analytical solutions to these problems for paths, cycles, and trees, and we highlight our results through a numerical example.
Submitted 28 March, 2019; v1 submitted 8 November, 2018;
originally announced November 2018.
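The French-DeGroot dynamics with two opposing leaders described in this abstract can be illustrated with a minimal sketch. It assumes uniform averaging weights over neighbors and a fixed iteration count; the function name `degroot_with_leaders` and its parameters are hypothetical and not taken from the paper, and the diversity indices themselves are not modeled here.

```python
def degroot_with_leaders(neighbors, leaders, steps=2000):
    """Iterate French-DeGroot averaging with stubborn leaders.

    neighbors: dict mapping each node to a list of adjacent nodes.
    leaders:   dict mapping leader nodes to their fixed opinion (0 or 1).
    Followers start at 0.5 and repeatedly adopt the mean of their
    neighbors' opinions (uniform weights are an assumption of this
    sketch); leaders never change their opinion.
    """
    opinion = {v: leaders.get(v, 0.5) for v in neighbors}
    for _ in range(steps):
        updated = {}
        for v, nbrs in neighbors.items():
            if v in leaders:
                updated[v] = leaders[v]
            else:
                updated[v] = sum(opinion[u] for u in nbrs) / len(nbrs)
        opinion = updated
    return opinion


# Path graph 0 - 1 - 2 - 3 - 4 with a 0-leader and a 1-leader at the ends.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
result = degroot_with_leaders(path, {0: 0.0, 4: 1.0})
```

On this path the follower opinions settle at 0.25, 0.5, and 0.75, a linear interpolation between the two leaders, which is the kind of steady state that a leader-placement objective would then score.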
-
BubbleTouch: A Quasi-Static Tactile Skin Simulator
Authors:
Brayden Hollis,
Stacy Patterson,
Jinda Cui,
Jeff Trinkle
Abstract:
We present BubbleTouch, an open source quasi-static simulator for robotic tactile skins. BubbleTouch can be used to simulate contact with a robot's tactile skin patches as it interacts with humans and objects. The simulator creates detailed traces of contact forces that can be used in experiments in tactile contact activities. We summarize the design of BubbleTouch and highlight our recent work that uses BubbleTouch for experiments with tactile object recognition.
Submitted 24 September, 2018;
originally announced September 2018.
-
Maximizing the Number of Spanning Trees in a Connected Graph
Authors:
Huan Li,
Stacy Patterson,
Yuhao Yi,
Zhongzhi Zhang
Abstract:
We study the problem of maximizing the number of spanning trees in a connected graph by adding at most $k$ edges from a given candidate edge set. We give both algorithmic and hardness results for this problem:
- We give a greedy algorithm that, using submodularity, obtains an approximation ratio of $(1 - 1/e - ε)$ in the exponent of the number of spanning trees for any $ε > 0$ in time $\tilde{O}(m ε^{-1} + (n + q) ε^{-3})$, where $m$ and $q$ are the numbers of edges in the original graph and the candidate edge set, respectively. Our running time is optimal with respect to the input size up to logarithmic factors, and substantially improves upon the $O(n^3)$ running time of the previously proposed greedy algorithm with approximation ratio $(1 - 1/e)$ in the exponent. Notably, the independence of our running time from $k$ is novel, compared to conventional top-$k$ selections on graphs that usually run in $Ω(mk)$ time. A key ingredient of our greedy algorithm is a routine for maintaining effective resistances under edge additions in an online-offline hybrid setting.
- We show the exponential inapproximability of this problem by proving that there exists a constant $c > 0$ such that it is NP-hard to approximate the optimum number of spanning trees in the exponent within $(1 - c)$. This inapproximability result follows from a reduction from the minimum path cover in undirected graphs, whose hardness again follows from the constant inapproximability of the Traveling Salesman Problem (TSP) with distances 1 and 2. Thus, the approximation ratio of our algorithm is also optimal up to a constant factor in the exponent. To our knowledge, this is the first hardness of approximation result for maximizing the number of spanning trees in a graph, or equivalently, by Kirchhoff's matrix-tree theorem, maximizing the determinant of an SDDM matrix.
Submitted 14 July, 2018; v1 submitted 8 April, 2018;
originally announced April 2018.
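Kirchhoff's matrix-tree theorem, which this abstract invokes, can be illustrated with a minimal sketch: the number of spanning trees equals the determinant of the graph Laplacian with one row and the matching column deleted. The function name `count_spanning_trees` is hypothetical, and this exact cubic-time computation is only an illustration of the objective, not the near-linear-time greedy algorithm the paper describes.

```python
from fractions import Fraction


def count_spanning_trees(n, edges):
    """Count spanning trees of an undirected graph on vertices 0..n-1.

    Uses Kirchhoff's matrix-tree theorem: the count equals the determinant
    of the graph Laplacian with one row and the matching column removed.
    """
    # Build the Laplacian L = D - A (parallel edges simply add up).
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1

    # Delete the last row and column to get the reduced Laplacian.
    m = [row[: n - 1] for row in lap[: n - 1]]
    size = n - 1

    # Exact determinant via Gaussian elimination over the rationals.
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0  # singular reduced Laplacian: the graph is disconnected
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
before = count_spanning_trees(4, cycle)
after = count_spanning_trees(4, cycle + [(0, 2)])
```

For the 4-cycle the function returns 4, and adding the single chord (0, 2) raises the count to 8, which shows how one candidate edge can change the objective being maximized.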
-
Scale-free Loopy Structure is Resistant to Noise in Consensus Dynamics in Complex Networks
Authors:
Yuhao Yi,
Zhongzhi Zhang,
Stacy Patterson
Abstract:
The vast majority of real-world networks are scale-free, loopy, and sparse, with a power-law degree distribution and a constant average degree. In this paper, we study first-order consensus dynamics in binary scale-free networks, where vertices are subject to white noise. We focus on the coherence of networks characterized in terms of the $H_2$-norm, which quantifies how closely agents track the consensus value. We first provide a lower bound of coherence of a network in terms of its average degree, which is independent of the network order. We then study the coherence of some sparse, scale-free real-world networks, which approaches a constant. We also study numerically the coherence of Barabási-Albert networks and high-dimensional random Apollonian networks, which also converges to a constant when the networks grow. Finally, based on the connection of coherence and the Kirchhoff index, we study analytically the coherence of two deterministically-growing sparse networks and obtain the exact expressions, which tend to small constants. Our results indicate that the effect of noise on the consensus dynamics in power-law networks is negligible. We argue that scale-free topology, together with loopy structure, is responsible for the strong robustness with respect to noisy consensus dynamics in power-law networks.
Submitted 1 January, 2018;
originally announced January 2018.
-
Compressed Sensing for Scalable Robotic Tactile Skins
Authors:
Brayden Hollis,
Stacy Patterson,
Jeff Trinkle
Abstract:
The potential of large tactile arrays to improve robot perception for safe operation in human-dominated environments and of high-resolution tactile arrays to enable human-level dexterous manipulation is well accepted. However, the increase in the number of tactile sensing elements introduces challenges including wiring complexity, data acquisition, and data processing. To help address these challenges, we develop a tactile sensing technique based on compressed sensing. Compressed sensing simultaneously performs data sampling and compression with recovery guarantees and has been successfully applied in computer vision. We use compressed sensing techniques for tactile data acquisition to reduce hardware complexity and data transmission, while allowing fast, accurate reconstruction of the full-resolution signal. For our simulated test array of 4096 taxels, we achieve reconstruction quality equivalent to measuring all taxel signals independently (the full signal) from just 1024 measurements (the compressed signal) at a rate over 100Hz. We then apply tactile compressed sensing to the problem of object classification. Specifically, we perform object classification on the compressed tactile data based on a method called compressed learning. We obtain up to 98% classification accuracy, even with a compression ratio of 64:1.
Submitted 12 May, 2017;
originally announced May 2017.
-
Compressed Learning for Tactile Object Classification
Authors:
Brayden Hollis,
Stacy Patterson,
Jeff Trinkle
Abstract:
The potential of large tactile arrays to improve robot perception for safe operation in human-dominated environments and of high-resolution tactile arrays to enable human-level dexterous manipulation is well accepted. However, the increase in the number of tactile sensing elements introduces challenges including wiring complexity, power consumption, and data processing. To help address these challenges, we previously developed a tactile sensing technique based on compressed sensing that reduces hardware complexity and data transmission, while allowing accurate reconstruction of the full-resolution signal. In this paper, we apply tactile compressed sensing to the problem of object classification. Specifically, we perform object classification on the compressed tactile data. We evaluate our method using BubbleTouch, our tactile array simulator. Our results show that our approach achieves high classification accuracy, even with compression factors up to 64.
Submitted 23 September, 2016;
originally announced September 2016.
-
Compressed Sensing for Tactile Skins
Authors:
Brayden Hollis,
Stacy Patterson,
Jeff Trinkle
Abstract:
Whole body tactile perception via tactile skins offers large benefits for robots in unstructured environments. To fully realize this benefit, tactile systems must support real-time data acquisition over a massive number of tactile sensor elements. We present a novel approach for scalable tactile data acquisition using compressed sensing. We first demonstrate that the tactile data is amenable to compressed sensing techniques. We then develop a solution for fast data sampling, compression, and reconstruction that is suited for tactile system hardware and has potential for reducing the wiring complexity. Finally, we evaluate the performance of our technique on simulated tactile sensor networks. Our evaluations show that compressed sensing, with a compression ratio of 3 to 1, can achieve higher signal acquisition accuracy than full data acquisition of noisy sensor data.
Submitted 3 March, 2016;
originally announced March 2016.
-
Efficient, Optimal $k$-Leader Selection for Coherent, One-Dimensional Formations
Authors:
Stacy Patterson,
Neil McGlohon,
Kirill Dyagilev
Abstract:
We study the problem of optimal leader selection in consensus networks with noisy relative information. The objective is to identify the set of $k$ leaders that minimizes the formation's deviation from the desired trajectory established by the leaders. An optimal leader set can be found by an exhaustive search over all possible leader sets; however, this approach is not scalable to large networks. In recent years, several works have proposed approximation algorithms to the $k$-leader selection problem, yet the question of whether there exists an efficient, non-combinatorial method to identify the optimal leader set remains open. This work takes a first step towards answering this question. We show that, in one-dimensional weighted graphs, namely path graphs and ring graphs, the $k$-leader selection problem can be solved in polynomial time (in both $k$ and the network size $n$). We give an $O(n^3)$ solution for optimal $k$-leader selection in path graphs and an $O(kn^3)$ solution for optimal $k$-leader selection in ring graphs.
Submitted 19 December, 2014;
originally announced December 2014.
-
Distributed Compressed Sensing For Static and Time-Varying Networks
Authors:
Stacy Patterson,
Yonina C. Eldar,
Idit Keidar
Abstract:
We consider the problem of in-network compressed sensing from distributed measurements. Every agent has a set of measurements of a signal $x$, and the objective is for the agents to recover $x$ from their collective measurements using only communication with neighbors in the network. Our distributed approach to this problem is based on the centralized Iterative Hard Thresholding algorithm (IHT). We first present a distributed IHT algorithm for static networks that leverages standard tools from distributed computing to execute in-network computations with minimized bandwidth consumption. Next, we address distributed signal recovery in networks with time-varying topologies. The network dynamics necessarily introduce inaccuracies to our in-network computations. To accommodate these inaccuracies, we show how centralized IHT can be extended to include inexact computations while still providing the same recovery guarantees as the original IHT algorithm. We then leverage these new theoretical results to develop a distributed version of IHT for time-varying networks. Evaluations show that our distributed algorithms for both static and time-varying networks outperform previously proposed solutions in time and bandwidth by several orders of magnitude.
Submitted 6 August, 2014; v1 submitted 28 August, 2013;
originally announced August 2013.
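The centralized Iterative Hard Thresholding (IHT) algorithm that this abstract builds on can be sketched as follows. This is a minimal illustration assuming a dense measurement matrix given as a list of rows, a fixed step size, and a fixed iteration count; the names `hard_threshold` and `iht` are hypothetical, and nothing here reproduces the paper's distributed or time-varying variants.

```python
def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero the rest."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def iht(A, y, k, step=1.0, iterations=100):
    """Centralized IHT for measurements y = A x with x assumed k-sparse.

    A is a list of rows. Each iteration takes a gradient step on
    0.5 * ||y - A x||^2 and then projects onto the k-sparse vectors.
    Convergence depends on A and the step size; none is guaranteed here.
    """
    rows, cols = len(A), len(A[0])
    x = [0.0] * cols
    for _ in range(iterations):
        # Residual of the current estimate.
        residual = [
            y[r] - sum(A[r][c] * x[c] for c in range(cols)) for r in range(rows)
        ]
        # Gradient step: x + step * A^T (y - A x).
        stepped = [
            x[c] + step * sum(A[r][c] * residual[r] for r in range(rows))
            for c in range(cols)
        ]
        x = hard_threshold(stepped, k)
    return x


# Trivial sanity check with an identity measurement matrix.
identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
recovered = iht(identity, [0.0, -3.0, 2.0, 0.0], k=2)
```

When the rows of A are partitioned among agents, the term A^T (y - A x) is a sum of per-agent contributions, each computable from that agent's own measurements, which is what makes an in-network version of the gradient step possible.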
-
Distributed Sparse Signal Recovery For Sensor Networks
Authors:
Stacy Patterson,
Yonina C. Eldar,
Idit Keidar
Abstract:
We propose a distributed algorithm for sparse signal recovery in sensor networks based on Iterative Hard Thresholding (IHT). Every agent has a set of measurements of a signal x, and the objective is for the agents to recover x from their collective measurements at a minimal communication cost and with low computational complexity. A naive distributed implementation of IHT would require global communication of every agent's full state in each iteration. We find that we can dramatically reduce this communication cost by leveraging solutions to the distributed top-K problem in the database literature. Evaluations show that our algorithm requires up to three orders of magnitude less total bandwidth than the best-known distributed basis pursuit method.
Submitted 21 February, 2013; v1 submitted 25 December, 2012;
originally announced December 2012.
-
Serializability, not Serial: Concurrency Control and Availability in Multi-Datacenter Datastores
Authors:
Stacy Patterson,
Aaron J. Elmore,
Faisal Nawab,
Divyakant Agrawal,
Amr El Abbadi
Abstract:
We present a framework for concurrency control and availability in multi-datacenter datastores. While we consider Google's Megastore as our motivating example, we define general abstractions for key components, making our solution extensible to any system that satisfies the abstraction properties. We first develop and analyze a transaction management and replication protocol based on a straightforward implementation of the Paxos algorithm. Our investigation reveals that this protocol acts as a concurrency prevention mechanism rather than a concurrency control mechanism. We then propose an enhanced protocol called Paxos with Combination and Promotion (Paxos-CP) that provides true transaction concurrency while requiring the same per instance message complexity as the basic Paxos protocol. Finally, we compare the performance of Paxos and Paxos-CP in a multi-datacenter experimental study, and we demonstrate that Paxos-CP results in significantly fewer aborted transactions than basic Paxos.
Submitted 1 August, 2012;
originally announced August 2012.
-
Conditional mean embeddings as regressors - supplementary
Authors:
Steffen Grünewälder,
Guy Lever,
Luca Baldassarre,
Sam Patterson,
Arthur Gretton,
Massimiliano Pontil
Abstract:
We demonstrate an equivalence between reproducing kernel Hilbert space (RKHS) embeddings of conditional distributions and vector-valued regressors. This connection introduces a natural regularized loss function which the RKHS embeddings minimise, providing an intuitive understanding of the embeddings and a justification for their use. Furthermore, the equivalence allows the application of vector-valued regression methods and results to the problem of learning conditional distributions. Using this link we derive a sparse version of the embedding by considering alternative formulations. Further, by applying convergence results for vector-valued regression to the embedding problem we derive minimax convergence rates which are $O(\log(n)/n)$ -- compared to current state of the art rates of $O(n^{-1/4})$ -- and are valid under milder and more intuitive assumptions. These minimax upper rates coincide with lower rates up to a logarithmic factor, showing that the embedding method achieves nearly optimal rates. We study our sparse embedding algorithm in a reinforcement learning task where the algorithm shows significant improvement in sparsity over an incomplete Cholesky decomposition.
Submitted 24 July, 2012; v1 submitted 21 May, 2012;
originally announced May 2012.
-
Coherence in Large-Scale Networks: Dimension-Dependent Limitations of Local Feedback
Authors:
Bassam Bamieh,
Mihailo R. Jovanović,
Partha Mitra,
Stacy Patterson
Abstract:
We consider distributed consensus and vehicular formation control problems. Specifically, we address the question of whether local feedback is sufficient to maintain coherence in large-scale networks subject to stochastic disturbances. We define macroscopic performance measures, which are global quantities that capture the notion of coherence: a notion of global order that quantifies how closely the formation resembles a solid object. We consider how these measures scale asymptotically with network size in the topologies of regular lattices in 1, 2, and higher dimensions, with vehicular platoons corresponding to the one-dimensional case. A common phenomenon appears where a higher spatial dimension implies a more favorable scaling of coherence measures, with a dimension of 3 being necessary to achieve coherence in consensus and vehicular formations under certain conditions. In particular, we show that it is impossible to have large coherent one-dimensional vehicular platoons with only local feedback. We analyze these effects in terms of the underlying energetic modes of motion, showing that they take the form of large temporal and spatial scales resulting in an accordion-like motion of formations. A conclusion can be drawn that in low spatial dimensions, local feedback is unable to regulate large-scale disturbances, but it can in higher spatial dimensions. This phenomenon is distinct from, and unrelated to, string instability issues which are commonly encountered in control problems for automated highways.
△ Less
Submitted 16 December, 2011;
originally announced December 2011.