-
LESS-VFL: Communication-Efficient Feature Selection for Vertical Federated Learning
Authors:
Timothy Castiglia,
Yi Zhou,
Shiqiang Wang,
Swanand Kadhe,
Nathalie Baracaldo,
Stacy Patterson
Abstract:
We propose LESS-VFL, a communication-efficient feature selection method for distributed systems with vertically partitioned data. We consider a system of a server and several parties with local datasets that share a sample ID space but have different feature sets. The parties wish to collaboratively train a model for a prediction task. As part of the training, the parties wish to remove unimportant features in the system to improve generalization, efficiency, and explainability. In LESS-VFL, after a short pre-training period, the server optimizes its part of the global model to determine the relevant outputs from party models. This information is shared with the parties to then allow local feature selection without communication. We analytically prove that LESS-VFL removes spurious features from model training. We provide extensive empirical evidence that LESS-VFL can achieve high accuracy and remove spurious features at a fraction of the communication cost of other feature selection approaches.
Submitted 3 May, 2023;
originally announced May 2023.
-
Flexible Vertical Federated Learning with Heterogeneous Parties
Authors:
Timothy Castiglia,
Shiqiang Wang,
Stacy Patterson
Abstract:
We propose Flexible Vertical Federated Learning (Flex-VFL), a distributed machine learning algorithm that trains a smooth, non-convex function in a distributed system with vertically partitioned data. We consider a system with several parties that wish to collaboratively learn a global function. Each party holds a local dataset; the datasets have different features but share the same sample ID space. The parties are heterogeneous in nature: the parties' operating speeds, local model architectures, and optimizers may be different from one another and, further, they may change over time. To train a global model in such a system, Flex-VFL utilizes a form of parallel block coordinate descent, where parties train a partition of the global model via stochastic coordinate descent. We provide theoretical convergence analysis for Flex-VFL and show that the convergence rate is constrained by the party speeds and local optimizer parameters. We apply this analysis and extend our algorithm to adapt party learning rates in response to changing speeds and local optimizer parameters. Finally, we compare the convergence time of Flex-VFL against synchronous and asynchronous VFL algorithms, as well as illustrate the effectiveness of our adaptive extension.
Submitted 30 August, 2023; v1 submitted 26 August, 2022;
originally announced August 2022.
-
Compressed-VFL: Communication-Efficient Learning with Vertically Partitioned Data
Authors:
Timothy Castiglia,
Anirban Das,
Shiqiang Wang,
Stacy Patterson
Abstract:
We propose Compressed Vertical Federated Learning (C-VFL) for communication-efficient training on vertically partitioned data. In C-VFL, a server and multiple parties collaboratively train a model on their respective features utilizing several local iterations and sharing compressed intermediate results periodically. Our work provides the first theoretical analysis of the effect message compression has on distributed training over vertically partitioned data. We prove convergence of non-convex objectives at a rate of $O(\frac{1}{\sqrt{T}})$ when the compression error is bounded over the course of training. We provide specific requirements for convergence with common compression techniques, such as quantization and top-$k$ sparsification. Finally, we experimentally show compression can reduce communication by over $90\%$ without a significant decrease in accuracy over VFL without compression.
Submitted 28 March, 2023; v1 submitted 16 June, 2022;
originally announced June 2022.
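Illustrative note on the top-$k$ sparsification mentioned in the abstract above: a minimal sketch, assuming a party's intermediate result is a plain list of floats. The function name `top_k_sparsify` and the sample vector are hypothetical and are not taken from the paper; the sketch shows only the generic compression operator, not the C-VFL training loop.

```python
def top_k_sparsify(vec, k):
    """Keep the k largest-magnitude entries of vec and zero out the rest.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if k <= 0:
        return [0.0] * len(vec)
    # Rank indices by descending magnitude. sorted() is stable, so ties
    # are broken in favor of the lower index.
    order = sorted(range(len(vec)), key=lambda i: -abs(vec[i]))
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


# Assumed example: a 5-dimensional intermediate result compressed to 2 entries.
embedding = [0.1, -2.0, 0.3, 1.5, -0.05]
sparse = top_k_sparsify(embedding, 2)
print(sparse)
```

Only the retained entries and their positions would need to be transmitted, which is where the communication saving comes from.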
-
Cross-Silo Federated Learning for Multi-Tier Networks with Vertical and Horizontal Data Partitioning
Authors:
Anirban Das,
Timothy Castiglia,
Shiqiang Wang,
Stacy Patterson
Abstract:
We consider federated learning in tiered communication networks. Our network model consists of a set of silos, each holding a vertical partition of the data. Each silo contains a hub and a set of clients, with the silo's vertical data shard partitioned horizontally across its clients. We propose Tiered Decentralized Coordinate Descent (TDCD), a communication-efficient decentralized training algorithm for such two-tiered networks. The clients in each silo perform multiple local gradient steps before sharing updates with their hub to reduce communication overhead. Each hub adjusts its coordinates by averaging its workers' updates, and then hubs exchange intermediate updates with one another. We present a theoretical analysis of our algorithm and show the dependence of the convergence rate on the number of vertical partitions and the number of local updates. We further validate our approach empirically via simulation-based experiments using a variety of datasets and objectives.
Submitted 25 April, 2024; v1 submitted 19 August, 2021;
originally announced August 2021.
-
Multi-Level Local SGD for Heterogeneous Hierarchical Networks
Authors:
Timothy Castiglia,
Anirban Das,
Stacy Patterson
Abstract:
We propose Multi-Level Local SGD, a distributed gradient method for learning a smooth, non-convex objective in a heterogeneous multi-level network. Our network model consists of a set of disjoint sub-networks, with a single hub and multiple worker nodes; further, worker nodes may have different operating rates. The hubs exchange information with one another via a connected, but not necessarily complete communication network. In our algorithm, sub-networks execute a distributed SGD algorithm, using a hub-and-spoke paradigm, and the hubs periodically average their models with neighboring hubs. We first provide a unified mathematical framework that describes the Multi-Level Local SGD algorithm. We then present a theoretical analysis of the algorithm; our analysis shows the dependence of the convergence error on the worker node heterogeneity, hub network topology, and the number of local, sub-network, and global iterations. We back up our theoretical results via simulation-based experiments using both convex and non-convex objectives.
Submitted 18 February, 2022; v1 submitted 27 July, 2020;
originally announced July 2020.
-
A Hierarchical Model for Fast Distributed Consensus in Dynamic Networks
Authors:
Timothy Castiglia,
Colin Goldberg,
Stacy Patterson
Abstract:
We present two new consensus algorithms for dynamic networks. The first, Fast Raft, is a variation on the Raft consensus algorithm that reduces the number of message rounds in typical operation. Fast Raft is ideal for fast-paced distributed systems where membership changes over time and where sites must reach consensus quickly. The second, C-Raft, is targeted for distributed systems where sites are grouped into clusters, with fast communication within clusters and slower communication between clusters. C-Raft uses Fast Raft as a building block and defines a hierarchical model of consensus to improve upon throughput in globally distributed systems. We prove the safety and liveness properties of each algorithm. Finally, we present an experimental evaluation of both algorithms in AWS.
Submitted 27 July, 2020; v1 submitted 13 April, 2020;
originally announced April 2020.
-
Shifting Opinions in a Social Network Through Leader Selection
Authors:
Yuhao Yi,
Timothy Castiglia,
Stacy Patterson
Abstract:
We study the French-DeGroot opinion dynamics in a social network with two polarizing parties. We consider a network in which the leaders of one party are given, and we pose the problem of selecting the leader set of the opposing party so as to shift the average opinion to a desired value. When each party has only one leader, we express the average opinion in terms of the transition matrix and the stationary distribution of random walks in the network. The analysis shows balance of influence between the two leader nodes. We show that the problem of selecting at most $k$ absolute leaders to shift the average opinion is $\mathbf{NP}$-hard. Then, we reduce the problem to a problem of submodular maximization with a submodular knapsack constraint and an additional cardinality constraint and propose a greedy algorithm with upper bound search to approximate the optimum solution. We also conduct experiments in random networks and real-world networks to show the effectiveness of the algorithm.
Submitted 15 May, 2020; v1 submitted 28 October, 2019;
originally announced October 2019.
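Illustrative note on the opinion dynamics described in the abstract above: a minimal sketch, assuming an undirected graph in which each follower repeatedly replaces its opinion with the unweighted average of its neighbors' opinions, while leaders hold fixed opinions of 0 and 1. The function name `degroot_with_leaders`, the uniform weights, and the four-node path graph are assumptions made for illustration; the paper's exact model and its leader-selection algorithm are not reproduced here.

```python
def degroot_with_leaders(neighbors, leaders, steps=200):
    """Iterate averaging dynamics with fixed-opinion leaders.

    neighbors: dict mapping each node to a list of adjacent nodes.
    leaders:   dict mapping leader nodes to their fixed opinions.
    Hypothetical helper for illustration only.
    """
    # Followers start at a neutral opinion of 0.5; leaders at their fixed value.
    opinions = {v: leaders.get(v, 0.5) for v in neighbors}
    for _ in range(steps):
        updated = {}
        for v, nbrs in neighbors.items():
            if v in leaders:
                updated[v] = leaders[v]  # leaders never change their opinion
            else:
                updated[v] = sum(opinions[u] for u in nbrs) / len(nbrs)
        opinions = updated
    return opinions


# Assumed example: path graph a - b - c - d with one leader from each party.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
leaders = {"a": 0.0, "d": 1.0}
final = degroot_with_leaders(neighbors, leaders)
average_opinion = sum(final.values()) / len(final)
print(final, average_opinion)
```

In this symmetric example the two leaders pull equally on the followers, so the follower nearer each leader settles closer to that leader's opinion. Moving or adding a leader changes the average, which is the quantity the leader-selection problem aims to steer.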