FedAUXfdp: Differentially Private One-Shot Federated Distillation

Authors: Haley Hoech, Roman Rischke, Karsten Müller, Wojciech Samek

Abstract: Federated learning suffers in the case of non-iid local datasets, i.e., when the distributions of the clients' data are heterogeneous. One promising approach to this challenge is the recently proposed method FedAUX, an augmentation of federated distillation with robust results on even highly heterogeneous client data. FedAUX is a partially $(ε, δ)$-differentially private method, insofar as the clients' private data is protected in only part of the training it takes part in. This work contributes a fully differentially private modification, termed FedAUXfdp. We further contribute an upper bound on the $l_2$-sensitivity of regularized multinomial logistic regression. In experiments with deep networks on large-scale image datasets, FedAUXfdp with strong differential privacy guarantees performs significantly better than other equally privatized SOTA baselines on non-iid client data in just a single communication round. Full privatization of the modified method results in a negligible reduction in accuracy at all levels of data heterogeneity.

Submitted 21 June, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

arXiv:2102.02514 [pdf, other]

FedAUX: Leveraging Unlabeled Auxiliary Data in Federated Learning

Authors: Felix Sattler, Tim Korjakow, Roman Rischke, Wojciech Samek

Abstract: Federated Distillation (FD) is a popular novel algorithmic paradigm for Federated Learning, which achieves training performance competitive with prior parameter-averaging-based methods, while additionally allowing the clients to train different model architectures, by distilling the client predictions on an unlabeled auxiliary set of data into a student model. In this work we propose FedAUX, an extension to FD, which, under the same set of assumptions, drastically improves performance by deriving maximum utility from the unlabeled auxiliary data. FedAUX modifies the FD training procedure in two ways: First, unsupervised pre-training on the auxiliary data is performed to find a model initialization for the distributed training. Second, $(ε, δ)$-differentially private certainty scoring is used to weight the ensemble predictions on the auxiliary data according to the certainty of each client model. Experiments on large-scale convolutional neural networks and transformer models demonstrate that the training performance of FedAUX exceeds SOTA FL baseline methods by a substantial margin in both the iid and non-iid regime, further closing the gap to centralized training performance. Code is available at github.com/fedl-repo/fedaux.

Submitted 4 February, 2021; originally announced February 2021.

arXiv:2012.00632 [pdf, other]

Communication-Efficient Federated Distillation

Authors: Felix Sattler, Arturo Marban, Roman Rischke, Wojciech Samek

Abstract: Communication constraints are one of the major challenges preventing the widespread adoption of Federated Learning systems. Recently, Federated Distillation (FD), a new algorithmic paradigm for Federated Learning with fundamentally different communication properties, emerged. FD methods leverage ensemble distillation techniques and exchange model outputs, presented as soft labels on an unlabeled public data set, between the central server and the participating clients. While for conventional Federated Learning algorithms, like Federated Averaging (FA), communication scales with the size of the jointly trained model, in FD communication scales with the distillation data set size, resulting in advantageous communication properties, especially when large models are trained. In this work, we investigate FD from the perspective of communication efficiency by analyzing the effects of active distillation-data curation, soft-label quantization and delta-coding techniques. Based on the insights gathered from this analysis, we present Compressed Federated Distillation (CFD), an efficient Federated Distillation method. Extensive experiments on Federated image classification and language modeling problems demonstrate that our method can reduce the amount of communication necessary to achieve fixed performance targets by more than two orders of magnitude when compared to FD, and by more than four orders of magnitude when compared to FA.

Submitted 1 December, 2020; originally announced December 2020.

arXiv:1811.12657 [pdf, ps, other]

Optimal Algorithms for Scheduling under Time-of-Use Tariffs

Authors: Lin Chen, Nicole Megow, Roman Rischke, Leen Stougie, José Verschae

Abstract: We consider a natural generalization of classical scheduling problems in which using a time unit for processing a job causes some time-dependent cost which must be paid in addition to the standard scheduling cost. We study the scheduling objectives of minimizing the makespan and the sum of (weighted) completion times. It is not difficult to derive a polynomial-time algorithm for preemptive scheduling to minimize the makespan on unrelated machines. The problem of minimizing the total (weighted) completion time is considerably harder, even on a single machine. We present a polynomial-time algorithm that computes for any given sequence of jobs an optimal schedule, i.e., the optimal set of time-slots to be used for scheduling jobs according to the given sequence. This result is based on dynamic programming using a subtle analysis of the structure of optimal solutions and a potential function argument. With this algorithm, we solve the unweighted problem optimally in polynomial time. For the more general problem, in which jobs may have individual weights, we develop a polynomial-time approximation scheme (PTAS) based on a dual scheduling approach introduced for scheduling on a machine of varying speed. As the weighted problem is strongly NP-hard, our PTAS is the best possible approximation we can hope for.

Submitted 30 November, 2018; originally announced November 2018.

Comments: 17 pages; A preliminary version of this paper with a subset of results appeared in the Proceedings of MFCS 2015

arXiv:1701.08809 [pdf, ps, other]

Scheduling Maintenance Jobs in Networks

Authors: Fidaa Abed, Lin Chen, Yann Disser, Martin Groß, Nicole Megow, Julie Meißner, Alexander T. Richter, Roman Rischke

Abstract: We investigate the problem of scheduling the maintenance of edges in a network, motivated by the goal of minimizing outages in transportation or telecommunication networks. We focus on maintaining connectivity between two nodes over time; for the special case of path networks, this is related to the problem of minimizing the busy time of machines. We show that the problem can be solved in polynomial time in arbitrary networks if preemption is allowed. If preemption is restricted to integral time points, the problem is NP-hard and in the non-preemptive case we give strong non-approximability results. Furthermore, we give tight bounds on the power of preemption, that is, the maximum ratio of the values of non-preemptive and preemptive optimal solutions. Interestingly, the preemptive and the non-preemptive problem can be solved efficiently on paths, whereas we show that mixing both leads to a weakly NP-hard problem that allows for a simple 2-approximation.

Submitted 30 January, 2017; originally announced January 2017.

Comments: CIAC 2017

MSC Class: 68; ACM Class: F.2.2

arXiv:1412.4273 [pdf, ps, other]

Complexity of interval minmax regret scheduling on parallel identical machines with total completion time criterion

Authors: Maciej Drwal, Roman Rischke

Abstract: In this paper, we consider the problem of scheduling jobs on parallel identical machines, where the processing times of jobs are uncertain: only interval bounds of processing times are known. The optimality criterion of a schedule is the total completion time. In order to cope with the uncertainty, we consider the maximum regret objective and we seek a schedule that performs well under all possible instantiations of processing times. Although the deterministic version of the considered problem is solvable in polynomial time, the minmax regret version is known to be weakly NP-hard even for a single machine, and strongly NP-hard for parallel unrelated machines. In this paper, we show that the problem is strongly NP-hard also in the case of parallel identical machines.

Submitted 24 March, 2016; v1 submitted 13 December, 2014; originally announced December 2014.

Showing 1–6 of 6 results for author: Rischke, R