-
VALID: a Validated Algorithm for Learning in Decentralized Networks with Possible Adversarial Presence
Authors:
Mayank Bakshi,
Sara Ghasvarianjahromi,
Yauhen Yakimenka,
Allison Beemer,
Oliver Kosut,
Joerg Kliewer
Abstract:
We introduce the paradigm of validated decentralized learning for undirected networks with heterogeneous data and possible adversarial infiltration. We require (a) convergence to a global empirical loss minimizer when adversaries are absent, and (b) either detection of adversarial presence or convergence to an admissible consensus, irrespective of the adversarial configuration. To this end, we propose the VALID protocol which, to the best of our knowledge, is the first to achieve a validated learning guarantee. Moreover, VALID offers an O(1/T) convergence rate (under pertinent regularity assumptions), and computational and communication complexities comparable to non-adversarial distributed stochastic gradient descent. Remarkably, VALID retains optimal performance metrics in adversary-free environments, sidestepping the robustness penalties observed in prior Byzantine-robust methods. A distinctive aspect of our study is a heterogeneity metric based on the norms of individual agents' gradients computed at the global empirical loss minimizer. This not only provides a natural statistic for detecting significant Byzantine disruptions but also allows us to prove the optimality of VALID in wide generality. Lastly, our numerical results reveal that, in the absence of adversaries, VALID converges faster than state-of-the-art Byzantine-robust algorithms, while when adversaries are present, VALID terminates with each honest agent either converging to an admissible consensus or declaring adversarial presence in the network.
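The heterogeneity metric can be made concrete with a toy computation of each agent's gradient at the global empirical loss minimizer. This is a sketch only. It assumes scalar quadratic local losses $f_i(x) = \tfrac{a_i}{2}(x - b_i)^2$, chosen because the minimizer then has a closed form, and it summarizes heterogeneity by the largest local gradient norm. Both choices, and the function name `gradient_heterogeneity`, are illustrative assumptions and not the exact definition used by VALID.

```python
def gradient_heterogeneity(a, b):
    """Toy heterogeneity statistic for scalar quadratic local losses.

    Agent i holds f_i(x) = a_i / 2 * (x - b_i)**2 with a_i > 0 (an assumption
    made for illustration); the global empirical loss is sum_i f_i.
    """
    # Closed-form minimizer of sum_i f_i: the a-weighted mean of the b_i.
    x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)
    # Local gradients f_i'(x*) = a_i * (x* - b_i); they sum to zero at x*.
    grads = [ai * (x_star - bi) for ai, bi in zip(a, b)]
    # Hypothetical summary statistic: the largest local gradient norm.
    metric = max(abs(g) for g in grads)
    return x_star, grads, metric


# Identical local data gives zero heterogeneity; differing data does not.
print(gradient_heterogeneity([1.0, 1.0], [3.0, 3.0]))  # (3.0, [0.0, 0.0], 0.0)
print(gradient_heterogeneity([1.0, 3.0], [0.0, 4.0]))  # (3.0, [3.0, -3.0], 3.0)
```

The local gradients cancel at the minimizer, so their norms isolate how far each agent's data pulls away from the consensus.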
Submitted 12 May, 2024;
originally announced May 2024.
-
Straggler-Resilient Differentially-Private Decentralized Learning
Authors:
Yauhen Yakimenka,
Chung-Wei Weng,
Hsuan-Yin Lin,
Eirik Rosnes,
Jörg Kliewer
Abstract:
We consider the straggler problem in decentralized learning over a logical ring while preserving user data privacy. In particular, we extend the recently proposed framework of differential privacy (DP) amplification by decentralization by Cyffers and Bellet to include overall training latency, comprising both computation and communication latency. Analytical results on both the convergence speed and the DP level are derived for both a skipping scheme (which ignores the stragglers after a timeout) and a baseline scheme that waits for each node to finish before the training continues. A trade-off between overall training latency, accuracy, and privacy, parameterized by the timeout of the skipping scheme, is identified and empirically validated for logistic regression on a real-world dataset and for image classification using the MNIST and CIFAR-10 datasets.
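A toy latency model shows how the timeout separates the skipping scheme from the waiting baseline. It assumes that the model visits ring nodes one at a time and that each node computes for a given time and then forwards the model with a fixed communication delay. A node that is still computing when the timeout expires is skipped and its update is dropped. The function name and this latency model are assumptions made for illustration. The paper's latency and DP analysis is not reproduced here.

```python
def ring_epoch_latency(compute_times, comm_time, timeout=None):
    """Latency of one pass of the model around a logical ring (toy model).

    Assumes nodes are visited sequentially: node i computes for
    compute_times[i], then the model is forwarded in comm_time.
    timeout=None waits for every node (baseline scheme); otherwise a node
    still computing after `timeout` is skipped and its update is dropped.
    """
    latency, skipped = 0.0, 0
    for t in compute_times:
        if timeout is not None and t > timeout:
            latency += timeout  # wait up to the timeout, then skip the node
            skipped += 1
        else:
            latency += t        # node finishes in time; its update is kept
        latency += comm_time    # forward the model to the next node
    return latency, skipped


print(ring_epoch_latency([1.0, 5.0, 2.0], 0.5))               # (9.5, 0)
print(ring_epoch_latency([1.0, 5.0, 2.0], 0.5, timeout=3.0))  # (7.5, 1)
```

A smaller timeout shortens each pass but drops more updates, which is the latency/accuracy side of the trade-off. The privacy side is not modeled in this sketch.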
Submitted 28 June, 2024; v1 submitted 6 December, 2022;
originally announced December 2022.
-
Optimal Rate-Distortion-Leakage Tradeoff for Single-Server Information Retrieval
Authors:
Yauhen Yakimenka,
Hsuan-Yin Lin,
Eirik Rosnes,
Jörg Kliewer
Abstract:
Private information retrieval protocols guarantee that a user can privately and losslessly retrieve a single file from a database stored across multiple servers. In this work, we propose to simultaneously relax the conditions of perfect retrievability and privacy in order to obtain improved download rates when all files are stored uncoded on a single server. Information leakage is measured in terms of the average success probability for the server of correctly guessing the identity of the desired file. The main findings are: i) the derivation of the optimal tradeoff between download rate, distortion, and information leakage when the file size is infinite. Closed-form expressions for the optimal tradeoff in the special cases of "no-leakage" and "no-privacy" are also given. ii) A novel approach based on linear programming (LP) to construct schemes for a finite file size and an arbitrary number of files. The proposed LP approach can be leveraged to find provably optimal schemes with corresponding closed-form expressions for the rate-distortion-leakage tradeoff when the database contains at most four bits.
Finally, for a database that contains 320 bits, we compare two construction methods based on the LP approach with a nonconstructive scheme downloading subsets of files using a finite-length lossy compressor based on random coding.
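The leakage measure used here, the server's average success probability of guessing the desired file, can be computed directly from a query strategy. This sketch assumes that the user's scheme is a conditional distribution `P[m][q]` of query `q` given desired file `m`, that files are equally likely a priori, and that the server makes the maximum a posteriori guess. The function name is hypothetical. With $f$ files, the value ranges from $1/f$ (queries independent of the desired file) to $1$ (the query reveals the file).

```python
def guessing_leakage(P):
    """Server's average probability of guessing the desired file correctly.

    P[m][q] is the probability that the user sends query q when it wants
    file m. Assumes a uniform prior over the f files and a server that makes
    the maximum a posteriori (MAP) guess for each observed query.
    """
    f = len(P)
    num_queries = len(P[0])
    # Under a uniform prior the MAP guess for query q is argmax_m P[m][q],
    # so the success probability is (1/f) * sum_q max_m P[m][q].
    return sum(max(P[m][q] for m in range(f)) for q in range(num_queries)) / f


print(guessing_leakage([[0.5, 0.5], [0.5, 0.5]]))    # 0.5  (perfect privacy, f = 2)
print(guessing_leakage([[1.0, 0.0], [0.0, 1.0]]))    # 1.0  (query reveals the file)
print(guessing_leakage([[0.75, 0.25], [0.25, 0.75]]))  # 0.75
```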
Submitted 6 January, 2022; v1 submitted 9 November, 2021;
originally announced November 2021.
-
Generative Adversarial User Privacy in Lossy Single-Server Information Retrieval
Authors:
Chung-Wei Weng,
Yauhen Yakimenka,
Hsuan-Yin Lin,
Eirik Rosnes,
Joerg Kliewer
Abstract:
We propose to extend the concept of private information retrieval by allowing for distortion in the retrieval process and relaxing the perfect privacy requirement at the same time. In particular, we study the trade-off between download rate, distortion, and user privacy leakage, and show that in the limit of large file sizes this trade-off can be captured via a novel information-theoretical formulation for datasets with a known distribution. Moreover, for scenarios where the statistics of the dataset are unknown, we propose a new deep learning framework by leveraging a generative adversarial network approach, which allows the user to learn efficient schemes from the data itself. We evaluate the performance of the scheme on a synthetic Gaussian dataset as well as on the MNIST, CIFAR-10, and LSUN datasets. For the MNIST, CIFAR-10, and LSUN datasets, the data-driven approach significantly outperforms a nonlearning-based scheme which combines source coding with the download of multiple files.
Submitted 19 October, 2022; v1 submitted 7 December, 2020;
originally announced December 2020.
-
On the Capacity of Private Monomial Computation
Authors:
Yauhen Yakimenka,
Hsuan-Yin Lin,
Eirik Rosnes
Abstract:
In this work, we consider private monomial computation (PMC) for replicated noncolluding databases. In PMC, a user wishes to privately retrieve an arbitrary multivariate monomial from a candidate set of monomials in $f$ messages over a finite field $\mathbb F_q$, where $q=p^k$ is a power of a prime $p$ and $k \ge 1$, replicated over $n$ databases. We derive the PMC capacity under a technical condition on $p$ and for asymptotically large $q$. The condition on $p$ is satisfied, e.g., for large enough $p$. Also, we present a novel PMC scheme for arbitrary $q$ that is capacity-achieving in the asymptotic case above. Moreover, we present formulas for the entropy of a multivariate monomial and for a set of monomials in uniformly distributed random variables over a finite field, which are used in the derivation of the capacity expression.
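The entropy of a monomial in uniform random variables, which enters the capacity expression, can be checked numerically by brute force for small fields. This sketch covers only prime fields ($q = p$, $k = 1$), because extension fields would need polynomial arithmetic. It assumes independent uniform variables and is a sanity check, not the closed-form formulas derived in the paper. The function name is hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def monomial_entropy(p, exponents):
    """Entropy in bits of X_1^e_1 * ... * X_m^e_m over the prime field F_p.

    The X_i are independent and uniform on F_p. Only prime p is handled
    (q = p, k = 1); extension fields would need polynomial arithmetic.
    """
    counts = Counter()
    for xs in product(range(p), repeat=len(exponents)):
        value = 1
        for x, e in zip(xs, exponents):
            value = value * pow(x, e, p) % p  # multiplication in F_p
        counts[value] += 1
    total = p ** len(exponents)
    return -sum(c / total * log2(c / total) for c in counts.values())


# X^2 over F_5 equals 0 w.p. 1/5, and 1 or 4 w.p. 2/5 each.
print(round(monomial_entropy(5, [2]), 4))  # 1.5219
```

Because squaring is two-to-one on the nonzero elements, the result falls below $\log_2 5$. This is why the entropy of a monomial generally differs from that of the underlying messages.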
Submitted 17 January, 2020;
originally announced January 2020.
-
Failure Analysis of the Interval-Passing Algorithm for Compressed Sensing
Authors:
Yauhen Yakimenka,
Eirik Rosnes
Abstract:
In this work, we perform a complete failure analysis of the interval-passing algorithm (IPA) for compressed sensing, an efficient iterative algorithm for reconstructing a $k$-sparse nonnegative $n$-dimensional real signal $\boldsymbol{x}$ from a small number of linear measurements $\boldsymbol{y}$. In particular, we show that the IPA fails to recover $\boldsymbol{x}$ from $\boldsymbol{y}$ if and only if it fails to recover a corresponding binary vector of the same support, and also that only positions of nonzero values in the measurement matrix are of importance to the success of recovery. Based on this observation, we introduce termatiko sets and show that the IPA fails to fully recover $\boldsymbol x$ if and only if the support of $\boldsymbol x$ contains a nonempty termatiko set, thus giving a complete (graph-theoretic) description of the failing sets of the IPA. Two heuristics to locate small-size termatiko sets are presented. For binary column-regular measurement matrices with no $4$-cycles, we provide a lower bound on the termatiko distance, defined as the smallest size of a nonempty termatiko set. For measurement matrices constructed from the parity-check matrices of array LDPC codes, upper bounds on the termatiko distance are provided for column-weight at most $7$, while for column-weight $3$, the exact termatiko distance and its corresponding multiplicity are provided. Next, we show that adding redundant rows to the measurement matrix does not create new termatiko sets, but rather potentially removes termatiko sets and thus improves performance. An algorithm is provided to efficiently search for such redundant rows. Finally, we present numerical results for different specific measurement matrices and also for protograph-based ensembles of measurement matrices, as well as simulation results of IPA performance, showing the influence of small-size termatiko sets.
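A small implementation helps clarify what "failure" means for the IPA. This is a simplified interval-passing variant written for illustration. Every variable keeps a lower and an upper bound on its value, and each measurement tightens those bounds using the other variables' bounds. It assumes a nonnegative measurement matrix with no all-zero column, a nonnegative signal, and noiseless measurements $\boldsymbol y = A\boldsymbol x$. Per-variable bounds and synchronous updates are simplifications, not the exact message schedule analyzed in the paper. The names `ipa`, `max_iter`, and `tol` are hypothetical.

```python
def ipa(A, y, max_iter=100, tol=1e-12):
    """Simplified interval-passing reconstruction of a nonnegative signal.

    A: nonnegative m x n matrix (list of rows) with no all-zero column.
    y: measurements y = A x for some nonnegative x.
    Returns (lower bounds, upper bounds, recovered flag).
    """
    m, n = len(A), len(A[0])
    rows = [[v for v in range(n) if A[c][v] > 0] for c in range(m)]
    cols = [[c for c in range(m) if A[c][v] > 0] for v in range(n)]
    lo = [0.0] * n  # x is nonnegative
    # x_v <= y_c / A[c][v] for every measurement c that involves v.
    hi = [min(y[c] / A[c][v] for c in cols[v]) for v in range(n)]
    for _ in range(max_iter):
        new_lo, new_hi = lo[:], hi[:]
        for c in range(m):
            for v in rows[c]:
                others = [u for u in rows[c] if u != v]
                rest_hi = sum(A[c][u] * hi[u] for u in others)
                rest_lo = sum(A[c][u] * lo[u] for u in others)
                # Measurement c bounds x_v given the other variables' intervals.
                new_lo[v] = max(new_lo[v], (y[c] - rest_hi) / A[c][v])
                new_hi[v] = min(new_hi[v], (y[c] - rest_lo) / A[c][v])
        if new_lo == lo and new_hi == hi:
            break  # fixed point: no interval can be tightened further
        lo, hi = new_lo, new_hi
    recovered = all(h - l <= tol for l, h in zip(lo, hi))
    return lo, hi, recovered


A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
x = [1, 0, 0]
y = [sum(a * xi for a, xi in zip(row, x)) for row in A]
print(ipa(A, y))           # ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], True)
print(ipa([[1, 1]], [1]))  # ([0.0, 0.0], [1.0, 1.0], False)
```

In the second call the single measurement cannot separate the two variables, so the intervals never close and the algorithm stops at a fixed point without recovering the signal.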
Submitted 19 December, 2019; v1 submitted 13 June, 2018;
originally announced June 2018.
-
Stopping Redundancy Hierarchy Beyond the Minimum Distance
Authors:
Yauhen Yakimenka,
Vitaly Skachek,
Irina E. Bocharova,
Boris D. Kudryashov
Abstract:
Stopping sets play a crucial role in failure events of iterative decoders over a binary erasure channel (BEC). The $\ell$-th stopping redundancy is the minimum number of rows in a parity-check matrix of a code that contains no stopping sets of size up to $\ell$. In this work, a notion of coverable stopping sets is defined. In order to achieve maximum-likelihood performance under iterative decoding over the BEC, the parity-check matrix should contain no coverable stopping sets of size $\ell$, for $1 \le \ell \le n-k$, where $n$ is the code length and $k$ is the code dimension. By estimating the number of coverable stopping sets, we obtain upper bounds on the $\ell$-th stopping redundancy, $1 \le \ell \le n-k$. The bounds are derived for both specific codes and code ensembles. In the range $1 \le \ell \le d-1$, where $d$ is the minimum distance, the new bounds for specific codes improve on the results in the literature. Numerical calculations are also presented.
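The stopping-set condition behind these definitions can be checked by brute force on toy examples. The sketch below lists the stopping sets of a small parity-check matrix. It also shows the mechanism that stopping redundancy exploits: appending a redundant row (a dual codeword) can only remove stopping sets, never create them. It uses exhaustive search and does not implement the coverable-stopping-set counting used for the bounds. The function name `stopping_sets` is hypothetical.

```python
from itertools import combinations


def stopping_sets(H, max_size):
    """All stopping sets of the binary matrix H with size <= max_size.

    A nonempty column set S is a stopping set if no row of H has exactly
    one 1 inside S. Brute force, so only suitable for small examples.
    """
    n = len(H[0])
    found = []
    for size in range(1, max_size + 1):
        for cols in combinations(range(n), size):
            if all(sum(row[j] for j in cols) != 1 for row in H):
                found.append(set(cols))
    return found


# Parity-check matrix of the [7,4] Hamming code (column j is j+1 in binary).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
# Redundant row: the GF(2) sum of the first two rows, i.e. a dual codeword.
H_red = H + [[a ^ b for a, b in zip(H[0], H[1])]]

print({2, 4, 6} in stopping_sets(H, 3))      # True
print({2, 4, 6} in stopping_sets(H_red, 3))  # False
```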
Submitted 29 September, 2018; v1 submitted 18 April, 2018;
originally announced April 2018.
-
Distance Properties of Short LDPC Codes and their Impact on the BP, ML and Near-ML Decoding Performance
Authors:
Irina E. Bocharova,
Boris D. Kudryashov,
Vitaly Skachek,
Yauhen Yakimenka
Abstract:
Parameters of LDPC codes, such as minimum distance, stopping distance, stopping redundancy, girth of the Tanner graph, and their influence on the frame error rate performance of the BP, ML and near-ML decoding over a BEC and an AWGN channel are studied. Both random and structured LDPC codes are considered. In particular, the BP decoding is applied to the code parity-check matrices with an increasing number of redundant rows, and the convergence of the performance to that of the ML decoding is analyzed. A comparison of the simulated BP, ML, and near-ML performance with the improved theoretical bounds on the error probability based on the exact weight spectrum coefficients and the exact stopping size spectrum coefficients is presented. It is observed that decoding performance very close to the ML decoding performance can be achieved with a relatively small number of redundant rows for some codes, for both the BEC and the AWGN channels.
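Over the BEC, BP decoding reduces to the peeling procedure sketched below, assuming a binary parity-check matrix given as lists of 0/1 (the function name is hypothetical). Running it with and without one redundant row shows why extra rows bring BP closer to ML. The erasure pattern used here is a stopping set of the standard [7,4] Hamming matrix but not of the matrix extended by a dual codeword.

```python
def bp_erasure_decode(H, word):
    """Iterative (peeling) BP decoding over the BEC.

    word holds 0/1 for received bits and None for erasures. Any check with
    exactly one erased neighbour fixes that bit as the parity of the others.
    Decoding stops when no such check remains; the positions still None,
    if any, then form a stopping set of H.
    """
    x = list(word)
    progress = True
    while progress:
        progress = False
        for row in H:
            erased = [j for j, h in enumerate(row) if h and x[j] is None]
            if len(erased) == 1:
                j = erased[0]
                x[j] = sum(x[i] for i, h in enumerate(row) if h and i != j) % 2
                progress = True
    return x


H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
H_red = H + [[a ^ b for a, b in zip(H[0], H[1])]]  # one redundant row
received = [1, 1, None, 0, None, 0, None]          # codeword 1110000, 3 erasures

print(bp_erasure_decode(H, received))      # [1, 1, None, 0, None, 0, None]
print(bp_erasure_decode(H_red, received))  # [1, 1, 1, 0, 0, 0, 0]
```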
Submitted 4 July, 2017;
originally announced July 2017.
-
BP-LED decoding algorithm for LDPC codes over AWGN channels
Authors:
Irina E. Bocharova,
Boris D. Kudryashov,
Vitaly Skachek,
Yauhen Yakimenka
Abstract:
A new method for low-complexity near-maximum-likelihood (ML) decoding of low-density parity-check (LDPC) codes over the additive white Gaussian noise channel is presented. The proposed method, termed belief-propagation--list erasure decoding (BP-LED), erases carefully chosen unreliable bits whenever BP decoding fails. A strategy of introducing erasures into the received vector and a new erasure decoding algorithm are proposed. The new erasure decoding algorithm, called list erasure decoding, combines ML decoding over the BEC with list decoding applied if the ML decoder fails to find a unique solution. The asymptotic exponent of the average list size for random regular LDPC codes from the Gallager ensemble is analyzed. Furthermore, a few examples of regular and irregular quasi-cyclic LDPC codes of short and moderate lengths are studied by simulations and their performance is compared with the upper bound on the LDPC ensemble-average performance and the upper bound on the average performance of random linear codes under ML decoding. A comparison with the BP decoding performance of the WiMAX standard codes and the performance of the near-ML BEAST decoding is presented. The new algorithm is applied to decoding a short nonbinary LDPC code over an extension of the binary Galois field. The obtained simulation results are compared to the upper bound on the ensemble-average performance of the binary image of regular nonbinary LDPC codes.
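The list erasure decoding step can be sketched as Gaussian elimination over GF(2). ML decoding over the BEC solves for the erased bits, and when the system is underdetermined, every consistent codeword is listed, since all of them are equally likely on the BEC. This is a minimal sketch that assumes the list simply contains all solutions. How BP-LED chooses which bits to erase and how it selects among list entries are not modeled, and the function name is hypothetical. The first example below is an erasure pattern on which BP stalls for the standard [7,4] Hamming matrix, yet ML recovers it uniquely.

```python
from itertools import product


def list_erasure_decode(H, word):
    """ML erasure decoding over the BEC, returning every consistent codeword.

    word holds 0/1 for received bits and None for erasures. The erased bits e
    satisfy H_E e = H_K x_K over GF(2); Gaussian elimination solves this, and
    if the solution is not unique all 2^(#free variables) solutions are listed.
    """
    erased = [j for j, b in enumerate(word) if b is None]
    k = len(erased)
    # Augmented system [H_E | s], with s the syndrome of the known bits.
    rows = []
    for h in H:
        rhs = sum(h[j] * word[j] for j in range(len(word)) if word[j] is not None) % 2
        rows.append([h[j] for j in erased] + [rhs])
    pivots, r = [], 0
    for c in range(k):  # reduced row echelon form over GF(2)
        p = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if p is None:
            continue  # no pivot: column c is a free variable
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    if any(not any(row[:k]) and row[k] for row in rows):
        return []  # inconsistent: no codeword matches the unerased bits
    free = [c for c in range(k) if c not in pivots]
    solutions = []
    for bits in product([0, 1], repeat=len(free)):
        e = [0] * k
        for c, b in zip(free, bits):
            e[c] = b
        for i, c in enumerate(pivots):
            e[c] = (rows[i][k] + sum(rows[i][f] * e[f] for f in free)) % 2
        x = list(word)
        for c, j in enumerate(erased):
            x[j] = e[c]
        solutions.append(x)
    return solutions


H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
print(list_erasure_decode(H, [1, 1, None, 0, None, 0, None]))  # unique ML solution
print(list_erasure_decode(H, [None, None, None, 0, 0, 0, 0]))  # list of two codewords
```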
Submitted 26 May, 2017;
originally announced May 2017.
-
On Failing Sets of the Interval-Passing Algorithm for Compressed Sensing
Authors:
Yauhen Yakimenka,
Eirik Rosnes
Abstract:
In this work, we analyze the failing sets of the interval-passing algorithm (IPA) for compressed sensing. The IPA is an efficient iterative algorithm for reconstructing a k-sparse nonnegative n-dimensional real signal x from a small number of linear measurements y. In particular, we show that the IPA fails to recover x from y if and only if it fails to recover a corresponding binary vector of the same support, and also that only the positions of nonzero values in the measurement matrix matter for the success of recovery. Based on this observation, we introduce termatiko sets and show that the IPA fails to fully recover x if and only if the support of x contains a nonempty termatiko set, thus giving a complete (graph-theoretic) description of the failing sets of the IPA. Finally, we present an extensive numerical study showing that in many cases there exist termatiko sets of size strictly smaller than the stopping distance of the binary measurement matrix, in some cases as small as half the stopping distance.
Submitted 4 October, 2016; v1 submitted 18 July, 2016;
originally announced July 2016.
-
Refined Upper Bounds on Stop** Redundancy of Binary Linear Codes
Authors:
Yauhen Yakimenka,
Vitaly Skachek
Abstract:
The $l$-th stopping redundancy $\rho_l(\mathcal C)$ of the binary $[n, k, d]$ code $\mathcal C$, $1 \le l \le d$, is defined as the minimum number of rows in a parity-check matrix of $\mathcal C$ such that the smallest stopping set is of size at least $l$. The stopping redundancy $\rho(\mathcal C)$ is defined as $\rho_d(\mathcal C)$. In this work, we improve on the probabilistic analysis of stopping redundancy proposed by Han, Siegel and Vardy, which yields the best bounds known today. In our approach, we judiciously select the first few rows in the parity-check matrix, and then continue with the probabilistic method. Using similar techniques, we also improve on the best known bounds on $\rho_l(\mathcal C)$ for $1 \le l \le d$. Our approach is compared to the existing methods by numerical computations.
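For toy codes, $\rho_l(\mathcal C)$ can be computed directly from its definition by exhaustive search over sets of dual codewords. This gives a sanity check for upper bounds such as those above. The sketch is not the paper's method, which refines the probabilistic construction. Its helper names are hypothetical, and its cost is exponential. The search terminates for $l \le d$ because using all nonzero dual codewords as rows is known to give stopping distance $d$. The example uses the extended Hamming $[8,4,4]$ code, whose standard parity-check matrix has a stopping set of size 3.

```python
from itertools import combinations


def to_mask(row):
    """Encode a 0/1 row as an integer bit mask (bit j <-> column j)."""
    return sum(bit << j for j, bit in enumerate(row))


def gf2_rank(masks):
    """Rank over GF(2) of a collection of bit-mask rows."""
    pivots = {}  # leading bit -> basis row with that leading bit
    for r in masks:
        while r:
            top = r.bit_length() - 1
            if top not in pivots:
                pivots[top] = r
                break
            r ^= pivots[top]
    return len(pivots)


def has_stopping_set_below(masks, n, l):
    """True if some nonempty column set of size < l is a stopping set."""
    for size in range(1, l):
        for cols in combinations(range(n), size):
            s = sum(1 << j for j in cols)
            if all(bin(h & s).count("1") != 1 for h in masks):
                return True
    return False


def stopping_redundancy(H, l):
    """Exhaustive search for rho_l: fewest dual codewords that span the dual
    code and have no stopping set of size < l. Only for toy codes."""
    n = len(H[0])
    masks = [to_mask(row) for row in H]
    rank = gf2_rank(masks)  # equals n - k
    dual = {0}
    for h in masks:  # enumerate the dual code (row space of H)
        dual |= {c ^ h for c in dual}
    candidates = sorted(dual - {0})
    for r in range(rank, len(candidates) + 1):
        for rows in combinations(candidates, r):
            if gf2_rank(rows) == rank and not has_stopping_set_below(rows, n, l):
                return r, list(rows)
    return None


H_ext = [  # extended Hamming [8,4,4] code
    [1, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
print(stopping_redundancy(H_ext, 3)[0])                           # 4
print(has_stopping_set_below([to_mask(r) for r in H_ext], 8, 4))  # True
```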
Submitted 27 February, 2015; v1 submitted 31 October, 2014;
originally announced October 2014.