-
An End-to-End Coding Scheme for DNA-Based Data Storage With Nanopore-Sequenced Reads
Authors:
Lorenz Welter,
Roman Sokolovskii,
Thomas Heinis,
Antonia Wachter-Zeh,
Eirik Rosnes,
Alexandre Graell i Amat
Abstract:
We consider error-correcting coding for deoxyribonucleic acid (DNA)-based storage using nanopore sequencing. We model the DNA storage channel as a sampling noise channel where the input data is chunked into $M$ short DNA strands, which are copied a random number of times, and the channel outputs a random selection of $N$ noisy DNA strands. The retrieved DNA reads are prone to strand-dependent insertion, deletion, and substitution (IDS) errors. We construct an index-based concatenated coding scheme consisting of the concatenation of an outer code, an index code, and an inner code. We further propose a low-complexity (linear in $N$) maximum a posteriori probability decoder that takes into account the strand-dependent IDS errors and the randomness of the drawing to infer symbolwise a posteriori probabilities for the outer decoder. We present Monte-Carlo simulations for information-outage probabilities and frame error rates for different channel setups on experimental data. We finally evaluate the overall system performance using the read/write cost trade-off. A powerful combination of tailored channel modeling and soft information processing allows us to achieve excellent performance even with error-prone nanopore-sequenced reads, outperforming state-of-the-art schemes.
Submitted 18 June, 2024;
originally announced June 2024.
-
Generalizing Quantum Tanner Codes
Authors:
Olai Å. Mostad,
Eirik Rosnes,
Hsuan-Yin Lin
Abstract:
In this work, we present a generalization of the recently proposed quantum Tanner codes by Leverrier and Zémor, which contains a construction of asymptotically good quantum LDPC codes. Quantum Tanner codes have so far been constructed equivalently from groups, Cayley graphs, or square complexes constructed from groups. We show how to enlarge this to group actions on finite sets, Schreier graphs, and a family of square complexes which is the largest possible in a certain sense. Furthermore, we discuss how the proposed generalization opens up the possibility of finding other families of asymptotically good quantum codes.
Submitted 13 May, 2024;
originally announced May 2024.
-
Weakly-Private Information Retrieval From MDS-Coded Distributed Storage
Authors:
Asbjørn O. Orvedal,
Hsuan-Yin Lin,
Eirik Rosnes
Abstract:
We consider the problem of weakly-private information retrieval (WPIR) when data is encoded by a maximum distance separable code and stored across multiple servers. In WPIR, a user wishes to retrieve a piece of data from a set of servers without leaking too much information about which piece of data she is interested in. We study and provide the first WPIR protocols for this scenario and present results on their optimal trade-off between download rate and information leakage using the maximal leakage privacy metric.
Submitted 17 January, 2024;
originally announced January 2024.
-
Improved Capacity Outer Bound for Private Quadratic Monomial Computation
Authors:
Karen M. Dæhli,
Sarah A Obead,
Hsuan-Yin Lin,
Eirik Rosnes
Abstract:
In private computation, a user wishes to retrieve a function evaluation of messages stored on a set of databases without revealing the function's identity to the databases. Obead \emph{et al.} introduced a capacity outer bound for private nonlinear computation, dependent on the order of the candidate functions. Focusing on private \emph{quadratic monomial} computation, we propose three methods for ordering candidate functions: a graph edge-coloring method, a graph-distance method, and an entropy-based greedy method. We confirm, via an exhaustive search, that all three methods yield an optimal ordering for $f < 6$ messages. For $6 \leq f \leq 12$ messages, we numerically evaluate the performance of the proposed methods compared with a directed random search. For almost all scenarios considered, the entropy-based greedy method gives the smallest gap to the best-found ordering.
Submitted 26 February, 2024; v1 submitted 11 January, 2024;
originally announced January 2024.
-
On the Capacity of Private Nonlinear Computation for Replicated Databases
Authors:
Sarah A. Obead,
Hsuan-Yin Lin,
Eirik Rosnes,
Jörg Kliewer
Abstract:
We consider the problem of private computation (PC) in a distributed storage system. In such a setting a user wishes to compute a function of $f$ messages replicated across $n$ noncolluding databases, while revealing no information about the desired function to the databases. We provide an information-theoretically accurate achievable PC rate, which is the ratio of the smallest desired amount of information and the total amount of downloaded information, for the scenario of nonlinear computation. For a large message size the rate equals the PC capacity, i.e., the maximum achievable PC rate, when the candidate functions are the $f$ independent messages and one arbitrary nonlinear function of these. When the number of messages grows, the PC rate approaches an outer bound on the PC capacity. As a special case, we consider private monomial computation (PMC) and numerically compare the achievable PMC rate to the outer bound for a finite number of messages.
Submitted 4 July, 2023;
originally announced July 2023.
-
Efficient Interpolation-Based Decoding of Reed-Solomon Codes
Authors:
Wrya K. Kadir,
Hsuan-Yin Lin,
Eirik Rosnes
Abstract:
We propose a new interpolation-based error decoding algorithm for $(n,k)$ Reed-Solomon (RS) codes over a finite field of size $q$, where $n=q-1$ is the length and $k$ is the dimension. In particular, we employ the fast Fourier transform (FFT) together with properties of a circulant matrix associated with the error interpolation polynomial and some known results from elimination theory in the decoding process. The asymptotic computational complexity of the proposed algorithm for correcting any $t \leq \lfloor \frac{n-k}{2} \rfloor$ errors in an $(n,k)$ RS code is of order $\mathcal{O}(t\log^2 t)$ and $\mathcal{O}(n\log^2 n \log\log n)$ over FFT-friendly and arbitrary finite fields, respectively, achieving the best currently known asymptotic decoding complexity, proposed for the same set of parameters.
Submitted 3 July, 2023;
originally announced July 2023.
-
Single-Server Pliable Private Information Retrieval With Side Information
Authors:
Sarah A. Obead,
Hsuan-Yin Lin,
Eirik Rosnes
Abstract:
We study the problem of pliable private information retrieval with side information (PPIR-SI) for the single server case. In PPIR, the messages are partitioned into nonoverlapping classes and stored in a number of noncolluding databases. The user wishes to retrieve any one message from a desired class while revealing no information about the desired class identity to the databases. In PPIR-SI, the user has prior access to some side information in the form of messages from different classes and wishes to retrieve any one new message from a desired class, i.e., the message is not included in the side information set, while revealing no information about the desired class to the databases. We characterize the capacity of (linear) single-server PPIR-SI for the case where the user's side information is unidentified, i.e., the user is oblivious of the identities of its side information messages and the database structure. We term this case PPIR-USI. Surprisingly, we show that having side information, in PPIR-USI, is disadvantageous, in terms of the download rate, compared to PPIR.
Submitted 27 May, 2023; v1 submitted 11 May, 2023;
originally announced May 2023.
-
Index-Based Concatenated Codes for the Multi-Draw DNA Storage Channel
Authors:
Lorenz Welter,
Issam Maarouf,
Andreas Lenz,
Antonia Wachter-Zeh,
Eirik Rosnes,
Alexandre Graell i Amat
Abstract:
We consider error-correcting coding for DNA-based storage. We model the DNA storage channel as a multi-draw IDS channel where the input data is chunked into $M$ short DNA strands, which are copied a random number of times, and the channel outputs a random selection of $N$ noisy DNA strands. The retrieved DNA strands are prone to insertion, deletion, and substitution (IDS) errors. We propose an index-based concatenated coding scheme consisting of the concatenation of an outer code, an index code, and an inner synchronization code, where the latter two tackle IDS errors. We further propose a mismatched joint index-synchronization code maximum a posteriori probability decoder with optional clustering to infer symbolwise a posteriori probabilities for the outer decoder. We compute achievable information rates for the outer code and present Monte-Carlo simulations for information-outage probabilities and frame error rates on synthetic and experimental data, respectively.
Submitted 21 June, 2023; v1 submitted 20 December, 2022;
originally announced December 2022.
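The channel described in this abstract can be illustrated with a minimal simulation sketch. It is not the authors' model: uniform drawing with replacement, independent per-base error events, and the names `ids_channel`, `multi_draw_channel`, `p_ins`, `p_del`, and `p_sub` are all assumptions made here for illustration.

```python
import random

ALPHABET = "ACGT"


def ids_channel(strand, p_ins, p_del, p_sub, rng):
    """Pass one strand through a simple i.i.d. insertion/deletion/substitution channel."""
    out = []
    for base in strand:
        # Assumption: at most one random base is inserted before each input base.
        if rng.random() < p_ins:
            out.append(rng.choice(ALPHABET))
        r = rng.random()
        if r < p_del:
            continue  # deletion: the input base is dropped
        if r < p_del + p_sub:
            # substitution: replace with one of the three other bases
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)  # base transmitted correctly
    return "".join(out)


def multi_draw_channel(strands, n_reads, p_ins, p_del, p_sub, seed=0):
    """Draw n_reads strands uniformly with replacement (an assumption) and corrupt each.

    Returns (index, read) pairs; a real decoder would not observe the index.
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        idx = rng.randrange(len(strands))
        reads.append((idx, ids_channel(strands[idx], p_ins, p_del, p_sub, rng)))
    return reads
```

With $M$ input strands and $N$ reads, some strands may be drawn several times and others not at all, which is why the abstract's scheme attaches an index code to each strand.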
-
Finite Blocklength Performance Bound for the DNA Storage Channel
Authors:
Issam Maarouf,
Gianluigi Liva,
Eirik Rosnes,
Alexandre Graell i Amat
Abstract:
We present a finite blocklength performance bound for a DNA storage channel with insertions, deletions, and substitutions. The considered bound -- the dependency testing (DT) bound, introduced by Polyanskiy et al. in 2010 -- provides an upper bound on the achievable frame error probability and can be used to benchmark coding schemes in the practical short-to-medium blocklength regime. In particular, we consider a concatenated coding scheme where an inner synchronization code deals with insertions and deletions and the outer code corrects remaining (mostly substitution) errors. The bound depends on the inner synchronization code and can thus be used to guide its choice. We then consider low-density parity-check codes for the outer code, which we optimize based on extrinsic information transfer charts. Our optimized coding schemes achieve a normalized rate of $88\%$ to $96\%$ with respect to the DT bound for code lengths up to $2000$ DNA symbols for a frame error probability of $10^{-3}$ and code rate 1/2.
Submitted 4 August, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
Achievable Information Rates and Concatenated Codes for the DNA Nanopore Sequencing Channel
Authors:
Issam Maarouf,
Eirik Rosnes,
Alexandre Graell i Amat
Abstract:
The errors occurring in DNA-based storage are correlated in nature, which is a direct consequence of the synthesis and sequencing processes. In this paper, we consider the memory-$k$ nanopore channel model recently introduced by Hamoum et al., which models the inherent memory of the channel. We derive the maximum a posteriori (MAP) decoder for this channel model. The derived MAP decoder allows us to compute achievable information rates for the true DNA storage channel assuming a mismatched decoder matched to the memory-$k$ nanopore channel model, and quantify the loss in performance assuming a small memory length--and hence limited decoding complexity. Furthermore, the derived MAP decoder can be used to design error-correcting codes tailored to the DNA storage channel. We show that a concatenated coding scheme with an outer low-density parity-check code and an inner convolutional code yields excellent performance.
Submitted 24 March, 2023; v1 submitted 14 December, 2022;
originally announced December 2022.
-
Straggler-Resilient Differentially-Private Decentralized Learning
Authors:
Yauhen Yakimenka,
Chung-Wei Weng,
Hsuan-Yin Lin,
Eirik Rosnes,
Jörg Kliewer
Abstract:
We consider the straggler problem in decentralized learning over a logical ring while preserving user data privacy. In particular, we extend the recently proposed framework of differential privacy (DP) amplification by decentralization by Cyffers and Bellet to include overall training latency--comprising both computation and communication latency. Analytical results on both the convergence speed and the DP level are derived for both a skipping scheme (which ignores the stragglers after a timeout) and a baseline scheme that waits for each node to finish before the training continues. A trade-off between overall training latency, accuracy, and privacy, parameterized by the timeout of the skipping scheme, is identified and empirically validated for logistic regression on a real-world dataset and for image classification using the MNIST and CIFAR-10 datasets.
Submitted 28 June, 2024; v1 submitted 6 December, 2022;
originally announced December 2022.
-
Computational Code-Based Privacy in Coded Federated Learning
Authors:
Marvin Xhemrishi,
Alexandre Graell i Amat,
Eirik Rosnes,
Antonia Wachter-Zeh
Abstract:
We propose a privacy-preserving federated learning (FL) scheme that is resilient against straggling devices. An adaptive scenario is suggested where the slower devices share their data with the faster ones and do not participate in the learning process. The proposed scheme employs code-based cryptography to ensure \emph{computational} privacy of the private data, i.e., no device with bounded computational power can obtain information about the other devices' data in feasible time. For a scenario with 25 devices, the proposed scheme achieves a speed-up of 4.7 and 4 for 92 and 128 bits security, respectively, for an accuracy of 95\% on the MNIST dataset compared with conventional mini-batch FL.
Submitted 28 February, 2022;
originally announced February 2022.
-
CodedPaddedFL and CodedSecAgg: Straggler Mitigation and Secure Aggregation in Federated Learning
Authors:
Reent Schlegel,
Siddhartha Kumar,
Eirik Rosnes,
Alexandre Graell i Amat
Abstract:
We present two novel federated learning (FL) schemes that mitigate the effect of straggling devices by introducing redundancy on the devices' data across the network. Compared to other schemes in the literature, which deal with stragglers or device dropouts by ignoring their contribution, the proposed schemes do not suffer from the client drift problem. The first scheme, CodedPaddedFL, mitigates the effect of stragglers while retaining the privacy level of conventional FL. It combines one-time padding for user data privacy with gradient codes to yield straggler resiliency. The second scheme, CodedSecAgg, provides straggler resiliency and robustness against model inversion attacks and is based on Shamir's secret sharing. We apply CodedPaddedFL and CodedSecAgg to a classification problem. For a scenario with 120 devices, CodedPaddedFL achieves a speed-up factor of 18 for an accuracy of 95% on the MNIST dataset compared to conventional FL. Furthermore, it yields similar performance in terms of latency compared to a recently proposed scheme by Prakash et al. without the shortcoming of additional leakage of private data. CodedSecAgg outperforms the state-of-the-art secure aggregation scheme LightSecAgg by a speed-up factor of 6.6-18.7 for the MNIST dataset for an accuracy of 95%.
Submitted 3 June, 2022; v1 submitted 16 December, 2021;
originally announced December 2021.
-
Concatenated Codes for Multiple Reads of a DNA Sequence
Authors:
Issam Maarouf,
Andreas Lenz,
Lorenz Welter,
Antonia Wachter-Zeh,
Eirik Rosnes,
Alexandre Graell i Amat
Abstract:
Decoding sequences that stem from multiple transmissions of a codeword over an insertion, deletion, and substitution channel is a critical component of efficient deoxyribonucleic acid (DNA) data storage systems. In this paper, we consider a concatenated coding scheme with an outer nonbinary low-density parity-check code or a polar code and either an inner convolutional code or a time-varying block code. We propose two novel decoding algorithms for inference from multiple received sequences, both combining the inner code and channel to a joint hidden Markov model to infer symbolwise a posteriori probabilities (APPs). The first decoder computes the exact APPs by jointly decoding the received sequences, whereas the second decoder approximates the APPs by combining the results of separately decoded received sequences and has a complexity that is linear with the number of sequences. Using the proposed algorithms, we evaluate the performance of decoding multiple received sequences by means of achievable information rates and Monte-Carlo simulations. We show significant performance gains compared to a single received sequence. In addition, we succeed in improving the performance of the aforementioned coding scheme by optimizing both the inner and outer codes.
Submitted 12 September, 2022; v1 submitted 29 November, 2021;
originally announced November 2021.
-
DSAG: A mixed synchronous-asynchronous iterative method for straggler-resilient learning
Authors:
Albin Severinson,
Eirik Rosnes,
Salim El Rouayheb,
Alexandre Graell i Amat
Abstract:
We consider straggler-resilient learning. In many previous works, e.g., in the coded computing literature, straggling is modeled as random delays that are independent and identically distributed between workers. However, in many practical scenarios, a given worker may straggle over an extended period of time. We propose a latency model that captures this behavior and is substantiated by traces collected on Microsoft Azure, Amazon Web Services (AWS), and a small local cluster. Building on this model, we propose DSAG, a mixed synchronous-asynchronous iterative optimization method, based on the stochastic average gradient (SAG) method, that combines timely and stale results. We also propose a dynamic load-balancing strategy to further reduce the impact of straggling workers. We evaluate DSAG for principal component analysis, cast as a finite-sum optimization problem, of a large genomics dataset, and for logistic regression on a cluster composed of 100 workers on AWS, and find that DSAG is up to about 50% faster than SAG, and more than twice as fast as coded computing methods, for the particular scenario that we consider.
Submitted 27 November, 2021;
originally announced November 2021.
-
Optimal Rate-Distortion-Leakage Tradeoff for Single-Server Information Retrieval
Authors:
Yauhen Yakimenka,
Hsuan-Yin Lin,
Eirik Rosnes,
Jörg Kliewer
Abstract:
Private information retrieval protocols guarantee that a user can privately and losslessly retrieve a single file from a database stored across multiple servers. In this work, we propose to simultaneously relax the conditions of perfect retrievability and privacy in order to obtain improved download rates when all files are stored uncoded on a single server. Information leakage is measured in terms of the average success probability for the server of correctly guessing the identity of the desired file. The main findings are: i) The derivation of the optimal tradeoff between download rate, distortion, and information leakage when the file size is infinite. Closed-form expressions of the optimal tradeoff for the special cases of "no-leakage" and "no-privacy" are also given. ii) A novel approach based on linear programming (LP) to construct schemes for a finite file size and an arbitrary number of files. The proposed LP approach can be leveraged to find provably optimal schemes with corresponding closed-form expressions for the rate-distortion-leakage tradeoff when the database contains at most four bits.
Finally, for a database that contains 320 bits, we compare two construction methods based on the LP approach with a nonconstructive scheme downloading subsets of files using a finite-length lossy compressor based on random coding.
Submitted 6 January, 2022; v1 submitted 9 November, 2021;
originally announced November 2021.
-
Privacy-Preserving Coded Mobile Edge Computing for Low-Latency Distributed Inference
Authors:
Reent Schlegel,
Siddhartha Kumar,
Eirik Rosnes,
Alexandre Graell i Amat
Abstract:
We consider a mobile edge computing scenario where a number of devices want to perform a linear inference $\boldsymbol{W}\boldsymbol{x}$ on some local data $\boldsymbol{x}$ given a network-side matrix $\boldsymbol{W}$. The computation is performed at the network edge over a number of edge servers. We propose a coding scheme that provides information-theoretic privacy against $z$ colluding (honest-but-curious) edge servers, while minimizing the overall latency\textemdash comprising upload, computation, download, and decoding latency\textemdash in the presence of straggling servers. The proposed scheme exploits Shamir's secret sharing to yield data privacy and straggler mitigation, combined with replication to provide spatial diversity for the download. We also propose two variants of the scheme that further reduce latency. For a considered scenario with $9$ edge servers, the proposed scheme reduces the latency by $8\%$ compared to the nonprivate scheme recently introduced by Zhang and Simeone, while providing privacy against an honest-but-curious edge server.
Submitted 15 February, 2022; v1 submitted 7 October, 2021;
originally announced October 2021.
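The abstract above relies on Shamir's secret sharing for privacy against $z$ colluding servers. The following is a minimal sketch of textbook Shamir sharing over a prime field, not the paper's coding scheme; the field size `PRIME` and the function names `make_shares` and `reconstruct` are assumptions chosen for illustration.

```python
import random

PRIME = 2**61 - 1  # a Mersenne prime; the field size is an illustrative assumption


def make_shares(secret, threshold, n_shares, rng):
    """Split `secret` into n_shares points of a random degree-(threshold-1) polynomial."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = (num * (-xm)) % PRIME
                den = (den * (xj - xm)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+)
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

In this textbook form, setting `threshold = z + 1` means any $z$ shares reveal nothing about the secret, while any $z + 1$ shares suffice to recover it, so the remaining servers may straggle without blocking reconstruction.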
-
Coding for Straggler Mitigation in Federated Learning
Authors:
Siddhartha Kumar,
Reent Schlegel,
Eirik Rosnes,
Alexandre Graell i Amat
Abstract:
We present a novel coded federated learning (FL) scheme for linear regression that mitigates the effect of straggling devices while retaining the privacy level of conventional FL. The proposed scheme combines one-time padding to preserve privacy and gradient codes to yield resiliency against stragglers and consists of two phases. In the first phase, the devices share a one-time padded version of their local data with a subset of other devices. In the second phase, the devices and the central server collaboratively and iteratively train a global linear model using gradient codes on the one-time padded local data. To apply one-time padding to real data, our scheme exploits a fixed-point arithmetic representation of the data. Unlike the coded FL scheme recently introduced by Prakash \emph{et al.}, the proposed scheme maintains the same level of privacy as conventional FL while achieving a similar training time. Compared to conventional FL, we show that the proposed scheme achieves a training speed-up factor of $6.6$ and $9.2$ on the MNIST and Fashion-MNIST datasets for an accuracy of $95\%$ and $85\%$, respectively.
Submitted 15 February, 2022; v1 submitted 30 September, 2021;
originally announced September 2021.
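The abstract above mentions applying one-time padding to real-valued data through a fixed-point representation. A minimal sketch of that idea follows, assuming two's-complement fixed-point values in the integers modulo $2^{b}$; the bit widths and the helper names `to_fixed`, `from_fixed`, `pad`, and `unpad` are hypothetical and not taken from the paper.

```python
import random


def to_fixed(x, frac_bits, total_bits):
    """Quantize a real number to a two's-complement fixed-point integer mod 2**total_bits."""
    return round(x * (1 << frac_bits)) % (1 << total_bits)


def from_fixed(v, frac_bits, total_bits):
    """Map a fixed-point integer back to a float, interpreting the top bit as the sign."""
    if v >= 1 << (total_bits - 1):
        v -= 1 << total_bits
    return v / (1 << frac_bits)


def pad(v, key, total_bits):
    """One-time pad: add a uniformly random key modulo 2**total_bits."""
    return (v + key) % (1 << total_bits)


def unpad(v, key, total_bits):
    """Remove the pad by subtracting the same key modulo 2**total_bits."""
    return (v - key) % (1 << total_bits)
```

Because padding is addition in a finite ring, a sum of padded values can be unpadded with the sum of the keys, which is the property that lets linear operations such as gradient computations run on padded data in a sketch like this one.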
-
Rateless Codes for Low-Latency Distributed Inference in Mobile Edge Computing
Authors:
Anton Frigård,
Siddhartha Kumar,
Eirik Rosnes,
Alexandre Graell i Amat
Abstract:
We consider a mobile edge computing scenario where users want to perform a linear inference operation $\boldsymbol{W} \boldsymbol{x}$ on local data $\boldsymbol{x}$ for some network-side matrix $\boldsymbol{W}$. The inference is performed in a distributed fashion over multiple servers at the network edge. For this scenario, we propose a coding scheme that combines a rateless code to provide resiliency against straggling servers--hence reducing the computation latency--and an irregular-repetition code to provide spatial diversity--hence reducing the communication latency. We further derive a lower bound on the total latency--comprising computation latency, communication latency, and decoding latency. The proposed scheme performs remarkably close to the bound and yields significantly lower latency than the scheme based on maximum distance separable codes recently proposed by Zhang and Simeone.
Submitted 17 August, 2021;
originally announced August 2021.
-
Dynamic Coded Caching in Wireless Networks Using Multi-Agent Reinforcement Learning
Authors:
Jesper Pedersen,
Alexandre Graell i Amat,
Fredrik Brännström,
Eirik Rosnes
Abstract:
We consider distributed caching of content across several small base stations (SBSs) in a wireless network, where the content is encoded using a maximum distance separable code. Specifically, we apply soft time-to-live (STTL) cache management policies, where coded packets may be evicted from the caches at periodic times. We propose a reinforcement learning (RL) approach to find coded STTL policies minimizing the overall network load. We demonstrate that such caching policies achieve almost the same network load as policies obtained through optimization, where the latter assumes perfect knowledge of the distribution of times between file requests as well as the distribution of the number of SBSs within communication range of a user placing a request. We also suggest a multi-agent RL (MARL) framework for the scenario of non-uniformly distributed requests in space. For such a scenario, we show that MARL caching policies achieve lower network load as compared to optimized caching policies assuming a uniform request placement. We also provide convincing evidence that synchronous updates offer a lower network load than asynchronous updates for spatially homogeneous renewal request processes due to the memory of the renewal processes.
Submitted 14 April, 2021;
originally announced April 2021.
-
Generative Adversarial User Privacy in Lossy Single-Server Information Retrieval
Authors:
Chung-Wei Weng,
Yauhen Yakimenka,
Hsuan-Yin Lin,
Eirik Rosnes,
Joerg Kliewer
Abstract:
We propose to extend the concept of private information retrieval by allowing for distortion in the retrieval process and relaxing the perfect privacy requirement at the same time. In particular, we study the trade-off between download rate, distortion, and user privacy leakage, and show that in the limit of large file sizes this trade-off can be captured via a novel information-theoretical formulation for datasets with a known distribution. Moreover, for scenarios where the statistics of the dataset are unknown, we propose a new deep learning framework by leveraging a generative adversarial network approach, which allows the user to learn efficient schemes from the data itself. We evaluate the performance of the scheme on a synthetic Gaussian dataset as well as on the MNIST, CIFAR-10, and LSUN datasets. For the MNIST, CIFAR-10, and LSUN datasets, the data-driven approach significantly outperforms a nonlearning-based scheme which combines source coding with the download of multiple files.
Submitted 19 October, 2022; v1 submitted 7 December, 2020;
originally announced December 2020.
-
Concatenated Codes for Recovery From Multiple Reads of DNA Sequences
Authors:
Andreas Lenz,
Issam Maarouf,
Lorenz Welter,
Antonia Wachter-Zeh,
Eirik Rosnes,
Alexandre Graell i Amat
Abstract:
Decoding sequences that stem from multiple transmissions of a codeword over an insertion, deletion, and substitution channel is a critical component of efficient deoxyribonucleic acid (DNA) data storage systems. In this paper, we consider a concatenated coding scheme with an outer low-density parity-check code and either an inner convolutional code or a block code. We propose two new decoding algorithms for inference from multiple received sequences, both combining the inner code and channel to a joint hidden Markov model to infer symbolwise a posteriori probabilities (APPs). The first decoder computes the exact APPs by jointly decoding the received sequences, whereas the second decoder approximates the APPs by combining the results of separately decoded received sequences. Using the proposed algorithms, we evaluate the performance of decoding multiple received sequences by means of achievable information rates and Monte-Carlo simulations. We show significant performance gains compared to a single received sequence.
Submitted 29 October, 2020;
originally announced October 2020.
-
Multi-Server Weakly-Private Information Retrieval
Authors:
Hsuan-Yin Lin,
Siddhartha Kumar,
Eirik Rosnes,
Alexandre Graell i Amat,
Eitan Yaakobi
Abstract:
Private information retrieval (PIR) protocols ensure that a user can download a file from a database without revealing any information on the identity of the requested file to the servers storing the database. While existing protocols strictly impose that no information is leaked on the file's identity, this work initiates the study of the tradeoffs that can be achieved by relaxing the perfect privacy requirement. We refer to such protocols as weakly-private information retrieval (WPIR) protocols. In particular, for the case of multiple noncolluding replicated servers, we study how the download rate, the upload cost, and the access complexity can be improved when relaxing the full privacy constraint. To quantify the information leakage on the requested file's identity we consider mutual information (MI), worst-case information leakage, and maximal leakage (MaxL). We present two WPIR schemes, denoted by Scheme A and Scheme B, based on two recent PIR protocols and show that the download rate of the former can be optimized by solving a convex optimization problem. We also show that Scheme A achieves an improved download rate compared to the recently proposed scheme by Samy et al. under the so-called $ε$-privacy metric. Additionally, a family of schemes based on partitioning is presented. Moreover, we provide an information-theoretic converse bound for the maximum possible download rate for the MI and MaxL privacy metrics under a practical restriction on the alphabet size of queries and answers. For two servers and two files, the bound is tight under the MaxL metric, which settles the WPIR capacity in this particular case. Finally, we compare the performance of the proposed schemes and their gap to the converse bound.
Submitted 2 November, 2021; v1 submitted 17 July, 2020;
originally announced July 2020.
-
Private Edge Computing for Linear Inference Based on Secret Sharing
Authors:
Reent Schlegel,
Siddhartha Kumar,
Eirik Rosnes,
Alexandre Graell i Amat
Abstract:
We consider an edge computing scenario where users want to perform a linear computation on local, private data and a network-wide, public matrix. Users offload computations to edge servers located at the edge of the network, but do not want the servers, or any other party with access to the wireless links, to gain any information about their data. We provide a scheme that guarantees information-theoretic user data privacy against an eavesdropper with access to a number of edge servers or their corresponding communication links. The novelty of the proposed scheme lies in the utilization of secret sharing and partial replication to provide privacy, mitigate the effect of straggling servers, and to allow for joint beamforming opportunities in the download phase, to minimize the overall latency, consisting of upload, computation, and download latencies.
Submitted 19 October, 2020; v1 submitted 18 May, 2020;
originally announced May 2020.
-
Private Function Computation for Noncolluding Coded Databases
Authors:
Sarah A. Obead,
Hsuan-Yin Lin,
Eirik Rosnes,
Jörg Kliewer
Abstract:
Private computation in a distributed storage system (DSS) is a generalization of the private information retrieval (PIR) problem. In such a setting a user wishes to compute a function of $f$ messages stored in $n$ noncolluding coded databases, i.e., databases storing data encoded with an $[n,k]$ linear storage code, while revealing no information about the desired function to the databases. We consider the problem of private polynomial computation (PPC). In PPC, a user wishes to compute a multivariate polynomial of degree at most $g$ over $f$ variables (or messages) stored in multiple databases. First, we consider the private computation of polynomials of degree $g=1$, i.e., private linear computation (PLC) for coded databases. In PLC, a user wishes to compute a linear combination over the $f$ messages while keeping the coefficients of the desired linear combination hidden from the database. For a linearly encoded DSS, we present a capacity-achieving PLC scheme and show that the PLC capacity, which is the ratio of the desired amount of information and the total amount of downloaded information, matches the maximum distance separable coded capacity of PIR for a large class of linear storage codes. Then, we consider private computation of higher degree polynomials, i.e., $g>1$. For this setup, we construct two novel PPC schemes. In the first scheme, we consider Reed-Solomon coded databases with Lagrange encoding, which leverages ideas from recently proposed star-product PIR and Lagrange coded computation. The second scheme considers the special case of coded databases with systematic Lagrange encoding. Both schemes yield improved rates, while asymptotically, as $f\rightarrow \infty$, the systematic scheme gives a significantly better computation retrieval rate compared to all known schemes up to some storage code rate that depends on the maximum degree of the candidate polynomials.
Submitted 4 August, 2021; v1 submitted 22 March, 2020;
originally announced March 2020.
-
Dynamic Coded Caching in Wireless Networks
Authors:
Jesper Pedersen,
Alexandre Graell i Amat,
Jasper Goseling,
Fredrik Brännström,
Iryna Andriyanova,
Eirik Rosnes
Abstract:
We consider distributed and dynamic caching of coded content at small base stations (SBSs) in an area served by a macro base station (MBS). Specifically, content is encoded using a maximum distance separable code and cached according to a time-to-live (TTL) cache eviction policy, which allows coded packets to be removed from the caches at periodic times. Mobile users requesting a particular content download coded packets from SBSs within communication range. If additional packets are required to decode the file, these are downloaded from the MBS. We formulate an optimization problem that is efficiently solved numerically, providing TTL caching policies minimizing the overall network load. We demonstrate that distributed coded caching using TTL caching policies can offer significant reductions in terms of network load when request arrivals are bursty. We show how the distributed coded caching problem utilizing TTL caching policies can be analyzed as a specific single cache, convex optimization problem. Our problem encompasses static caching and the single cache as special cases. We prove that, interestingly, static caching is optimal under a Poisson request process, and that for a single cache the optimization problem has a surprisingly simple solution.
Submitted 22 December, 2020; v1 submitted 19 February, 2020;
originally announced February 2020.
-
The Capacity of Single-Server Weakly-Private Information Retrieval
Authors:
Hsuan-Yin Lin,
Siddhartha Kumar,
Eirik Rosnes,
Alexandre Graell i Amat,
Eitan Yaakobi
Abstract:
A private information retrieval (PIR) protocol guarantees that a user can privately retrieve files stored in a database without revealing any information about the identity of the requested file. Existing information-theoretic PIR protocols ensure perfect privacy, i.e., zero information leakage to the servers storing the database, but at the cost of high download. In this work, we present weakly-private information retrieval (WPIR) schemes that trade off perfect privacy to improve the download cost when the database is stored on a single server. We study the tradeoff between the download cost and information leakage in terms of mutual information (MI) and maximal leakage (MaxL) privacy metrics. By relating the WPIR problem to rate-distortion theory, the download-leakage function, which is defined as the minimum required download cost of all single-server WPIR schemes for a given level of information leakage and a fixed file size, is introduced. By characterizing the download-leakage function for the MI and MaxL metrics, the capacity of single-server WPIR is fully described.
Submitted 30 January, 2021; v1 submitted 23 January, 2020;
originally announced January 2020.
-
On the Capacity of Private Monomial Computation
Authors:
Yauhen Yakimenka,
Hsuan-Yin Lin,
Eirik Rosnes
Abstract:
In this work, we consider private monomial computation (PMC) for replicated noncolluding databases. In PMC, a user wishes to privately retrieve an arbitrary multivariate monomial from a candidate set of monomials in $f$ messages over a finite field $\mathbb F_q$, where $q=p^k$ is a power of a prime $p$ and $k \ge 1$, replicated over $n$ databases. We derive the PMC capacity under a technical condition on $p$ and for asymptotically large $q$. The condition on $p$ is satisfied, e.g., for large enough $p$. Also, we present a novel PMC scheme for arbitrary $q$ that is capacity-achieving in the asymptotic case above. Moreover, we present formulas for the entropy of a multivariate monomial and for a set of monomials in uniformly distributed random variables over a finite field, which are used in the derivation of the capacity expression.
Submitted 17 January, 2020;
originally announced January 2020.
-
Coded Distributed Tracking
Authors:
Albin Severinson,
Eirik Rosnes,
Alexandre Graell i Amat
Abstract:
We consider the problem of tracking the state of a process that evolves over time in a distributed setting, with multiple observers each observing parts of the state, which is a fundamental information processing problem with a wide range of applications. We propose a cloud-assisted scheme where the tracking is performed over the cloud. In particular, to provide timely and accurate updates, and alleviate the straggler problem of cloud computing, we propose a coded distributed computing approach where coded observations are distributed over multiple workers. The proposed scheme is based on a coded version of the Kalman filter that operates on data encoded with an erasure correcting code, such that the state can be estimated from partial updates computed by a subset of the workers. We apply the proposed scheme to the problem of tracking multiple vehicles. We show that replication achieves significantly higher accuracy than the corresponding uncoded scheme. The use of maximum distance separable (MDS) codes further improves accuracy for larger update intervals. In both cases, the proposed scheme approaches the accuracy of an ideal centralized scheme when the update interval is large enough. Finally, we observe a trade-off between age-of-information and estimation accuracy for MDS codes.
Submitted 2 September, 2019; v1 submitted 14 May, 2019;
originally announced May 2019.
-
Private Polynomial Computation for Noncolluding Coded Databases
Authors:
Sarah A. Obead,
Hsuan-Yin Lin,
Eirik Rosnes,
Jörg Kliewer
Abstract:
We consider private polynomial computation (PPC) over noncolluding coded databases. In such a setting a user wishes to compute a multivariate polynomial of degree at most $g$ over $f$ variables (or messages) stored in multiple databases while revealing no information about the desired polynomial to the databases. We construct two novel PPC schemes, where the first is a generalization of our previous work in private linear computation for coded databases. In this scheme we consider Reed-Solomon coded databases with Lagrange encoding, which leverages ideas from recently proposed star-product private information retrieval and Lagrange coded computation. The second scheme considers the special case of coded databases with systematic Lagrange encoding. Both schemes yield improved rates compared to the best known schemes from the literature for a small number of messages, while in the asymptotic case the rates match.
Submitted 7 May, 2019; v1 submitted 29 January, 2019;
originally announced January 2019.
-
Weakly-Private Information Retrieval
Authors:
Hsuan-Yin Lin,
Siddhartha Kumar,
Eirik Rosnes,
Alexandre Graell i Amat,
Eitan Yaakobi
Abstract:
Private information retrieval (PIR) protocols make it possible to retrieve a file from a database without disclosing any information about the identity of the file being retrieved. These protocols have been rigorously explored from an information-theoretic perspective in recent years. While existing protocols strictly impose that no information is leaked on the file's identity, this work initiates the study of the tradeoffs that can be achieved by relaxing the requirement of perfect privacy. In case the user is willing to leak some information on the identity of the retrieved file, we study how the PIR rate, as well as the upload cost and access complexity, can be improved. For the particular case of replicated servers, we propose two weakly-private information retrieval schemes based on two recent PIR protocols and a family of schemes based on partitioning. Lastly, we compare the performance of the proposed schemes.
Submitted 6 May, 2019; v1 submitted 20 January, 2019;
originally announced January 2019.
-
Capacity of Private Linear Computation for Coded Databases
Authors:
Sarah A. Obead,
Hsuan-Yin Lin,
Eirik Rosnes,
Jörg Kliewer
Abstract:
We consider the problem of private linear computation (PLC) in a distributed storage system. In PLC, a user wishes to compute a linear combination of $f$ messages stored in noncolluding databases while revealing no information about the coefficients of the desired linear combination to the databases. In extension of our previous work we employ linear codes to encode the information on the databases. We show that the PLC capacity, which is the ratio of the desired linear function size and the total amount of downloaded information, matches the maximum distance separable (MDS) coded capacity of private information retrieval for a large class of linear codes that includes MDS codes. In particular, the proposed converse is valid for any number of messages and linear combinations, and the capacity expression depends on the rank of the coefficient matrix obtained from all linear combinations.
Submitted 9 October, 2018;
originally announced October 2018.
-
A Droplet Approach Based on Raptor Codes for Distributed Computing With Straggling Servers
Authors:
Albin Severinson,
Alexandre Graell i Amat,
Eirik Rosnes,
Francisco Lazaro,
Gianluigi Liva
Abstract:
We propose a coded distributed computing scheme based on Raptor codes to address the straggler problem. In particular, we consider a scheme where each server computes intermediate values, referred to as droplets, that are either stored locally or sent over the network. Once enough droplets are collected, the computation can be completed. Compared to previous schemes in the literature, our proposed scheme achieves lower computational delay when the decoding time is taken into account.
Submitted 8 October, 2018;
originally announced October 2018.
-
Construction D$^\prime$ Lattices from Quasi-Cyclic Low-Density Parity-Check Codes
Authors:
Siyu Chen,
Brian M. Kurkoski,
Eirik Rosnes
Abstract:
Recently, Branco da Silva and Silva described an efficient encoding and decoding algorithm for Construction D$^\prime$ lattices. Using their algorithm, we propose a Construction D$^\prime$ lattice based on binary quasi-cyclic low-density parity-check (QC-LDPC) codes and single parity-check product codes. The underlying codes designed by the balanced-distances rule contribute in a balanced manner to the squared minimum distance of the constructed lattice, which results in a high lattice coding gain. The proposed lattice based on IEEE 802.16e QC-LDPC codes is shown to provide competitive error-rate performance on the power-unconstrained additive white Gaussian noise channel.
Submitted 4 October, 2018;
originally announced October 2018.
-
Local Reconstruction Codes: A Class of MDS-PIR Capacity-Achieving Codes
Authors:
Siddhartha Kumar,
Hsuan-Yin Lin,
Eirik Rosnes,
Alexandre Graell i Amat
Abstract:
We prove that a class of distance-optimal local reconstruction codes (LRCs), an important family of repair-efficient codes for distributed storage systems, achieve the maximum distance separable private information retrieval capacity for the case of noncolluding nodes. This particular class of codes includes Pyramid codes and other LRCs proposed in the literature.
Submitted 18 September, 2018;
originally announced September 2018.
-
Private Information Retrieval From a Cellular Network With Caching at the Edge
Authors:
Siddhartha Kumar,
Alexandre Graell i Amat,
Eirik Rosnes,
Linda Senigagliesi
Abstract:
We consider the problem of downloading content from a cellular network where content is cached at the wireless edge while achieving privacy. In particular, we consider private information retrieval (PIR) of content from a library of files, i.e., the user wishes to download a file and does not want the network to learn any information about which file she is interested in. To reduce the backhaul usage, content is cached at the wireless edge in a number of small-cell base stations (SBSs) using maximum distance separable codes. We propose a PIR scheme for this scenario that achieves privacy against a number of spy SBSs that (possibly) collaborate. The proposed PIR scheme is an extension of a recently introduced scheme by Kumar et al. to the case of multiple code rates, suitable for the scenario where files have different popularities. We then derive the backhaul rate and optimize the content placement to minimize it. We prove that uniform content placement is optimal, i.e., all files that are cached should be stored using the same code rate. This is in contrast to the case where no PIR is required. Furthermore, we show numerically that popular content placement is optimal for some scenarios.
Submitted 4 September, 2018;
originally announced September 2018.
-
On the Fundamental Limit of Private Information Retrieval for Coded Distributed Storage
Authors:
Hsuan-Yin Lin,
Siddhartha Kumar,
Eirik Rosnes,
Alexandre Graell i Amat
Abstract:
We consider private information retrieval (PIR) for distributed storage systems (DSSs) with noncolluding nodes where data is stored using a non maximum distance separable (MDS) linear code. It was recently shown that if data is stored using a particular class of non-MDS linear codes, the MDS-PIR capacity, i.e., the maximum possible PIR rate for MDS-coded DSSs, can be achieved. For this class of codes, we prove that the PIR capacity is indeed equal to the MDS-PIR capacity, giving the first family of non-MDS codes for which the PIR capacity is known. For other codes, we provide asymmetric PIR protocols that achieve a strictly larger PIR rate compared to existing symmetric PIR protocols.
Submitted 27 August, 2018;
originally announced August 2018.
-
Failure Analysis of the Interval-Passing Algorithm for Compressed Sensing
Authors:
Yauhen Yakimenka,
Eirik Rosnes
Abstract:
In this work, we perform a complete failure analysis of the interval-passing algorithm (IPA) for compressed sensing, an efficient iterative algorithm for reconstructing a $k$-sparse nonnegative $n$-dimensional real signal $\boldsymbol{x}$ from a small number of linear measurements $\boldsymbol{y}$. In particular, we show that the IPA fails to recover $\boldsymbol{x}$ from $\boldsymbol{y}$ if and only if it fails to recover a corresponding binary vector of the same support, and also that only positions of nonzero values in the measurement matrix are of importance to the success of recovery. Based on this observation, we introduce termatiko sets and show that the IPA fails to fully recover $\boldsymbol x$ if and only if the support of $\boldsymbol x$ contains a nonempty termatiko set, thus giving a complete (graph-theoretic) description of the failing sets of the IPA. Two heuristics to locate small-size termatiko sets are presented. For binary column-regular measurement matrices with no $4$-cycles, we provide a lower bound on the termatiko distance, defined as the smallest size of a nonempty termatiko set. For measurement matrices constructed from the parity-check matrices of array LDPC codes, upper bounds on the termatiko distance are provided for column-weight at most $7$, while for column-weight $3$, the exact termatiko distance and its corresponding multiplicity are provided. Next, we show that adding redundant rows to the measurement matrix does not create new termatiko sets, but rather potentially removes termatiko sets and thus improves performance. An algorithm is provided to efficiently search for such redundant rows. Finally, we present numerical results for different specific measurement matrices and also for protograph-based ensembles of measurement matrices, as well as simulation results of IPA performance, showing the influence of small-size termatiko sets.
Submitted 19 December, 2019; v1 submitted 13 June, 2018;
originally announced June 2018.
-
Asymmetry Helps: Improved Private Information Retrieval Protocols for Distributed Storage
Authors:
Hsuan-Yin Lin,
Siddhartha Kumar,
Eirik Rosnes,
Alexandre Graell i Amat
Abstract:
We consider private information retrieval (PIR) for distributed storage systems (DSSs) with noncolluding nodes where data is stored using a non maximum distance separable (MDS) linear code. It was recently shown that if data is stored using a particular class of non-MDS linear codes, the MDS-PIR capacity, i.e., the maximum possible PIR rate for MDS-coded DSSs, can be achieved. For this class of codes, we prove that the PIR capacity is indeed equal to the MDS-PIR capacity, giving the first family of non-MDS codes for which the PIR capacity is known. For other codes, we provide asymmetric PIR protocols that achieve a strictly larger PIR rate compared to existing symmetric PIR protocols.
Submitted 18 September, 2018; v1 submitted 4 June, 2018;
originally announced June 2018.
-
Code Constructions for Distributed Storage With Low Repair Bandwidth and Low Repair Complexity
Authors:
Siddhartha Kumar,
Alexandre Graell i Amat,
Iryna Andriyanova,
Fredrik Brännström,
Eirik Rosnes
Abstract:
We present the construction of a family of erasure correcting codes for distributed storage that achieve low repair bandwidth and complexity at the expense of a lower fault tolerance. The construction is based on two classes of codes, where the primary goal of the first class of codes is to provide fault tolerance, while the second class aims at reducing the repair bandwidth and repair complexity. The repair procedure is a two-step procedure where parts of the failed node are repaired in the first step using the first code. The downloaded symbols during the first step are cached in the memory and used to repair the remaining erased data symbols at minimal additional read cost during the second step. The first class of codes is based on MDS codes modified using piggybacks, while the second class is designed to reduce the number of additional symbols that need to be downloaded to repair the remaining erased symbols. We numerically show that the proposed codes achieve better repair bandwidth compared to MDS codes, codes constructed using piggybacks, and local reconstruction/Pyramid codes, while a better repair complexity is achieved when compared to MDS, Zigzag, Pyramid codes, and codes constructed using piggybacks.
Submitted 31 August, 2018; v1 submitted 18 January, 2018;
originally announced January 2018.
-
An MDS-PIR Capacity-Achieving Protocol for Distributed Storage Using Non-MDS Linear Codes
Authors:
Hsuan-Yin Lin,
Siddhartha Kumar,
Eirik Rosnes,
Alexandre Graell i Amat
Abstract:
We propose a private information retrieval (PIR) protocol for distributed storage systems with noncolluding nodes where data is stored using an arbitrary linear code. An expression for the PIR rate, i.e., the ratio of the amount of retrieved data per unit of downloaded data, is derived, and a necessary and a sufficient condition for codes to achieve the maximum distance separable (MDS) PIR capacity are given. The necessary condition is based on the generalized Hamming weights of the storage code, while the sufficient condition is based on code automorphisms. We show that cyclic codes and Reed-Muller codes satisfy the sufficient condition and are thus MDS-PIR capacity-achieving.
Submitted 31 May, 2018; v1 submitted 13 January, 2018;
originally announced January 2018.
-
Block-Diagonal and LT Codes for Distributed Computing With Straggling Servers
Authors:
Albin Severinson,
Alexandre Graell i Amat,
Eirik Rosnes
Abstract:
We propose two coded schemes for the distributed computing problem of multiplying a matrix by a set of vectors. The first scheme is based on partitioning the matrix into submatrices and applying maximum distance separable (MDS) codes to each submatrix. For this scheme, we prove that up to a given number of partitions the communication load and the computational delay (not including the encoding and decoding delay) are identical to those of the scheme recently proposed by Li et al., based on a single, long MDS code. However, due to the use of shorter MDS codes, our scheme yields a significantly lower overall computational delay when the delay incurred by encoding and decoding is also considered. We further propose a second coded scheme based on Luby Transform (LT) codes under inactivation decoding. Interestingly, LT codes may reduce the delay over the partitioned scheme at the expense of an increased communication load. We also consider distributed computing under a deadline and show numerically that the proposed schemes outperform other schemes in the literature, with the LT code-based scheme yielding the best performance for the scenarios considered.
Submitted 19 October, 2018; v1 submitted 21 December, 2017;
originally announced December 2017.
-
Achieving Maximum Distance Separable Private Information Retrieval Capacity With Linear Codes
Authors:
Siddhartha Kumar,
Hsuan-Yin Lin,
Eirik Rosnes,
Alexandre Graell i Amat
Abstract:
We propose three private information retrieval (PIR) protocols for distributed storage systems (DSSs) where data is stored using an arbitrary linear code. The first two protocols, named Protocol 1 and Protocol 2, achieve privacy for the scenario with noncolluding nodes. Protocol 1 requires a file size that is exponential in the number of files in the system, while Protocol 2 requires a file size that is independent of the number of files and is hence simpler. We prove that, for certain linear codes, Protocol 1 achieves the maximum distance separable (MDS) PIR capacity, i.e., the maximum PIR rate (the ratio of the amount of retrieved stored data per unit of downloaded data) for a DSS that uses an MDS code to store any given (finite and infinite) number of files, and Protocol 2 achieves the asymptotic MDS-PIR capacity (with infinitely large number of files in the DSS). In particular, we provide a necessary and a sufficient condition for a code to achieve the MDS-PIR capacity with Protocols 1 and 2 and prove that cyclic codes, Reed-Muller (RM) codes, and a class of distance-optimal local reconstruction codes achieve both the finite MDS-PIR capacity (i.e., with any given number of files) and the asymptotic MDS-PIR capacity with Protocols 1 and 2, respectively. Furthermore, we present a third protocol, Protocol 3, for the scenario with multiple colluding nodes, which can be seen as an improvement of a protocol recently introduced by Freij-Hollanti et al. Similar to the noncolluding case, we provide a necessary and a sufficient condition to achieve the maximum possible PIR rate of Protocol 3. Moreover, we provide a particular class of codes that is suitable for this protocol and show that RM codes achieve the maximum possible PIR rate for the protocol. For all three protocols, we present an algorithm to optimize their PIR rates.
Submitted 11 February, 2019; v1 submitted 11 December, 2017;
originally announced December 2017.
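To make the distinction between the finite and the asymptotic MDS-PIR capacity in the abstract above concrete, the following sketch evaluates both. The closed form is not stated in the abstract; it is assumed here from the PIR literature (the result of Banawan and Ulukus for noncolluding nodes storing data with an $[n,k]$ MDS code). The function names and the parameters `n`, `k`, and `f` (number of files) are illustrative.

```python
from fractions import Fraction


def mds_pir_capacity(n, k, f):
    """Finite MDS-PIR capacity (1 - k/n) / (1 - (k/n)**f) for f files, 0 < k < n."""
    rate = Fraction(k, n)  # code rate of the [n, k] storage code
    return (1 - rate) / (1 - rate ** f)


def asymptotic_mds_pir_capacity(n, k):
    """Limit of the finite capacity as the number of files grows: 1 - k/n."""
    return 1 - Fraction(k, n)


# Example: a [4, 2] code. The capacity decreases toward 1/2 as files are added.
for files in (1, 2, 3, 10):
    print(files, mds_pir_capacity(4, 2, files))
print("limit", asymptotic_mds_pir_capacity(4, 2))
```

Exact fractions are used so that the values can be compared directly against a protocol's PIR rate, i.e., the amount of retrieved data per unit of downloaded data.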
-
ML and Near-ML Decoding of LDPC Codes Over the BEC: Bounds and Decoding Algorithms
Authors:
Irina E. Bocharova,
Boris D. Kudryashov,
Vitaly Skachek,
Eirik Rosnes,
Øyvind Ytrehus
Abstract:
The performance of maximum-likelihood (ML) decoding on the binary erasure channel for finite-length low-density parity-check (LDPC) codes from two random ensembles is studied. The theoretical average spectrum of the Gallager ensemble is computed by using a recurrent procedure and compared to the empirically found average spectrum for the same ensemble as well as to the empirical average spectrum of the Richardson-Urbanke ensemble and spectra of selected codes from both ensembles. Distance properties of the random codes from the Gallager ensemble are discussed. A tightened union-type upper bound on the ML decoding error probability based on the precise coefficients of the average spectrum is presented. A new upper bound on the ML decoding performance of LDPC codes from the Gallager ensemble based on computing the rank of submatrices of the code parity-check matrix is derived. A new low-complexity near-ML decoding algorithm for quasi-cyclic LDPC codes is proposed and simulated. Its performance is compared to the upper bounds on the ML decoding performance.
Submitted 20 November, 2018; v1 submitted 5 September, 2017;
originally announced September 2017.
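The link between ML decoding on the erasure channel and the rank of parity-check submatrices, mentioned in the abstract above, can be sketched as follows. On the BEC, ML decoding amounts to solving a linear system over GF(2), and an erasure pattern is uniquely decodable exactly when the erased columns of the parity-check matrix are linearly independent. This is a generic illustration, not the bound or the near-ML algorithm from the paper; the helper names and the (7,4) Hamming example are assumptions.

```python
def gf2_rank(matrix):
    """Rank over GF(2) of a 0/1 matrix given as a list of rows."""
    basis = {}  # leading-bit position -> reduced row stored as an int bitmask
    rank = 0
    for row in matrix:
        value = 0
        for bit in row:  # pack the row into an integer
            value = (value << 1) | (bit & 1)
        while value:
            lead = value.bit_length() - 1
            if lead in basis:
                value ^= basis[lead]  # eliminate the leading bit
            else:
                basis[lead] = value
                rank += 1
                break
    return rank


def ml_decodable(H, erased):
    """True if ML decoding uniquely recovers the erased positions on the BEC."""
    sub = [[row[j] for j in erased] for row in H]
    return gf2_rank(sub) == len(erased)


# Parity-check matrix of the (7,4) Hamming code (columns are 1..7 in binary).
H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
print(ml_decodable(H, [0, 1, 3]))  # independent columns
print(ml_decodable(H, [0, 1, 2]))  # column 2 is the sum of columns 0 and 1
```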
-
Adaptive Linear Programming Decoding of Nonbinary Linear Codes Over Prime Fields
Authors:
Eirik Rosnes,
Michael Helmling
Abstract:
In this work, we consider adaptive linear programming (ALP) decoding of linear codes over the finite field $\mathbb{F}_p$ of size $p$ where $p$ is a prime. In particular, we provide a general construction of valid inequalities for the codeword polytope of the so-called constant-weight embedding of a single parity-check (SPC) code over any prime field. The construction is based on classes of building blocks that are assembled to form the left-hand side of an inequality according to several rules. In the case of almost doubly-symmetric valid classes we prove that the resulting inequalities are all facet-defining, while we conjecture this to be true if and only if the class is valid and symmetric. For $p=3$, there is only a single valid symmetric class and we prove that the resulting inequalities together with the so-called simplex constraints give a complete and irredundant description of the codeword polytope of the embedded SPC code. For $p>5$, we show that there are additional facets beyond those from the proposed construction. We use these inequalities to develop an efficient (relaxed) ALP decoder for general (non-SPC) linear codes over prime fields. The key ingredient is an efficient separation algorithm based on the principle of dynamic programming. Furthermore, we construct a decoder for linear codes over arbitrary fields $\mathbb{F}_q$ with $q=p^m$ and $m>1$ by a factor graph representation that reduces to several instances of the case $m=1$, which results, in general, in a relaxation of the original decoding polytope. Finally, we present an efficient cut-generating algorithm to search for redundant parity-checks to further improve the performance towards maximum-likelihood decoding for short-to-medium block lengths. Numerical experiments confirm that our new decoder is very efficient compared to a static LP decoder for various field sizes, check-node degrees, and block lengths.
Submitted 19 November, 2019; v1 submitted 23 August, 2017;
originally announced August 2017.
-
Lengthening and Extending Binary Private Information Retrieval Codes
Authors:
Hsuan-Yin Lin,
Eirik Rosnes
Abstract:
It was recently shown by Fazeli et al. that the storage overhead of a traditional $t$-server private information retrieval (PIR) protocol can be significantly reduced using the concept of a $t$-server PIR code. In this work, we show that a family of $t$-server PIR codes (with increasing dimensions and blocklengths) can be constructed from an existing $t$-server PIR code through lengthening by a single information symbol and code extension by at most $\bigl\lceil t/2\bigr\rceil$ code symbols. Furthermore, by extending a code construction notion from Steiner systems by Fazeli et al., we obtain a specific family of $t$-server PIR codes. Based on a code construction technique that lengthens and extends a $t$-server PIR code simultaneously, a basic algorithm to find good (i.e., small blocklength) $t$-server PIR codes is proposed. For the special case of $t=5$, we find provably optimal PIR codes for code dimensions $k\leq 6$, while for all $7\leq k\leq 32$ we find codes of smaller blocklength than the best known codes from the literature. Furthermore, in the case of $t = 8$, we also find better codes for $k = 5, 6, 11, 12$. Numerical results show that most of the best found $5$-server PIR codes can be constructed from the proposed family of codes connected to Steiner systems.
Submitted 23 January, 2018; v1 submitted 11 July, 2017;
originally announced July 2017.
-
Block-Diagonal Coding for Distributed Computing With Straggling Servers
Authors:
Albin Severinson,
Alexandre Graell i Amat,
Eirik Rosnes
Abstract:
We consider the distributed computing problem of multiplying a set of vectors with a matrix. For this scenario, Li et al. recently presented a unified coding framework and showed a fundamental tradeoff between computational delay and communication load. This coding framework is based on maximum distance separable (MDS) codes of code length proportional to the number of rows of the matrix, which can be very large. We propose a block-diagonal coding scheme consisting of partitioning the matrix into submatrices and encoding each submatrix using a shorter MDS code. We show that the assignment of coded matrix rows to servers to minimize the communication load can be formulated as an integer program with a nonlinear cost function, and propose an algorithm to solve it. We further prove that, up to a level of partitioning, the proposed scheme does not incur any loss in terms of computational delay (as defined by Li et al.) and communication load compared to the scheme by Li et al. We also show numerically that, when the decoding time is also taken into account, the proposed scheme significantly lowers the overall computational delay with respect to the scheme by Li et al. For heavy partitioning, this is achieved at the expense of a slight increase in communication load.
Submitted 15 September, 2017; v1 submitted 23 January, 2017;
originally announced January 2017.
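The partition-then-encode idea in the abstract above can be illustrated with a toy sketch. It is not the scheme from the paper: each block of matrix rows is protected by a single sum row, which is the simplest $(k+1, k)$ MDS code, so each block tolerates one straggling coded row. The server assignment optimization and the general MDS codes of the paper are not modeled, and all function names are hypothetical.

```python
def encode_block_diagonal(A, block_rows):
    """Split A into blocks of block_rows rows and append a sum (parity) row to each."""
    coded = []
    for start in range(0, len(A), block_rows):
        block = A[start:start + block_rows]
        parity = [sum(col) for col in zip(*block)]  # column-wise sum of the block
        coded.append(block + [parity])
    return coded


def multiply_with_stragglers(coded, x, missing):
    """Compute A x when, per block, the coded row missing[b] never arrives (or None)."""
    result = []
    for block, lost in zip(coded, missing):
        k = len(block) - 1  # number of uncoded rows in this block
        products = [None if i == lost else sum(a * b for a, b in zip(row, x))
                    for i, row in enumerate(block)]
        if lost is not None and lost < k:
            # Decode: the parity product minus the surviving products gives the lost one.
            known = sum(p for i, p in enumerate(products[:k]) if i != lost)
            products[lost] = products[k] - known
        result.extend(products[:k])
    return result


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
coded = encode_block_diagonal(A, block_rows=2)
# Block 0 loses its first row; block 1 loses only its parity row.
print(multiply_with_stragglers(coded, [1, 1], missing=[0, 2]))
```

Decoding here touches only one short block at a time, which is the intuition behind the lower decoding delay of shorter codes; the price is that each block can only absorb stragglers among its own coded rows.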
-
Private Information Retrieval in Distributed Storage Systems Using an Arbitrary Linear Code
Authors:
Siddhartha Kumar,
Eirik Rosnes,
Alexandre Graell i Amat
Abstract:
We propose an information-theoretic private information retrieval (PIR) scheme for distributed storage systems where data is stored using a linear systematic code of rate $R > 1/2$. The proposed scheme generalizes the PIR scheme for data stored using maximum distance separable codes recently proposed by Tajeddine and El Rouayheb for the scenario of a single spy node. We further propose an algorithm to optimize the communication price of privacy (cPoP) using the structure of the underlying linear code. As an example, we apply the proposed algorithm to several distributed storage codes, showing that the cPoP can be significantly reduced by exploiting the structure of the distributed storage code.
Submitted 30 May, 2017; v1 submitted 21 December, 2016;
originally announced December 2016.
-
On Failing Sets of the Interval-Passing Algorithm for Compressed Sensing
Authors:
Yauhen Yakimenka,
Eirik Rosnes
Abstract:
In this work, we analyze the failing sets of the interval-passing algorithm (IPA) for compressed sensing. The IPA is an efficient iterative algorithm for reconstructing a k-sparse nonnegative n-dimensional real signal x from a small number of linear measurements y. In particular, we show that the IPA fails to recover x from y if and only if it fails to recover a corresponding binary vector of the same support, and also that only positions of nonzero values in the measurement matrix are of importance for success of recovery. Based on this observation, we introduce termatiko sets and show that the IPA fails to fully recover x if and only if the support of x contains a nonempty termatiko set, thus giving a complete (graph-theoretic) description of the failing sets of the IPA. Finally, we present an extensive numerical study showing that in many cases there exist termatiko sets of size strictly smaller than the stopping distance of the binary measurement matrix; even as low as half the stopping distance in some cases.
Submitted 4 October, 2016; v1 submitted 18 July, 2016;
originally announced July 2016.
-
Secure Repairable Fountain Codes
Authors:
Siddhartha Kumar,
Eirik Rosnes,
Alexandre Graell i Amat
Abstract:
In this letter, we provide the construction of repairable fountain codes (RFCs) for distributed storage systems that are information-theoretically secure against an eavesdropper that has access to the data stored in a subset of the storage nodes and the data downloaded to repair an additional subset of storage nodes. The security is achieved by adding random symbols to the message, which is then encoded by the concatenation of a Gabidulin code and an RFC. We compare the achievable code rates of the proposed codes with those of secure minimum storage regenerating codes and secure locally repairable codes.
Submitted 26 May, 2016;
originally announced May 2016.