-
Computationally Relaxed Locally Decodable Codes, Revisited
Authors:
Alexander R. Block,
Jeremiah Blocki
Abstract:
We revisit computationally relaxed locally decodable codes (crLDCs) (Blocki et al., Trans. Inf. Theory '21) and give two new constructions. Our first construction is a Hamming crLDC that is conceptually simpler than prior constructions, leveraging digital signature schemes and an appropriately chosen Hamming code. Our second construction is an extension of our Hamming crLDC to handle insertion-del…
▽ More
We revisit computationally relaxed locally decodable codes (crLDCs) (Blocki et al., Trans. Inf. Theory '21) and give two new constructions. Our first construction is a Hamming crLDC that is conceptually simpler than prior constructions, leveraging digital signature schemes and an appropriately chosen Hamming code. Our second construction is an extension of our Hamming crLDC to handle insertion-deletion (InsDel) errors, yielding an InsDel crLDC. This extension crucially relies on the noisy binary search techniques of Block et al. (FSTTCS '20) to handle InsDel errors. Both crLDC constructions have binary codeword alphabets, are resilient to a constant fraction of Hamming and InsDel errors, respectively, and under suitable parameter choices have poly-logarithmic locality and encoding length linear in the message length and polynomial in the security parameter. These parameters compare favorably to prior constructions in the poly-logarithmic locality regime.
△ Less
Submitted 4 September, 2023; v1 submitted 1 May, 2023;
originally announced May 2023.
-
Differentially Private $L_2$-Heavy Hitters in the Sliding Window Model
Authors:
Jeremiah Blocki,
Seunghoon Lee,
Tamalika Mukherjee,
Samson Zhou
Abstract:
The data management of large companies often prioritize more recent data, as a source of higher accuracy prediction than outdated data. For example, the Facebook data policy retains user search histories for $6$ months while the Google data retention policy states that browser information may be stored for up to $9$ months. These policies are captured by the sliding window model, in which only the…
▽ More
The data management of large companies often prioritize more recent data, as a source of higher accuracy prediction than outdated data. For example, the Facebook data policy retains user search histories for $6$ months while the Google data retention policy states that browser information may be stored for up to $9$ months. These policies are captured by the sliding window model, in which only the most recent $W$ statistics form the underlying dataset.
In this paper, we consider the problem of privately releasing the $L_2$-heavy hitters in the sliding window model, which include $L_p$-heavy hitters for $p\le 2$ and in some sense are the strongest possible guarantees that can be achieved using polylogarithmic space, but cannot be handled by existing techniques due to the sub-additivity of the $L_2$ norm. Moreover, existing non-private sliding window algorithms use the smooth histogram framework, which has high sensitivity.
To overcome these barriers, we introduce the first differentially private algorithm for $L_2$-heavy hitters in the sliding window model by initiating a number of $L_2$-heavy hitter algorithms across the stream with significantly lower threshold. Similarly, we augment the algorithms with an approximate frequency tracking algorithm with significantly higher accuracy. We then use smooth sensitivity and statistical distance arguments to show that we can add noise proportional to an estimation of the $L_2$ norm. To the best of our knowledge, our techniques are the first to privately release statistics that are related to a sub-additive function in the sliding window model, and may be of independent interest to future differentially private algorithmic design in the sliding window model.
△ Less
Submitted 21 February, 2023;
originally announced February 2023.
-
How to Make Your Approximation Algorithm Private: A Black-Box Differentially-Private Transformation for Tunable Approximation Algorithms of Functions with Low Sensitivity
Authors:
Jeremiah Blocki,
Elena Grigorescu,
Tamalika Mukherjee,
Samson Zhou
Abstract:
We develop a framework for efficiently transforming certain approximation algorithms into differentially-private variants, in a black-box manner. Specifically, our results focus on algorithms A that output an approximation to a function f of the form $(1-a)f(x)-k \leq A(x) \leq (1+a)f(x)+k$, where $k \in \mathbb{R}_{\geq 0}$ denotes additive error and $a \in [0,1)$ denotes multiplicative error can…
▽ More
We develop a framework for efficiently transforming certain approximation algorithms into differentially-private variants, in a black-box manner. Specifically, our results focus on algorithms A that output an approximation to a function f of the form $(1-a)f(x)-k \leq A(x) \leq (1+a)f(x)+k$, where $k \in \mathbb{R}_{\geq 0}$ denotes additive error and $a \in [0,1)$ denotes multiplicative error can be``tuned" to small-enough values while incurring only a polynomial blowup in the running time/space. We show that such algorithms can be made DP without sacrificing accuracy, as long as the function f has small global sensitivity. We achieve these results by applying the smooth sensitivity framework developed by Nissim, Raskhodnikova, and Smith (STOC 2007).
Our framework naturally applies to transform non-private FPRAS and FPTAS algorithms into $ε$-DP approximation algorithms where the former case requires an additional postprocessing step. We apply our framework in the context of sublinear-time and sublinear-space algorithms, while preserving the nature of the algorithm in meaningful ranges of the parameters. Our results include the first (to the best of our knowledge) $ε$-edge DP sublinear-time algorithm for estimating the number of triangles, the number of connected components, and the weight of a minimum spanning tree of a graph. In the area of streaming algorithms, our results include $ε$-DP algorithms for estimating Lp-norms, distinct elements, and weighted minimum spanning tree for both insertion-only and turnstile streams. Our transformation also provides a private version of the smooth histogram framework, which is commonly used for converting streaming algorithms into sliding window variants, and achieves a multiplicative approximation to many problems, such as estimating Lp-norms, distinct elements, and the length of the longest increasing subsequence.
△ Less
Submitted 10 September, 2023; v1 submitted 7 October, 2022;
originally announced October 2022.
-
On Relaxed Locally Decodable Codes for Hamming and Insertion-Deletion Errors
Authors:
Alex Block,
Jeremiah Blocki,
Kuan Cheng,
Elena Grigorescu,
Xin Li,
Yu Zheng,
Minshen Zhu
Abstract:
Locally Decodable Codes (LDCs) are error-correcting codes $C:Σ^n\rightarrow Σ^m$ with super-fast decoding algorithms. They are important mathematical objects in many areas of theoretical computer science, yet the best constructions so far have codeword length $m$ that is super-polynomial in $n$, for codes with constant query complexity and constant alphabet size. In a very surprising result, Ben-S…
▽ More
Locally Decodable Codes (LDCs) are error-correcting codes $C:Σ^n\rightarrow Σ^m$ with super-fast decoding algorithms. They are important mathematical objects in many areas of theoretical computer science, yet the best constructions so far have codeword length $m$ that is super-polynomial in $n$, for codes with constant query complexity and constant alphabet size. In a very surprising result, Ben-Sasson et al. showed how to construct a relaxed version of LDCs (RLDCs) with constant query complexity and almost linear codeword length over the binary alphabet, and used them to obtain significantly-improved constructions of Probabilistically Checkable Proofs. In this work, we study RLDCs in the standard Hamming-error setting, and introduce their variants in the insertion and deletion (Insdel) error setting. Insdel LDCs were first studied by Ostrovsky and Paskin-Cherniavsky, and are further motivated by recent advances in DNA random access bio-technologies, in which the goal is to retrieve individual files from a DNA storage database. Our first result is an exponential lower bound on the length of Hamming RLDCs making 2 queries, over the binary alphabet. This answers a question explicitly raised by Gur and Lachish. Our result exhibits a "phase-transition"-type behavior on the codeword length for constant-query Hamming RLDCs. We further define two variants of RLDCs in the Insdel-error setting, a weak and a strong version. On the one hand, we construct weak Insdel RLDCs with with parameters matching those of the Hamming variants. On the other hand, we prove exponential lower bounds for strong Insdel RLDCs. These results demonstrate that, while these variants are equivalent in the Hamming setting, they are significantly different in the insdel setting. Our results also prove a strict separation between Hamming RLDCs and Insdel RLDCs.
△ Less
Submitted 18 September, 2022;
originally announced September 2022.
-
Cost-Asymmetric Memory Hard Password Hashing
Authors:
Wenjie Bai,
Jeremiah Blocki,
Mohammad Hassan Ameri
Abstract:
In the past decade, billions of user passwords have been exposed to the dangerous threat of offline password cracking attacks. An offline attacker who has stolen the cryptographic hash of a user's password can check as many password guesses as s/he likes limited only by the resources that s/he is willing to invest to crack the password. Pepper and key-stretching are two techniques that have been p…
▽ More
In the past decade, billions of user passwords have been exposed to the dangerous threat of offline password cracking attacks. An offline attacker who has stolen the cryptographic hash of a user's password can check as many password guesses as s/he likes limited only by the resources that s/he is willing to invest to crack the password. Pepper and key-stretching are two techniques that have been proposed to deter an offline attacker by increasing guessing costs. Pepper ensures that the cost of rejecting an incorrect password guess is higher than the (expected) cost of verifying a correct password guess. This is useful because most of the offline attacker's guesses will be incorrect. Unfortunately, as we observe the traditional peppering defense seems to be incompatible with modern memory hard key-stretching algorithms such as Argon2 or Scrypt. We introduce an alternative to pepper which we call Cost-Asymmetric Memory Hard Password Authentication which benefits from the same cost-asymmetry as the classical peppering defense i.e., the cost of rejecting an incorrect password guess is larger than the expected cost to authenticate a correct password guess. When configured properly we prove that our mechanism can only reduce the percentage of user passwords that are cracked by a rational offline attacker whose goal is to maximize (expected) profit i.e., the total value of cracked passwords minus the total guessing costs. We evaluate the effectiveness of our mechanism on empirical password datasets against a rational offline attacker. Our empirical analysis shows that our mechanism can reduce significantly the percentage of user passwords that are cracked by a rational attacker by up to 10%.
△ Less
Submitted 26 June, 2022;
originally announced June 2022.
-
Using Illustrations to Communicate Differential Privacy Trust Models: An Investigation of Users' Comprehension, Perception, and Data Sharing Decision
Authors:
Chuhao Wu,
Tianhao Wang,
Robert W. Proctor,
Jeremiah Blocki,
Ninghui Li,
Somesh Jha
Abstract:
Proper communication is key to the adoption and implementation of differential privacy (DP). However, a prior study found that laypeople did not understand the data perturbation processes of DP and how DP noise protects their sensitive personal information. Consequently, they distrusted the techniques and chose to opt out of participating. In this project, we designed explanative illustrations of…
▽ More
Proper communication is key to the adoption and implementation of differential privacy (DP). However, a prior study found that laypeople did not understand the data perturbation processes of DP and how DP noise protects their sensitive personal information. Consequently, they distrusted the techniques and chose to opt out of participating. In this project, we designed explanative illustrations of three DP models (Central DP, Local DP, Shuffler DP) to help laypeople conceptualize how random noise is added to protect individuals' privacy and preserve group utility. Following pilot surveys and interview studies, we conducted two online experiments (N = 595) examining participants' comprehension, privacy and utility perception, and data-sharing decisions across the three DP models. Besides the comparisons across the three models, we varied the noise levels of each model. We found that the illustrations can be effective in communicating DP to the participants. Given an adequate comprehension of DP, participants preferred strong privacy protection for a certain type of data usage scenarios (i.e., commercial interests) at both the model level and the noise level. We also obtained empirical evidence showing participants' acceptance of the Shuffler DP model for data privacy protection. Our findings have implications for multiple stakeholders for user-centered deployments of differential privacy, including app developers, DP model developers, data curators, and online users.
△ Less
Submitted 21 February, 2022;
originally announced February 2022.
-
Privately Estimating Graph Parameters in Sublinear time
Authors:
Jeremiah Blocki,
Elena Grigorescu,
Tamalika Mukherjee
Abstract:
We initiate a systematic study of algorithms that are both differentially private and run in sublinear time for several problems in which the goal is to estimate natural graph parameters. Our main result is a differentially-private $(1+ρ)$-approximation algorithm for the problem of computing the average degree of a graph, for every $ρ>0$. The running time of the algorithm is roughly the same as it…
▽ More
We initiate a systematic study of algorithms that are both differentially private and run in sublinear time for several problems in which the goal is to estimate natural graph parameters. Our main result is a differentially-private $(1+ρ)$-approximation algorithm for the problem of computing the average degree of a graph, for every $ρ>0$. The running time of the algorithm is roughly the same as its non-private version proposed by Goldreich and Ron (Sublinear Algorithms, 2005). We also obtain the first differentially-private sublinear-time approximation algorithms for the maximum matching size and the minimum vertex cover size of a graph.
An overarching technique we employ is the notion of coupled global sensitivity of randomized algorithms. Related variants of this notion of sensitivity have been used in the literature in ad-hoc ways. Here we formalize the notion and develop it as a unifying framework for privacy analysis of randomized approximation algorithms.
△ Less
Submitted 14 March, 2022; v1 submitted 11 February, 2022;
originally announced February 2022.
-
Differentially-Private Sublinear-Time Clustering
Authors:
Jeremiah Blocki,
Elena Grigorescu,
Tamalika Mukherjee
Abstract:
Clustering is an essential primitive in unsupervised machine learning. We bring forth the problem of sublinear-time differentially-private clustering as a natural and well-motivated direction of research. We combine the $k$-means and $k$-median sublinear-time results of Mishra et al. (SODA, 2001) and of Czumaj and Sohler (Rand. Struct. and Algorithms, 2007) with recent results on private clusterin…
▽ More
Clustering is an essential primitive in unsupervised machine learning. We bring forth the problem of sublinear-time differentially-private clustering as a natural and well-motivated direction of research. We combine the $k$-means and $k$-median sublinear-time results of Mishra et al. (SODA, 2001) and of Czumaj and Sohler (Rand. Struct. and Algorithms, 2007) with recent results on private clustering of Balcan et al. (ICML 2017), Gupta et al. (SODA, 2010) and Ghazi et al. (NeurIPS, 2020) to obtain sublinear-time private $k$-means and $k$-median algorithms via subsampling. We also investigate the privacy benefits of subsampling for group privacy.
△ Less
Submitted 27 December, 2021;
originally announced December 2021.
-
Exponential Lower Bounds for Locally Decodable and Correctable Codes for Insertions and Deletions
Authors:
Jeremiah Blocki,
Kuan Cheng,
Elena Grigorescu,
Xin Li,
Yu Zheng,
Minshen Zhu
Abstract:
Locally Decodable Codes (LDCs) are error-correcting codes for which individual message symbols can be quickly recovered despite errors in the codeword. LDCs for Hamming errors have been studied extensively in the past few decades, where a major goal is to understand the amount of redundancy that is necessary and sufficient to decode from large amounts of error, with small query complexity.
In th…
▽ More
Locally Decodable Codes (LDCs) are error-correcting codes for which individual message symbols can be quickly recovered despite errors in the codeword. LDCs for Hamming errors have been studied extensively in the past few decades, where a major goal is to understand the amount of redundancy that is necessary and sufficient to decode from large amounts of error, with small query complexity.
In this work, we study LDCs for insertion and deletion errors, called Insdel LDCs. Their study was initiated by Ostrovsky and Paskin-Cherniavsky (Information Theoretic Security, 2015), who gave a reduction from Hamming LDCs to Insdel LDCs with a small blowup in the code parameters. On the other hand, the only known lower bounds for Insdel LDCs come from those for Hamming LDCs, thus there is no separation between them. Here we prove new, strong lower bounds for the existence of Insdel LDCs. In particular, we show that $2$-query linear Insdel LDCs do not exist, and give an exponential lower bound for the length of all $q$-query Insdel LDCs with constant $q$. For $q \ge 3$ our bounds are exponential in the existing lower bounds for Hamming LDCs. Furthermore, our exponential lower bounds continue to hold for adaptive decoders, and even in private-key settings where the encoder and decoder share secret randomness. This exhibits a strict separation between Hamming LDCs and Insdel LDCs.
Our strong lower bounds also hold for the related notion of Insdel LCCs (except in the private-key setting), due to an analogue to the Insdel notions of a reduction from Hamming LCCs to LDCs.
Our techniques are based on a delicate design and analysis of hard distributions of insertion and deletion errors, which depart significantly from typical techniques used in analyzing Hamming LDCs.
△ Less
Submitted 16 September, 2022; v1 submitted 1 November, 2021;
originally announced November 2021.
-
The Parallel Reversible Pebbling Game: Analyzing the Post-Quantum Security of iMHFs
Authors:
Jeremiah Blocki,
Blake Holman,
Seunghoon Lee
Abstract:
The classical (parallel) black pebbling game is a useful abstraction which allows us to analyze the resources (space, space-time, cumulative space) necessary to evaluate a function $f$ with a static data-dependency graph $G$. Of particular interest in the field of cryptography are data-independent memory-hard functions $f_{G,H}$ which are defined by a directed acyclic graph (DAG) $G$ and a cryptog…
▽ More
The classical (parallel) black pebbling game is a useful abstraction which allows us to analyze the resources (space, space-time, cumulative space) necessary to evaluate a function $f$ with a static data-dependency graph $G$. Of particular interest in the field of cryptography are data-independent memory-hard functions $f_{G,H}$ which are defined by a directed acyclic graph (DAG) $G$ and a cryptographic hash function $H$. The pebbling complexity of the graph $G$ characterizes the amortized cost of evaluating $f_{G,H}$ multiple times as well as the total cost to run a brute-force preimage attack over a fixed domain $\mathcal{X}$, i.e., given $y \in \{0,1\}^*$ find $x \in \mathcal{X}$ such that $f_{G,H}(x)=y$. While a classical attacker will need to evaluate the function $f_{G,H}$ at least $m=|\mathcal{X}|$ times a quantum attacker running Grover's algorithm only requires $\mathcal{O}(\sqrt{m})$ blackbox calls to a quantum circuit $C_{G,H}$ evaluating the function $f_{G,H}$. Thus, to analyze the cost of a quantum attack it is crucial to understand the space-time cost (equivalently width times depth) of the quantum circuit $C_{G,H}$. We first observe that a legal black pebbling strategy for the graph $G$ does not necessarily imply the existence of a quantum circuit with comparable complexity -- in contrast to the classical setting where any efficient pebbling strategy for $G$ corresponds to an algorithm with comparable complexity evaluating $f_{G,H}$. Motivated by this observation we introduce a new parallel reversible pebbling game which captures additional restrictions imposed by the No-Deletion Theorem in Quantum Computing. We apply our new reversible pebbling game to analyze the reversible space-time complexity of several important graphs: Line Graphs, Argon2i-A, Argon2i-B, and DRSample. (See the paper for the full abstract.)
△ Less
Submitted 11 October, 2022; v1 submitted 8 October, 2021;
originally announced October 2021.
-
On Explicit Constructions of Extremely Depth Robust Graphs
Authors:
Jeremiah Blocki,
Mike Cinkoske,
Seunghoon Lee,
** Young Son
Abstract:
A directed acyclic graph $G=(V,E)$ is said to be $(e,d)$-depth robust if for every subset $S \subseteq V$ of $|S| \leq e$ nodes the graph $G-S$ still contains a directed path of length $d$. If the graph is $(e,d)$-depth-robust for any $e,d$ such that $e+d \leq (1-ε)|V|$ then the graph is said to be $ε$-extreme depth-robust. In the field of cryptography, (extremely) depth-robust graphs with low ind…
▽ More
A directed acyclic graph $G=(V,E)$ is said to be $(e,d)$-depth robust if for every subset $S \subseteq V$ of $|S| \leq e$ nodes the graph $G-S$ still contains a directed path of length $d$. If the graph is $(e,d)$-depth-robust for any $e,d$ such that $e+d \leq (1-ε)|V|$ then the graph is said to be $ε$-extreme depth-robust. In the field of cryptography, (extremely) depth-robust graphs with low indegree have found numerous applications including the design of side-channel resistant Memory-Hard Functions, Proofs of Space and Replication, and in the design of Computationally Relaxed Locally Correctable Codes. In these applications, it is desirable to ensure the graphs are locally navigable, i.e., there is an efficient algorithm $\mathsf{GetParents}$ running in time $\mathrm{polylog} |V|$ which takes as input a node $v \in V$ and returns the set of $v$'s parents. We give the first explicit construction of locally navigable $ε$-extreme depth-robust graphs with indegree $O(\log |V|)$. Previous constructions of $ε$-extreme depth-robust graphs either had indegree $\tildeω(\log^2 |V|)$ or were not explicit.
△ Less
Submitted 22 March, 2022; v1 submitted 8 October, 2021;
originally announced October 2021.
-
Towards a Rigorous Statistical Analysis of Empirical Password Datasets
Authors:
Jeremiah Blocki,
Peiyuan Liu
Abstract:
A central challenge in password security is to characterize the attacker's guessing curve i.e., what is the probability that the attacker will crack a random user's password within the first $G$ guesses. A key challenge is that the guessing curve depends on the attacker's guessing strategy and the distribution of user passwords both of which are unknown to us. In this work we aim to follow Kerckho…
▽ More
A central challenge in password security is to characterize the attacker's guessing curve i.e., what is the probability that the attacker will crack a random user's password within the first $G$ guesses. A key challenge is that the guessing curve depends on the attacker's guessing strategy and the distribution of user passwords both of which are unknown to us. In this work we aim to follow Kerckhoffs' principle and analyze the performance of an optimal attacker who knows the password distribution. Let $λ_G$ denote the probability that such an attacker can crack a random user's password within $G$ guesses. We develop several statistically rigorous techniques to upper and lower bound $λ_G$ given $N$ independent samples from the unknown distribution. We show that our bounds hold with high confidence and apply our techniques to analyze eight password datasets. Our empirical analysis shows that even state-of-the-art password cracking models are often significantly less guess efficient than an attacker who can optimize its attack based on its (partial) knowledge of the password distribution. We also apply our techniques to re-examine the empirical password distribution and Zipf's Law. We find that the empirical distribution closely matches our bounds on $λ_G$ when $G$ is not too large i.e., $G \ll N$. However, for larger values of $G$ our empirical analysis rigorously demonstrates that the empirical distribution (resp. Zipf's Law) overestimates the attacker's success rate. We apply our techniques to upper/lower bound the effectiveness of password throttling mechanisms (key-stretching) which are used to reduce the number of attacker guesses $G$. Finally, if we make an additional assumption about the way users respond to password restrictions, we can use our techniques to evaluate the effectiveness of password composition policies which restrict the passwords users may select.
△ Less
Submitted 20 September, 2022; v1 submitted 28 May, 2021;
originally announced May 2021.
-
Private and Resource-Bounded Locally Decodable Codes for Insertions and Deletions
Authors:
Alexander R. Block,
Jeremiah Blocki
Abstract:
We construct locally decodable codes (LDCs) to correct insertion-deletion errors in the setting where the sender and receiver share a secret key or where the channel is resource-bounded. Our constructions rely on a so-called "Hamming-to-InsDel" compiler (Ostrovsky and Paskin-Cherniavsky, ITS '15 & Block et al., FSTTCS '20), which compiles any locally decodable Hamming code into a locally decodable…
▽ More
We construct locally decodable codes (LDCs) to correct insertion-deletion errors in the setting where the sender and receiver share a secret key or where the channel is resource-bounded. Our constructions rely on a so-called "Hamming-to-InsDel" compiler (Ostrovsky and Paskin-Cherniavsky, ITS '15 & Block et al., FSTTCS '20), which compiles any locally decodable Hamming code into a locally decodable code resilient to insertion-deletion (InsDel) errors. While the compilers were designed for the classical coding setting, we show that the compilers still work in a secret key or resource-bounded setting. Applying our results to the private key Hamming LDC of Ostrovsky, Pandey, and Sahai (ICALP '07), we obtain a private key InsDel LDC with constant rate and polylogarithmic locality. Applying our results to the construction of Blocki, Kulkarni, and Zhou (ITC '20), we obtain similar results for resource-bounded channels; i.e., a channel where computation is constrained by resources such as space or time.
△ Less
Submitted 21 September, 2021; v1 submitted 25 March, 2021;
originally announced March 2021.
-
DAHash: Distribution Aware Tuning of Password Hashing Costs
Authors:
Wenjie Bai,
Jeremiah Blocki
Abstract:
An attacker who breaks into an authentication server and steals all of the cryptographic password hashes is able to mount an offline-brute force attack against each user's password. Offline brute-force attacks against passwords are increasingly commonplace and the danger is amplified by the well documented human tendency to select low-entropy password and/or reuse these passwords across multiple a…
▽ More
An attacker who breaks into an authentication server and steals all of the cryptographic password hashes is able to mount an offline-brute force attack against each user's password. Offline brute-force attacks against passwords are increasingly commonplace and the danger is amplified by the well documented human tendency to select low-entropy password and/or reuse these passwords across multiple accounts. Moderately hard password hashing functions are often deployed to help protect passwords against offline attacks by increasing the attacker's guessing cost. However, there is a limit to how "hard" one can make the password hash function as authentication servers are resource constrained and must avoid introducing substantial authentication delay. Observing that there is a wide gap in the strength of passwords selected by different users we introduce DAHash (Distribution Aware Password Hashing) a novel mechanism which reduces the number of passwords that an attacker will crack. Our key insight ishat a resource-constrained authentication server can dynamically tune the hardness parameters of a password hash function based on the (estimated) strength of the user's password. We introduce a Stackelberg game to model the interaction between a defender (authentication server) and an offline attacker. Our model allows the defender to optimize the parameters of DAHash e.g., specify how much effort is spent to hash weak/moderate/high strength passwords. We use several large scale password frequency datasets to empirically evaluate the effectiveness of our differentiated cost password hashing mechanism. We find that the defender who uses our mechanism can reduce the fraction of passwords that would be cracked by a rational offline attacker by around 15%.
△ Less
Submitted 30 January, 2021; v1 submitted 25 January, 2021;
originally announced January 2021.
-
Locally Decodable/Correctable Codes for Insertions and Deletions
Authors:
Alexander R. Block,
Jeremiah Blocki,
Elena Grigorescu,
Shubhang Kulkarni,
Minshen Zhu
Abstract:
Recent efforts in coding theory have focused on building codes for insertions and deletions, called insdel codes, with optimal trade-offs between their redundancy and their error-correction capabilities, as well as efficient encoding and decoding algorithms.
In many applications, polynomial running time may still be prohibitively expensive, which has motivated the study of codes with super-effic…
▽ More
Recent efforts in coding theory have focused on building codes for insertions and deletions, called insdel codes, with optimal trade-offs between their redundancy and their error-correction capabilities, as well as efficient encoding and decoding algorithms.
In many applications, polynomial running time may still be prohibitively expensive, which has motivated the study of codes with super-efficient decoding algorithms. These have led to the well-studied notions of Locally Decodable Codes (LDCs) and Locally Correctable Codes (LCCs). Inspired by these notions, Ostrovsky and Paskin-Cherniavsky (Information Theoretic Security, 2015) generalized Hamming LDCs to insertions and deletions. To the best of our knowledge, these are the only known results that study the analogues of Hamming LDCs in channels performing insertions and deletions.
Here we continue the study of insdel codes that admit local algorithms. Specifically, we reprove the results of Ostrovsky and Paskin-Cherniavsky for insdel LDCs using a different set of techniques. We also observe that the techniques extend to constructions of LCCs. Specifically, we obtain insdel LDCs and LCCs from their Hamming LDCs and LCCs analogues, respectively. The rate and error-correction capability blow up only by a constant factor, while the query complexity blows up by a poly log factor in the block length.
Since insdel locally decodable/correctble codes are scarcely studied in the literature, we believe our results and techniques may lead to further research. In particular, we conjecture that constant-query insdel LDCs/LCCs do not exist.
△ Less
Submitted 6 December, 2020; v1 submitted 22 October, 2020;
originally announced October 2020.
-
Password Strength Signaling: A Counter-Intuitive Defense Against Password Cracking
Authors:
Wenjie Bai,
Jeremiah Blocki,
Ben Harsha
Abstract:
We introduce password strength information signaling as a novel, yet counter-intuitive, defense mechanism against password cracking attacks. Recent breaches have exposed billions of user passwords to the dangerous threat of offline password cracking attacks. An offline attacker can quickly check millions (or sometimes billions/trillions) of password guesses by comparing their hash value with the s…
▽ More
We introduce password strength information signaling as a novel, yet counter-intuitive, defense mechanism against password cracking attacks. Recent breaches have exposed billions of user passwords to the dangerous threat of offline password cracking attacks. An offline attacker can quickly check millions (or sometimes billions/trillions) of password guesses by comparing their hash value with the stolen hash from a breached authentication server. The attacker is limited only by the resources he is willing to invest. Our key idea is to have the authentication server store a (noisy) signal about the strength of each user password for an offline attacker to find. Surprisingly, we show that the noise distribution for the signal can often be tuned so that a rational (profit-maximizing) attacker will crack fewer passwords. The signaling scheme exploits the fact that password cracking is not a zero-sum game i.e., the attacker's profit is given by the value of the cracked passwords minus the total guessing cost. Thus, a well-defined signaling strategy will encourage the attacker to reduce his guessing costs by cracking fewer passwords. We use an evolutionary algorithm to compute the optimal signaling scheme for the defender. As a proof-of-concept, we evaluate our mechanism on several password datasets and show that it can reduce the total number of cracked passwords by up to $12\%$ (resp. $5\%$) of all users in defending against offline (resp. online) attacks.
△ Less
Submitted 16 August, 2021; v1 submitted 21 September, 2020;
originally announced September 2020.
-
HACCLE: Metaprogramming for Secure Multi-Party Computation -- Extended Version
Authors:
Yuyan Bao,
Kirshanthan Sundararajah,
Raghav Malik,
Qianchuan Ye,
Christopher Wagner,
Nouraldin Jaber,
Fei Wang,
Mohammad Hassan Ameri,
Donghang Lu,
Alexander Seto,
Benjamin Delaware,
Roopsha Samanta,
Aniket Kate,
Christina Garman,
Jeremiah Blocki,
Pierre-David Letourneau,
Benoit Meister,
Jonathan Springer,
Tiark Rompf,
Milind Kulkarni
Abstract:
Cryptographic techniques have the potential to enable distrusting parties to collaborate in fundamentally new ways, but their practical implementation poses numerous challenges. An important class of such cryptographic techniques is known as Secure Multi-Party Computation (MPC). Develo** Secure MPC applications in realistic scenarios requires extensive knowledge spanning multiple areas of crypto…
▽ More
Cryptographic techniques have the potential to enable distrusting parties to collaborate in fundamentally new ways, but their practical implementation poses numerous challenges. An important class of such cryptographic techniques is known as Secure Multi-Party Computation (MPC). Develo** Secure MPC applications in realistic scenarios requires extensive knowledge spanning multiple areas of cryptography and systems. And while the steps to arrive at a solution for a particular application are often straightforward, it remains difficult to make the implementation efficient, and tedious to apply those same steps to a slightly different application from scratch. Hence, it is an important problem to design platforms for implementing Secure MPC applications with minimum effort and using techniques accessible to non-experts in cryptography. In this paper, we present the HACCLE (High Assurance Compositional Cryptography: Languages and Environments) toolchain, specifically targeted to MPC applications. HACCLE contains an embedded domain-specific language Harpoon, for software developers without cryptographic expertise to write MPC-based programs, and uses Lightweight Modular Staging (LMS) for code generation. Harpoon programs are compiled into acyclic circuits represented in HACCLE's Intermediate Representation (HIR) that serves as an abstraction over different cryptographic protocols such as secret sharing, homomorphic encryption, or garbled circuits. Implementations of different cryptographic protocols serve as different backends of our toolchain. The extensible design of HIR allows cryptographic experts to plug in new primitives and protocols to realize computation. And the use of standard metaprogramming techniques lowers the development effort significantly.
△ Less
Submitted 30 September, 2021; v1 submitted 3 September, 2020;
originally announced September 2020.
-
On the Security of Proofs of Sequential Work in a Post-Quantum World
Authors:
Jeremiah Blocki,
Seunghoon Lee,
Samson Zhou
Abstract:
A Proof of Sequential Work (PoSW) allows a prover to convince a resource-bounded verifier that the prover invested a substantial amount of sequential time to perform some underlying computation. PoSWs have many applications including time-stam**, blockchain design, and universally verifiable CPU benchmarks. Mahmoody, Moran, and Vadhan (ITCS 2013) gave the first construction of a PoSW in the rand…
▽ More
A Proof of Sequential Work (PoSW) allows a prover to convince a resource-bounded verifier that the prover invested a substantial amount of sequential time to perform some underlying computation. PoSWs have many applications including time-stam**, blockchain design, and universally verifiable CPU benchmarks. Mahmoody, Moran, and Vadhan (ITCS 2013) gave the first construction of a PoSW in the random oracle model though the construction relied on expensive depth-robust graphs. In a recent breakthrough, Cohen and Pietrzak (EUROCRYPT 2018) gave an efficient PoSW construction that does not require expensive depth-robust graphs.
In the classical parallel random oracle model, it is straightforward to argue that any successful PoSW attacker must produce a long $\mathcal{H}$-sequence and that any malicious party running in sequential time $T-1$ will fail to produce an $\mathcal{H}$-sequence of length $T$ except with negligible probability. In this paper, we prove that any quantum attacker running in sequential time $T-1$ will fail to produce an $\mathcal{H}$-sequence except with negligible probability -- even if the attacker submits a large batch of quantum queries in each round. The proof is substantially more challenging and highlights the power of Zhandry's recent compressed oracle technique (CRYPTO 2019). We further extend this result to establish post-quantum security of a non-interactive PoSW obtained by applying the Fiat-Shamir transform to Cohen and Pietrzak's efficient construction (EUROCRYPT 2018).
△ Less
Submitted 18 May, 2021; v1 submitted 19 June, 2020;
originally announced June 2020.
-
On the Economics of Offline Password Cracking
Authors:
Jeremiah Blocki,
Ben Harsha,
Samson Zhou
Abstract:
We develop an economic model of an offline password cracker which allows us to make quantitative predictions about the fraction of accounts that a rational password attacker would crack in the event of an authentication server breach. We apply our economic model to analyze recent massive password breaches at Yahoo!, Dropbox, LastPass and AshleyMadison. All four organizations were using key-stretch…
▽ More
We develop an economic model of an offline password cracker which allows us to make quantitative predictions about the fraction of accounts that a rational password attacker would crack in the event of an authentication server breach. We apply our economic model to analyze recent massive password breaches at Yahoo!, Dropbox, LastPass and AshleyMadison. All four organizations were using key-stretching to protect user passwords. In fact, LastPass' use of PBKDF2-SHA256 with $10^5$ hash iterations exceeds 2017 NIST minimum recommendation by an order of magnitude. Nevertheless, our analysis paints a bleak picture: the adopted key-stretching levels provide insufficient protection for user passwords. In particular, we present strong evidence that most user passwords follow a Zipf's law distribution, and characterize the behavior of a rational attacker when user passwords are selected from a Zipf's law distribution. We show that there is a finite threshold which depends on the Zipf's law parameters that characterizes the behavior of a rational attacker -- if the value of a cracked password (normalized by the cost of computing the password hash function) exceeds this threshold then the adversary's optimal strategy is always to continue attacking until each user password has been cracked. In all cases (Yahoo!, Dropbox, LastPass and AshleyMadison) we find that the value of a cracked password almost certainly exceeds this threshold meaning that a rational attacker would crack all passwords that are selected from the Zipf's law distribution (i.e., most user passwords). This prediction holds even if we incorporate an aggressive model of diminishing returns for the attacker (e.g., the total value of $500$ million cracked passwords is less than $100$ times the total value of $5$ million passwords).
See paper for full abstract.
△ Less
Submitted 8 June, 2020;
originally announced June 2020.
-
DALock: Distribution Aware Password Throttling
Authors:
Jeremiah Blocki,
Wuwei Zhang
Abstract:
Large-scale online password guessing attacks are wide-spread and continuously qualified as one of the top cyber-security risks. The common method for mitigating the risk of online cracking is to lock out the user after a fixed number ($K$) of consecutive incorrect login attempts. Selecting the value of $K$ induces a classic security-usability trade-off. When $K$ is too large a hacker can (quickly)…
▽ More
Large-scale online password guessing attacks are wide-spread and continuously qualified as one of the top cyber-security risks. The common method for mitigating the risk of online cracking is to lock out the user after a fixed number ($K$) of consecutive incorrect login attempts. Selecting the value of $K$ induces a classic security-usability trade-off. When $K$ is too large a hacker can (quickly) break into a significant fraction of user accounts, but when $K$ is too low we will start to annoy honest users by locking them out after a few mistakes. Motivated by the observation that honest user mistakes typically look quite different than the password guesses of an online attacker, we introduce DALock a {\em distribution aware} password lockout mechanism to reduce user annoyance while minimizing user risk. As the name suggests, DALock is designed to be aware of the frequency and popularity of the password used for login attacks while standard throttling mechanisms (e.g., $K$-strikes) are oblivious to the password distribution. In particular, DALock maintains an extra "hit count" in addition to "strike count" for each user which is based on (estimates of) the cumulative probability of {\em all} login attempts for that particular account. We empirically evaluate DALock with an extensive battery of simulations using real world password datasets. In comparison with the traditional $K$-strikes mechanism we find that DALock offers a superior security/usability trade-off. For example, in one of our simulations we are able to reduce the success rate of an attacker to $0.05\%$ (compared to $1\%$ for the $10$-strikes mechanism) whilst simultaneously reducing the unwanted lockout rate for accounts that are not under attack to just $0.08\%$ (compared to $4\%$ for the $3$-strikes mechanism).
△ Less
Submitted 18 May, 2020;
originally announced May 2020.
-
An Economic Model for Quantum Key-Recovery Attacks against Ideal Ciphers
Authors:
Benjamin Harsha,
Jeremiah Blocki
Abstract:
It has been established that quantum algorithms can solve several key cryptographic problems more efficiently than classical computers. As progress continues in the field of quantum computing it is important to understand the risks they pose to deployed cryptographic systems. Here we focus on one of these risks - quantum key-recovery attacks against ideal ciphers. Specifically, we seek to model th…
▽ More
It has been established that quantum algorithms can solve several key cryptographic problems more efficiently than classical computers. As progress continues in the field of quantum computing it is important to understand the risks they pose to deployed cryptographic systems. Here we focus on one of these risks - quantum key-recovery attacks against ideal ciphers. Specifically, we seek to model the risk posed by an economically motivated quantum attacker who will choose to run a quantum key-recovery attack against an ideal cipher if the cost to recover the secret key is less than the value of the information at the time when the key-recovery attack is complete. In our analysis we introduce the concept of a quantum cipher circuit year to measure the cost of a quantum attack. This concept can be used to model the inherent tradeoff between the total time to run a quantum key recovery attack and the total work required to run said attack. Our model incorporates the time value of the encrypted information to predict whether any time/work tradeoff results in a key-recovery attack with positive utility for the attacker. We make these predictions under various projections of advances in quantum computing. We use these predictions to make recommendations for the future use and deployment of symmetric key ciphers to secure information against these quantum key-recovery attacks. We argue that, even with optimistic predictions for advances in quantum computing, 128 bit keys (as used in common cipher implementations like AES-128) provide adequate security against quantum attacks in almost all use cases.
△ Less
Submitted 12 May, 2020;
originally announced May 2020.
-
Bicycle Attacks Considered Harmful: Quantifying the Damage of Widespread Password Length Leakage
Authors:
Benjamin Harsha,
Robert Morton,
Jeremiah Blocki,
John Springer,
Melissa Dark
Abstract:
We examine the issue of password length leakage via encrypted traffic i.e., bicycle attacks. We aim to quantify both the prevalence of password length leakage bugs as well as the potential harm to users. In an observational study, we find that {\em most} of the Alexa top 100 rates sites are vulnerable to bicycle attacks meaning that an eavesdrop** attacker can infer the exact length of a passwor…
▽ More
We examine the issue of password length leakage via encrypted traffic i.e., bicycle attacks. We aim to quantify both the prevalence of password length leakage bugs as well as the potential harm to users. In an observational study, we find that {\em most} of the Alexa top 100 rates sites are vulnerable to bicycle attacks meaning that an eavesdrop** attacker can infer the exact length of a password based on the length the encrypted packet containing the password. We discuss several ways in which an eavesdrop** attacker could link this password length with a particular user account e.g., a targeted campaign against a smaller group of users or via DNS hijacking for larger scale campaigns. We next use a decision-theoretic model to quantify the extent to which password length leakage might help an attacker to crack user passwords. In our analysis, we consider three different levels of password attackers: hacker, criminal and nation-state. In all cases, we find that such an attacker who knows the length of each user password gains a significant advantage over one without knowing the password length. As part of this analysis, we also release a new differentially private password frequency dataset from the 2016 LinkedIn breach using a differentially private algorithm of Blocki et al. (NDSS 2016) to protect user accounts. The LinkedIn frequency corpus is based on over 170 million passwords making it the largest frequency corpus publicly available to password researchers. While the defense against bicycle attacks is straightforward (i.e., ensure that passwords are always padded before encryption), we discuss several practical challenges organizations may face when attempting to patch this vulnerability. We advocate for a new W3C standard on how password fields are handled which would effectively eliminate most instances of password length leakage.
△ Less
Submitted 4 February, 2020;
originally announced February 2020.
-
Computationally Data-Independent Memory Hard Functions
Authors:
Mohammad Hassan Ameri,
Jeremiah Blocki,
Samson Zhou
Abstract:
Memory hard functions (MHFs) are an important cryptographic primitive that are used to design egalitarian proofs of work and in the construction of moderately expensive key-derivation functions resistant to brute-force attacks. Broadly speaking, MHFs can be divided into two categories: data-dependent memory hard functions (dMHFs) and data-independent memory hard functions (iMHFs). iMHFs are resist…
▽ More
Memory hard functions (MHFs) are an important cryptographic primitive that are used to design egalitarian proofs of work and in the construction of moderately expensive key-derivation functions resistant to brute-force attacks. Broadly speaking, MHFs can be divided into two categories: data-dependent memory hard functions (dMHFs) and data-independent memory hard functions (iMHFs). iMHFs are resistant to certain side-channel attacks as the memory access pattern induced by the honest evaluation algorithm is independent of the potentially sensitive input e.g., password. While dMHFs are potentially vulnerable to side-channel attacks (the induced memory access pattern might leak useful information to a brute-force attacker), they can achieve higher cumulative memory complexity (CMC) in comparison than an iMHF. In this paper, we introduce the notion of computationally data-independent memory hard functions (ciMHFs). Intuitively, we require that memory access pattern induced by the (randomized) ciMHF evaluation algorithm appears to be independent from the standpoint of a computationally bounded eavesdrop** attacker --- even if the attacker selects the initial input. We then ask whether it is possible to circumvent known upper bound for iMHFs and build a ciMHF with CMC $Ω(N^2)$. Surprisingly, we answer the question in the affirmative when the ciMHF evaluation algorithm is executed on a two-tiered memory architecture (RAM/Cache).
See paper for the full abstract.
△ Less
Submitted 15 November, 2019;
originally announced November 2019.
-
A New Connection Between Node and Edge Depth Robust Graphs
Authors:
Jeremiah Blocki,
Mike Cinkoske
Abstract:
Given a directed acyclic graph (DAG) $G = (V,E)$, we say that $G$ is $(e,d)$-depth-robust (resp. $(e,d)$-edge-depth-robust) if for any set $S \subset V$ (resp. $S \subseteq E$) of at most $|S| \leq e$ nodes (resp. edges) the graph $G-S$ contains a directed path of length $d$. While edge-depth-robust graphs are potentially easier to construct many applications in cryptography require node depth-rob…
▽ More
Given a directed acyclic graph (DAG) $G = (V,E)$, we say that $G$ is $(e,d)$-depth-robust (resp. $(e,d)$-edge-depth-robust) if for any set $S \subset V$ (resp. $S \subseteq E$) of at most $|S| \leq e$ nodes (resp. edges) the graph $G-S$ contains a directed path of length $d$. While edge-depth-robust graphs are potentially easier to construct many applications in cryptography require node depth-robust graphs with small indegree. We create a graph reduction that transforms an $(e, d)$-edge-depth-robust graph with $m$ edges into a $(e/2,d)$-depth-robust graph with $O(m)$ nodes and constant indegree. One immediate consequence of this result is the first construction of a provably $(\frac{n \log \log n}{\log n}, \frac{n}{(\log n)^{1 + \log \log n}})$-depth-robust graph with constant indegree, where previous constructions for $e =\frac{n \log \log n}{\log n}$ had $d = O(n^{1-ε})$. Our reduction crucially relies on ST-Robust graphs, a new graph property we introduce which may be of independent interest. We say that a directed, acyclic graph with $n$ inputs and $n$ outputs is $(k_1, k_2)$-ST-Robust if we can remove any $k_1$ nodes and there exists a subgraph containing at least $k_2$ inputs and $k_2$ outputs such that each of the $k_2$ inputs is connected to all of the $k_2$ outputs. If the graph if $(k_1,n-k_1)$-ST-Robust for all $k_1 \leq n$ we say that the graph is maximally ST-robust. We show how to construct maximally ST-robust graphs with constant indegree and $O(n)$ nodes. Given a family $ \mathbb{M}$ of ST-robust graphs and an arbitrary $(e, d)$-edge-depth-robust graph $G$ we construct a new constant-indegree graph $ \mathrm{Reduce}(G, \mathbb{M})$ by replacing each node in $G$ with an ST-robust graph from $ \mathbb{M}$. We also show that ST-robust graphs can be used to construct (tight) proofs-of-space and (asymptotically) improved wide-block labeling functions.
△ Less
Submitted 10 December, 2020; v1 submitted 20 October, 2019;
originally announced October 2019.
-
On Locally Decodable Codes in Resource Bounded Channels
Authors:
Jeremiah Blocki,
Shubhang Kulkarni,
Samson Zhou
Abstract:
Constructions of locally decodable codes (LDCs) have one of two undesirable properties: low rate or high locality (polynomial in the length of the message). In settings where the encoder/decoder have already exchanged cryptographic keys and the channel is a probabilistic polynomial time (PPT) algorithm, it is possible to circumvent these barriers and design LDCs with constant rate and small locali…
▽ More
Constructions of locally decodable codes (LDCs) have one of two undesirable properties: low rate or high locality (polynomial in the length of the message). In settings where the encoder/decoder have already exchanged cryptographic keys and the channel is a probabilistic polynomial time (PPT) algorithm, it is possible to circumvent these barriers and design LDCs with constant rate and small locality. However, the assumption that the encoder/decoder have exchanged cryptographic keys is often prohibitive. We thus consider the problem of designing explicit and efficient LDCs in settings where the channel is slightly more constrained than the encoder/decoder with respect to some resource e.g., space or (sequential) time. Given an explicit function $f$ that the channel cannot compute, we show how the encoder can transmit a random secret key to the local decoder using $f(\cdot)$ and a random oracle $H(\cdot)$. This allows bootstrap from the private key LDC construction of Ostrovsky, Pandey and Sahai (ICALP, 2007), thereby answering an open question posed by Guruswami and Smith (FOCS 2010) of whether such bootstrap** techniques may apply to LDCs in weaker channel models than just PPT algorithms. Specifically, in the random oracle model we show how to construct explicit constant rate LDCs with locality of polylog in the security parameter against various resource constrained channels.
△ Less
Submitted 4 June, 2020; v1 submitted 24 September, 2019;
originally announced September 2019.
-
Approximating Cumulative Pebbling Cost is Unique Games Hard
Authors:
Jeremiah Blocki,
Seunghoon Lee,
Samson Zhou
Abstract:
The cumulative pebbling complexity of a directed acyclic graph $G$ is defined as $\mathsf{cc}(G) = \min_P \sum_i |P_i|$, where the minimum is taken over all legal (parallel) black pebblings of $G$ and $|P_i|$ denotes the number of pebbles on the graph during round $i$. Intuitively, $\mathsf{cc}(G)$ captures the amortized Space-Time complexity of pebbling $m$ copies of $G$ in parallel. The cumulati…
▽ More
The cumulative pebbling complexity of a directed acyclic graph $G$ is defined as $\mathsf{cc}(G) = \min_P \sum_i |P_i|$, where the minimum is taken over all legal (parallel) black pebblings of $G$ and $|P_i|$ denotes the number of pebbles on the graph during round $i$. Intuitively, $\mathsf{cc}(G)$ captures the amortized Space-Time complexity of pebbling $m$ copies of $G$ in parallel. The cumulative pebbling complexity of a graph $G$ is of particular interest in the field of cryptography as $\mathsf{cc}(G)$ is tightly related to the amortized Area-Time complexity of the Data-Independent Memory-Hard Function (iMHF) $f_{G,H}$ [AS15] defined using a constant indegree directed acyclic graph (DAG) $G$ and a random oracle $H(\cdot)$. A secure iMHF should have amortized Space-Time complexity as high as possible, e.g., to deter brute-force password attacker who wants to find $x$ such that $f_{G,H}(x) = h$. Thus, to analyze the (in)security of a candidate iMHF $f_{G,H}$, it is crucial to estimate the value $\mathsf{cc}(G)$ but currently, upper and lower bounds for leading iMHF candidates differ by several orders of magnitude. Blocki and Zhou recently showed that it is $\mathsf{NP}$-Hard to compute $\mathsf{cc}(G)$, but their techniques do not even rule out an efficient $(1+\varepsilon)$-approximation algorithm for any constant $\varepsilon>0$. We show that for any constant $c > 0$, it is Unique Games hard to approximate $\mathsf{cc}(G)$ to within a factor of $c$.
(See the paper for the full abstract.)
△ Less
Submitted 15 November, 2019; v1 submitted 17 April, 2019;
originally announced April 2019.
-
Relaxed Locally Correctable Codes in Computationally Bounded Channels
Authors:
Jeremiah Blocki,
Venkata Gandikota,
Elena Grigorescu,
Samson Zhou
Abstract:
Error-correcting codes that admit local decoding and correcting algorithms have been the focus of much recent research due to their numerous theoretical and practical applications. An important goal is to obtain the best possible tradeoffs between the number of queries the algorithm makes to its oracle (the locality of the task), and the amount of redundancy in the encoding (the information rate).…
▽ More
Error-correcting codes that admit local decoding and correcting algorithms have been the focus of much recent research due to their numerous theoretical and practical applications. An important goal is to obtain the best possible tradeoffs between the number of queries the algorithm makes to its oracle (the locality of the task), and the amount of redundancy in the encoding (the information rate).
In Hamming's classical adversarial channel model, the current tradeoffs are dramatic, allowing either small locality, but superpolynomial blocklength, or small blocklength, but high locality. However, in the computationally bounded, adversarial channel model, proposed by Lipton (STACS 1994), constructions of locally decodable codes suddenly exhibit small locality and small blocklength, but these constructions require strong trusted setup assumptions e.g., Ostrovsky, Pandey and Sahai (ICALP 2007) construct private locally decodable codes in the setting where the sender and receiver already share a symmetric key.
We study variants of locally decodable and locally correctable codes in computationally bounded, adversarial channels, in a setting with no public-key or private-key cryptographic setup. The only setup assumption we require is the selection of the public parameters (seed) for a collision-resistant hash function. Specifically, we provide constructions of relaxed locally correctable and relaxed locally decodable codes over the binary alphabet, with constant information rate, and poly-logarithmic locality.
Our constructions, which compare favorably with their classical analogues in the computationally unbounded Hamming channel, crucially employ collision-resistant hash functions and local expander graphs, extending ideas from recent cryptographic constructions of memory-hard functions.
△ Less
Submitted 17 September, 2018; v1 submitted 15 March, 2018;
originally announced March 2018.
-
Sustained Space Complexity
Authors:
Joel Alwen,
Jeremiah Blocki,
Krzysztof Pietrzak
Abstract:
Memory-hard functions (MHF) are functions whose evaluation cost is dominated by memory cost. MHFs are egalitarian, in the sense that evaluating them on dedicated hardware (like FPGAs or ASICs) is not much cheaper than on off-the-shelf hardware (like x86 CPUs). MHFs have interesting cryptographic applications, most notably to password hashing and securing blockchains.
Alwen and Serbinenko [STOC'1…
▽ More
Memory-hard functions (MHF) are functions whose evaluation cost is dominated by memory cost. MHFs are egalitarian, in the sense that evaluating them on dedicated hardware (like FPGAs or ASICs) is not much cheaper than on off-the-shelf hardware (like x86 CPUs). MHFs have interesting cryptographic applications, most notably to password hashing and securing blockchains.
Alwen and Serbinenko [STOC'15] define the cumulative memory complexity (cmc) of a function as the sum (over all time-steps) of the amount of memory required to compute the function. They advocate that a good MHF must have high cmc. Unlike previous notions, cmc takes into account that dedicated hardware might exploit amortization and parallelism. Still, cmc has been critizised as insufficient, as it fails to capture possible time-memory trade-offs, as memory cost doesn't scale linearly, functions with the same cmc could still have very different actual hardware cost.
In this work we address this problem, and introduce the notion of sustained-memory complexity, which requires that any algorithm evaluating the function must use a large amount of memory for many steps. We construct functions (in the parallel random oracle model) whose sustained-memory complexity is almost optimal: our function can be evaluated using $n$ steps and $O(n/\log(n))$ memory, in each step making one query to the (fixed-input length) random oracle, while any algorithm that can make arbitrary many parallel queries to the random oracle, still needs $Ω(n/\log(n))$ memory for $Ω(n)$ steps.
Our main technical contribution is the construction is a family of DAGs on $n$ nodes with constant indegree with high "sustained-space complexity", meaning that any parallel black-pebbling strategy requires $Ω(n/\log(n))$ pebbles for at least $Ω(n)$ steps.
△ Less
Submitted 7 July, 2017; v1 submitted 15 May, 2017;
originally announced May 2017.
-
Optimizing Locally Differentially Private Protocols
Authors:
Tianhao Wang,
Jeremiah Blocki,
Ninghui Li,
Somesh Jha
Abstract:
Protocols satisfying Local Differential Privacy (LDP) enable parties to collect aggregate information about a population while protecting each user's privacy, without relying on a trusted third party. LDP protocols (such as Google's RAPPOR) have been deployed in real-world scenarios. In these protocols, a user encodes his private information and perturbs the encoded value locally before sending it…
▽ More
Protocols satisfying Local Differential Privacy (LDP) enable parties to collect aggregate information about a population while protecting each user's privacy, without relying on a trusted third party. LDP protocols (such as Google's RAPPOR) have been deployed in real-world scenarios. In these protocols, a user encodes his private information and perturbs the encoded value locally before sending it to an aggregator, who combines values that users contribute to infer statistics about the population. In this paper, we introduce a framework that generalizes several LDP protocols proposed in the literature. Our framework yields a simple and fast aggregation algorithm, whose accuracy can be precisely analyzed. Our in-depth analysis enables us to choose optimal parameters, resulting in two new protocols (i.e., Optimized Unary Encoding and Optimized Local Hashing) that provide better utility than protocols previously proposed. We present precise conditions for when each proposed protocol should be used, and perform experiments that demonstrate the advantage of our proposed protocols.
△ Less
Submitted 14 May, 2017; v1 submitted 11 May, 2017;
originally announced May 2017.
-
On the Computational Complexity of Minimal Cumulative Cost Graph Pebbling
Authors:
Jeremiah Blocki,
Samson Zhou
Abstract:
We consider the computational complexity of finding a legal black pebbling of a DAG $G=(V,E)$ with minimum cumulative cost. A black pebbling is a sequence $P_0,\ldots, P_t \subseteq V$ of sets of nodes which must satisfy the following properties: $P_0 = \emptyset$ (we start off with no pebbles on $G$), $\mathsf{sinks}(G) \subseteq \bigcup_{j \leq t} P_j$ (every sink node was pebbled at some point)…
▽ More
We consider the computational complexity of finding a legal black pebbling of a DAG $G=(V,E)$ with minimum cumulative cost. A black pebbling is a sequence $P_0,\ldots, P_t \subseteq V$ of sets of nodes which must satisfy the following properties: $P_0 = \emptyset$ (we start off with no pebbles on $G$), $\mathsf{sinks}(G) \subseteq \bigcup_{j \leq t} P_j$ (every sink node was pebbled at some point) and $\mathsf{parents}\big(P_{i+1}\backslash P_i\big) \subseteq P_i$ (we can only place a new pebble on a node $v$ if all of $v$'s parents had a pebble during the last round). The cumulative cost of a pebbling $P_0,P_1,\ldots, P_t \subseteq V$ is $\mathsf{cc}(P) = | P_1| + \ldots + | P_t|$. The cumulative pebbling cost is an especially important security metric for data-independent memory hard functions, an important primitive for password hashing. Thus, an efficient (approximation) algorithm would be an invaluable tool for the cryptanalysis of password hash functions as it would provide an automated tool to establish tight bounds on the amortized space-time cost of computing the function. We show that such a tool is unlikely to exist. In particular, we prove the following results. (1) It is $\texttt{NP}\mbox{-}\texttt{Hard}$ to find a pebbling minimizing cumulative cost. (2) The natural linear program relaxation for the problem has integrality gap $\tilde{O}(n)$, where $n$ is the number of nodes in $G$. We conjecture that the problem is hard to approximate. (3) We show that a related problem, find the minimum size subset $S\subseteq V$ such that $\textsf{depth}(G-S) \leq d$, is also $\texttt{NP}\mbox{-}\texttt{Hard}$. In fact, under the unique games conjecture there is no $(2-ε)$-approximation algorithm.
△ Less
Submitted 21 January, 2018; v1 submitted 14 September, 2016;
originally announced September 2016.
-
Client-CASH: Protecting Master Passwords against Offline Attacks
Authors:
Jeremiah Blocki,
Anirudh Sridhar
Abstract:
Offline attacks on passwords are increasingly commonplace and dangerous. An offline adversary is limited only by the amount of computational resources he or she is willing to invest to crack a user's password. The danger is compounded by the existence of authentication servers who fail to adopt proper password storage practices like key-stretching. Password managers can help mitigate these risks b…
▽ More
Offline attacks on passwords are increasingly commonplace and dangerous. An offline adversary is limited only by the amount of computational resources he or she is willing to invest to crack a user's password. The danger is compounded by the existence of authentication servers who fail to adopt proper password storage practices like key-stretching. Password managers can help mitigate these risks by adopting key stretching procedures like hash iteration or memory hard functions to derive site specific passwords from the user's master password on the client-side. While key stretching can reduce the offline adversary's success rate, these procedures also increase computational costs for a legitimate user. Motivated by the observation that most of the password guesses of the offline adversary will be incorrect, we propose a client side cost asymmetric secure hashing scheme (Client-CASH). Client-CASH randomizes the runtime of client-side key stretching procedure in a way that the expected computational cost of our key derivation function is greater when run with an incorrect master password. We make several contributions. First, we show how to introduce randomness into a client-side key stretching algorithms through the use of halting predicates which are selected randomly at the time of account creation. Second, we formalize the problem of finding the optimal running time distribution subject to certain cost constraints for the client and certain security constrains on the halting predicates. Finally, we demonstrate that Client-CASH can reduce the adversary's success rate by up to $21\%$. These results demonstrate the promise of the Client-CASH mechanism.
△ Less
Submitted 2 March, 2016;
originally announced March 2016.
-
CASH: A Cost Asymmetric Secure Hash Algorithm for Optimal Password Protection
Authors:
Jeremiah Blocki,
Anupam Datta
Abstract:
An adversary who has obtained the cryptographic hash of a user's password can mount an offline attack to crack the password by comparing this hash value with the cryptographic hashes of likely password guesses. This offline attacker is limited only by the resources he is willing to invest to crack the password. Key-stretching tools can help mitigate the threat of offline attacks by making each pas…
▽ More
An adversary who has obtained the cryptographic hash of a user's password can mount an offline attack to crack the password by comparing this hash value with the cryptographic hashes of likely password guesses. This offline attacker is limited only by the resources he is willing to invest to crack the password. Key-stretching tools can help mitigate the threat of offline attacks by making each password guess more expensive for the adversary to verify. However, key-stretching increases authentication costs for a legitimate authentication server. We introduce a novel Stackelberg game model which captures the essential elements of this interaction between a defender and an offline attacker. We then introduce Cost Asymmetric Secure Hash (CASH), a randomized key-stretching mechanism that minimizes the fraction of passwords that would be cracked by a rational offline attacker without increasing amortized authentication costs for the legitimate authentication server. CASH is motivated by the observation that the legitimate authentication server will typically run the authentication procedure to verify a correct password, while an offline adversary will typically use incorrect password guesses. By using randomization we can ensure that the amortized cost of running CASH to verify a correct password guess is significantly smaller than the cost of rejecting an incorrect password. Using our Stackelberg game framework we can quantify the quality of the underlying CASH running time distribution in terms of the fraction of passwords that a rational offline adversary would crack. We provide an efficient algorithm to compute high quality CASH distributions for the defender. Finally, we analyze CASH using empirical data from two large scale password frequency datasets. Our analysis shows that CASH can significantly reduce (up to $50\%$) the fraction of password cracked by a rational offline adversary.
△ Less
Submitted 4 May, 2016; v1 submitted 1 September, 2015;
originally announced September 2015.
-
Spaced Repetition and Mnemonics Enable Recall of Multiple Strong Passwords
Authors:
Jeremiah Blocki,
Saranga Komanduri,
Lorrie Cranor,
Anupam Datta
Abstract:
We report on a user study that provides evidence that spaced repetition and a specific mnemonic technique enable users to successfully recall multiple strong passwords over time. Remote research participants were asked to memorize 4 Person-Action-Object (PAO) stories where they chose a famous person from a drop-down list and were given machine-generated random action-object pairs. Users were also…
▽ More
We report on a user study that provides evidence that spaced repetition and a specific mnemonic technique enable users to successfully recall multiple strong passwords over time. Remote research participants were asked to memorize 4 Person-Action-Object (PAO) stories where they chose a famous person from a drop-down list and were given machine-generated random action-object pairs. Users were also shown a photo of a scene and asked to imagine the PAO story taking place in the scene (e.g., Bill Gates---swallowing---bike on a beach). Subsequently, they were asked to recall the action-object pairs when prompted with the associated scene-person pairs following a spaced repetition schedule over a period of 127+ days. While we evaluated several spaced repetition schedules, the best results were obtained when users initially returned after 12 hours and then in $1.5\times$ increasing intervals: 77% of the participants successfully recalled all 4 stories in 10 tests over a period of 158 days. Much of the forgetting happened in the first test period (12 hours): 89% of participants who remembered their stories during the first test period successfully remembered them in every subsequent round. These findings, coupled with recent results on naturally rehearsing password schemes, suggest that 4 PAO stories could be used to create usable and strong passwords for 14 sensitive accounts following this spaced repetition schedule, possibly with a few extra upfront rehearsals. In addition, we find that there is an interference effect across multiple PAO stories: the recall rate of 100% (resp. 90%) for participants who were asked to memorize 1 PAO story (resp. 2 PAO stories) is significantly better than the recall rate for participants who were asked to memorize 4 PAO stories. These findings yield concrete advice for improving constructions of password management schemes and future user studies.
△ Less
Submitted 23 January, 2020; v1 submitted 6 October, 2014;
originally announced October 2014.
-
Audit Games with Multiple Defender Resources
Authors:
Jeremiah Blocki,
Nicolas Christin,
Anupam Datta,
Ariel Procaccia,
Arunesh Sinha
Abstract:
Modern organizations (e.g., hospitals, social networks, government agencies) rely heavily on audit to detect and punish insiders who inappropriately access and disclose confidential information. Recent work on audit games models the strategic interaction between an auditor with a single audit resource and auditees as a Stackelberg game, augmenting associated well-studied security games with a conf…
▽ More
Modern organizations (e.g., hospitals, social networks, government agencies) rely heavily on audit to detect and punish insiders who inappropriately access and disclose confidential information. Recent work on audit games models the strategic interaction between an auditor with a single audit resource and auditees as a Stackelberg game, augmenting associated well-studied security games with a configurable punishment parameter. We significantly generalize this audit game model to account for multiple audit resources where each resource is restricted to audit a subset of all potential violations, thus enabling application to practical auditing scenarios. We provide an FPTAS that computes an approximately optimal solution to the resulting non-convex optimization problem. The main technical novelty is in the design and correctness proof of an optimization transformation that enables the construction of this FPTAS. In addition, we experimentally demonstrate that this transformation significantly speeds up computation of solutions for a class of audit games and security games.
△ Less
Submitted 1 March, 2015; v1 submitted 16 September, 2014;
originally announced September 2014.
-
Set Families with Low Pairwise Intersection
Authors:
Calvin Beideman,
Jeremiah Blocki
Abstract:
A $\left(n,\ell,γ\right)$-sharing set family of size $m$ is a family of sets $S_1,\ldots,S_m\subseteq [n]$ s.t. each set has size $\ell$ and each pair of sets shares at most $γ$ elements. We let $m\left(n,\ell,γ\right)$ denote the maximum size of any such set family and we consider the following question: How large can $m\left(n,\ell,γ\right)$ be? $\left(n,\ell,γ\right)$-sharing set families have…
▽ More
A $\left(n,\ell,γ\right)$-sharing set family of size $m$ is a family of sets $S_1,\ldots,S_m\subseteq [n]$ s.t. each set has size $\ell$ and each pair of sets shares at most $γ$ elements. We let $m\left(n,\ell,γ\right)$ denote the maximum size of any such set family and we consider the following question: How large can $m\left(n,\ell,γ\right)$ be? $\left(n,\ell,γ\right)$-sharing set families have a rich set of applications including the construction of pseudorandom number generators and usable and secure password management schemes. We analyze the explicit construction of Blocki et al using recent bounds on the value of the $t$'th Ramanujan prime. We show that this explicit construction produces a $\left(4\ell^2\ln 4\ell,\ell,γ\right)$-sharing set family of size $\left(2 \ell \ln 2\ell\right)^{γ+1}$ for any $\ell\geq γ$. We also show that the construction of Blocki et al can be used to obtain a weak $\left(n,\ell,γ\right)$-sharing set family of size $m$ for any $m >0$. These results are competitive with the inexplicit construction of Raz et al for weak $\left(n,\ell,γ\right)$-sharing families. We show that our explicit construction of weak $\left(n,\ell,γ\right)$-sharing set families can be used to obtain a parallelizable pseudorandom number generator with a low memory footprint by using the pseudorandom number generator of Nisan and Wigderson. We also prove that $m\left(n,n/c_1,c_2n\right)$ must be a constant whenever $c_2 \leq \frac{2}{c_1^3+c_1^2}$. We show that this bound is nearly tight as $m\left(n,n/c_1,c_2n\right)$ grows exponentially fast whenever $c_2 > c_1^{-2}$.
△ Less
Submitted 17 April, 2014;
originally announced April 2014.
-
Towards Human Computable Passwords
Authors:
Jeremiah Blocki,
Manuel Blum,
Anupam Datta,
Santosh Vempala
Abstract:
An interesting challenge for the cryptography community is to design authentication protocols that are so simple that a human can execute them without relying on a fully trusted computer. We propose several candidate authentication protocols for a setting in which the human user can only receive assistance from a semi-trusted computer --- a computer that stores information and performs computation…
▽ More
An interesting challenge for the cryptography community is to design authentication protocols that are so simple that a human can execute them without relying on a fully trusted computer. We propose several candidate authentication protocols for a setting in which the human user can only receive assistance from a semi-trusted computer --- a computer that stores information and performs computations correctly but does not provide confidentiality. Our schemes use a semi-trusted computer to store and display public challenges $C_i\in[n]^k$. The human user memorizes a random secret map** $σ:[n]\rightarrow\mathbb{Z}_d$ and authenticates by computing responses $f(σ(C_i))$ to a sequence of public challenges where $f:\mathbb{Z}_d^k\rightarrow\mathbb{Z}_d$ is a function that is easy for the human to evaluate. We prove that any statistical adversary needs to sample $m=\tildeΩ(n^{s(f)})$ challenge-response pairs to recover $σ$, for a security parameter $s(f)$ that depends on two key properties of $f$. To obtain our results, we apply the general hypercontractivity theorem to lower bound the statistical dimension of the distribution over challenge-response pairs induced by $f$ and $σ$. Our lower bounds apply to arbitrary functions $f $ (not just to functions that are easy for a human to evaluate), and generalize recent results of Feldman et al. As an application, we propose a family of human computable password functions $f_{k_1,k_2}$ in which the user needs to perform $2k_1+2k_2+1$ primitive operations (e.g., adding two digits or remembering $σ(i)$), and we show that $s(f) = \min\{k_1+1, (k_2+1)/2\}$. For these schemes, we prove that forging passwords is equivalent to recovering the secret map**. Thus, our human computable password schemes can maintain strong security guarantees even after an adversary has observed the user login to many different accounts.
△ Less
Submitted 9 September, 2016; v1 submitted 31 March, 2014;
originally announced April 2014.
-
GOTCHA Password Hackers!
Authors:
Jeremiah Blocki,
Manuel Blum,
Anupam Datta
Abstract:
We introduce GOTCHAs (Generating panOptic Turing Tests to Tell Computers and Humans Apart) as a way of preventing automated offline dictionary attacks against user selected passwords. A GOTCHA is a randomized puzzle generation protocol, which involves interaction between a computer and a human. Informally, a GOTCHA should satisfy two key properties: (1) The puzzles are easy for the human to solve.…
▽ More
We introduce GOTCHAs (Generating panOptic Turing Tests to Tell Computers and Humans Apart) as a way of preventing automated offline dictionary attacks against user selected passwords. A GOTCHA is a randomized puzzle generation protocol, which involves interaction between a computer and a human. Informally, a GOTCHA should satisfy two key properties: (1) The puzzles are easy for the human to solve. (2) The puzzles are hard for a computer to solve even if it has the random bits used by the computer to generate the final puzzle --- unlike a CAPTCHA. Our main theorem demonstrates that GOTCHAs can be used to mitigate the threat of offline dictionary attacks against passwords by ensuring that a password cracker must receive constant feedback from a human being while mounting an attack. Finally, we provide a candidate construction of GOTCHAs based on Inkblot images. Our construction relies on the usability assumption that users can recognize the phrases that they originally used to describe each Inkblot image --- a much weaker usability assumption than previous password systems based on Inkblots which required users to recall their phrase exactly. We conduct a user study to evaluate the usability of our GOTCHA construction. We also generate a GOTCHA challenge where we encourage artificial intelligence and security researchers to try to crack several passwords protected with our scheme.
△ Less
Submitted 3 October, 2013;
originally announced October 2013.
-
Audit Games
Authors:
Jeremiah Blocki,
Nicolas Christin,
Anupam Datta,
Ariel D. Procaccia,
Arunesh Sinha
Abstract:
Effective enforcement of laws and policies requires expending resources to prevent and detect offenders, as well as appropriate punishment schemes to deter violators. In particular, enforcement of privacy laws and policies in modern organizations that hold large volumes of personal information (e.g., hospitals, banks, and Web services providers) relies heavily on internal audit mechanisms. We stud…
▽ More
Effective enforcement of laws and policies requires expending resources to prevent and detect offenders, as well as appropriate punishment schemes to deter violators. In particular, enforcement of privacy laws and policies in modern organizations that hold large volumes of personal information (e.g., hospitals, banks, and Web services providers) relies heavily on internal audit mechanisms. We study economic considerations in the design of these mechanisms, focusing in particular on effective resource allocation and appropriate punishment schemes. We present an audit game model that is a natural generalization of a standard security game model for resource allocation with an additional punishment parameter. Computing the Stackelberg equilibrium for this game is challenging because it involves solving an optimization problem with non-convex quadratic constraints. We present an additive FPTAS that efficiently computes a solution that is arbitrarily close to the optimal solution.
△ Less
Submitted 5 March, 2013; v1 submitted 2 March, 2013;
originally announced March 2013.
-
Naturally Rehearsing Passwords
Authors:
Jeremiah Blocki,
Manuel Blum,
Anupam Datta
Abstract:
We introduce quantitative usability and security models to guide the design of password management schemes --- systematic strategies to help users create and remember multiple passwords. In the same way that security proofs in cryptography are based on complexity-theoretic assumptions (e.g., hardness of factoring and discrete logarithm), we quantify usability by introducing usability assumptions.…
▽ More
We introduce quantitative usability and security models to guide the design of password management schemes --- systematic strategies to help users create and remember multiple passwords. In the same way that security proofs in cryptography are based on complexity-theoretic assumptions (e.g., hardness of factoring and discrete logarithm), we quantify usability by introducing usability assumptions. In particular, password management relies on assumptions about human memory, e.g., that a user who follows a particular rehearsal schedule will successfully maintain the corresponding memory. These assumptions are informed by research in cognitive science and validated through empirical studies. Given rehearsal requirements and a user's visitation schedule for each account, we use the total number of extra rehearsals that the user would have to do to remember all of his passwords as a measure of the usability of the password scheme. Our usability model leads us to a key observation: password reuse benefits users not only by reducing the number of passwords that the user has to memorize, but more importantly by increasing the natural rehearsal rate for each password. We also present a security model which accounts for the complexity of password management with multiple accounts and associated threats, including online, offline, and plaintext password leak attacks. Observing that current password management schemes are either insecure or unusable, we present Shared Cues--- a new scheme in which the underlying secret is strategically shared across accounts to ensure that most rehearsal requirements are satisfied naturally while simultaneously providing strong security. The construction uses the Chinese Remainder Theorem to achieve these competing goals.
△ Less
Submitted 9 September, 2013; v1 submitted 20 February, 2013;
originally announced February 2013.
-
Optimizing Password Composition Policies
Authors:
Jeremiah Blocki,
Saranga Komanduri,
Ariel Procaccia,
Or Sheffet
Abstract:
A password composition policy restricts the space of allowable passwords to eliminate weak passwords that are vulnerable to statistical guessing attacks. Usability studies have demonstrated that existing password composition policies can sometimes result in weaker password distributions; hence a more principled approach is needed. We introduce the first theoretical model for optimizing password co…
▽ More
A password composition policy restricts the space of allowable passwords to eliminate weak passwords that are vulnerable to statistical guessing attacks. Usability studies have demonstrated that existing password composition policies can sometimes result in weaker password distributions; hence a more principled approach is needed. We introduce the first theoretical model for optimizing password composition policies. We study the computational and sample complexity of this problem under different assumptions on the structure of policies and on users' preferences over passwords. Our main positive result is an algorithm that -- with high probability --- constructs almost optimal policies (which are specified as a union of subsets of allowed passwords), and requires only a small number of samples of users' preferred passwords. We complement our theoretical results with simulations using a real-world dataset of 32 million passwords.
△ Less
Submitted 25 February, 2013; v1 submitted 20 February, 2013;
originally announced February 2013.
-
Differentially Private Data Analysis of Social Networks via Restricted Sensitivity
Authors:
Jeremiah Blocki,
Avrim Blum,
Anupam Datta,
Or Sheffet
Abstract:
We introduce the notion of restricted sensitivity as an alternative to global and smooth sensitivity to improve accuracy in differentially private data analysis. The definition of restricted sensitivity is similar to that of global sensitivity except that instead of quantifying over all possible datasets, we take advantage of any beliefs about the dataset that a querier may have, to quantify over…
▽ More
We introduce the notion of restricted sensitivity as an alternative to global and smooth sensitivity to improve accuracy in differentially private data analysis. The definition of restricted sensitivity is similar to that of global sensitivity except that instead of quantifying over all possible datasets, we take advantage of any beliefs about the dataset that a querier may have, to quantify over a restricted class of datasets. Specifically, given a query f and a hypothesis H about the structure of a dataset D, we show generically how to transform f into a new query f_H whose global sensitivity (over all datasets including those that do not satisfy H) matches the restricted sensitivity of the query f. Moreover, if the belief of the querier is correct (i.e., D is in H) then f_H(D) = f(D). If the belief is incorrect, then f_H(D) may be inaccurate.
We demonstrate the usefulness of this notion by considering the task of answering queries regarding social-networks, which we model as a combination of a graph and a labeling of its vertices. In particular, while our generic procedure is computationally inefficient, for the specific definition of H as graphs of bounded degree, we exhibit efficient ways of constructing f_H using different projection-based techniques. We then analyze two important query classes: subgraph counting queries (e.g., number of triangles) and local profile queries (e.g., number of people who know a spy and a computer-scientist who know each other). We demonstrate that the restricted sensitivity of such queries can be significantly lower than their smooth sensitivity. Thus, using restricted sensitivity we can maintain privacy whether or not D is in H, while providing more accurate results in the event that H holds true.
△ Less
Submitted 1 February, 2013; v1 submitted 22 August, 2012;
originally announced August 2012.
-
The Johnson-Lindenstrauss Transform Itself Preserves Differential Privacy
Authors:
Jeremiah Blocki,
Avrim Blum,
Anupam Datta,
Or Sheffet
Abstract:
This paper proves that an "old dog", namely -- the classical Johnson-Lindenstrauss transform, "performs new tricks" -- it gives a novel way of preserving differential privacy. We show that if we take two databases, $D$ and $D'$, such that (i) $D'-D$ is a rank-1 matrix of bounded norm and (ii) all singular values of $D$ and $D'$ are sufficiently large, then multiplying either $D$ or $D'$ with a vec…
▽ More
This paper proves that an "old dog", namely -- the classical Johnson-Lindenstrauss transform, "performs new tricks" -- it gives a novel way of preserving differential privacy. We show that if we take two databases, $D$ and $D'$, such that (i) $D'-D$ is a rank-1 matrix of bounded norm and (ii) all singular values of $D$ and $D'$ are sufficiently large, then multiplying either $D$ or $D'$ with a vector of iid normal Gaussians yields two statistically close distributions in the sense of differential privacy. Furthermore, a small, deterministic and \emph{public} alteration of the input is enough to assert that all singular values of $D$ are large.
We apply the Johnson-Lindenstrauss transform to the task of approximating cut-queries: the number of edges crossing a $(S,\bar S)$-cut in a graph. We show that the JL transform allows us to \emph{publish a sanitized graph} that preserves edge differential privacy (where two graphs are neighbors if they differ on a single edge) while adding only $O(|S|/ε)$ random noise to any given query (w.h.p). Comparing the additive noise of our algorithm to existing algorithms for answering cut-queries in a differentially private manner, we outperform all others on small cuts ($|S| = o(n)$).
We also apply our technique to the task of estimating the variance of a given matrix in any given direction. The JL transform allows us to \emph{publish a sanitized covariance matrix} that preserves differential privacy w.r.t bounded changes (each row in the matrix can change by at most a norm-1 vector) while adding random noise of magnitude independent of the size of the matrix (w.h.p). In contrast, existing algorithms introduce an error which depends on the matrix dimensions.
△ Less
Submitted 18 August, 2012; v1 submitted 10 April, 2012;
originally announced April 2012.
-
Adaptive Regret Minimization in Bounded-Memory Games
Authors:
Jeremiah Blocki,
Nicolas Christin,
Anupam Datta,
Arunesh Sinha
Abstract:
Online learning algorithms that minimize regret provide strong guarantees in situations that involve repeatedly making decisions in an uncertain environment, e.g. a driver deciding what route to drive to work every day. While regret minimization has been extensively studied in repeated games, we study regret minimization for a richer class of games called bounded memory games. In each round of a t…
▽ More
Online learning algorithms that minimize regret provide strong guarantees in situations that involve repeatedly making decisions in an uncertain environment, e.g. a driver deciding what route to drive to work every day. While regret minimization has been extensively studied in repeated games, we study regret minimization for a richer class of games called bounded memory games. In each round of a two-player bounded memory-m game, both players simultaneously play an action, observe an outcome and receive a reward. The reward may depend on the last m outcomes as well as the actions of the players in the current round. The standard notion of regret for repeated games is no longer suitable because actions and rewards can depend on the history of play. To account for this generality, we introduce the notion of k-adaptive regret, which compares the reward obtained by playing actions prescribed by the algorithm against a hypothetical k-adaptive adversary with the reward obtained by the best expert in hindsight against the same adversary. Roughly, a hypothetical k-adaptive adversary adapts her strategy to the defender's actions exactly as the real adversary would within each window of k rounds. Our definition is parametrized by a set of experts, which can include both fixed and adaptive defender strategies.
We investigate the inherent complexity of and design algorithms for adaptive regret minimization in bounded memory games of perfect and imperfect information. We prove a hardness result showing that, with imperfect information, any k-adaptive regret minimizing algorithm (with fixed strategies as experts) must be inefficient unless NP=RP even when playing against an oblivious adversary. In contrast, for bounded memory games of perfect and imperfect information we present approximate 0-adaptive regret minimization algorithms against an oblivious adversary running in time n^{O(1)}.
△ Less
Submitted 5 September, 2013; v1 submitted 11 November, 2011;
originally announced November 2011.
-
Resolving the Complexity of Some Data Privacy Problems
Authors:
Jeremiah Blocki,
Ryan Williams
Abstract:
We formally study two methods for data sanitation that have been used extensively in the database community: k-anonymity and l-diversity. We settle several open problems concerning the difficulty of applying these methods optimally, proving both positive and negative results:
1. 2-anonymity is in P.
2. The problem of partitioning the edges of a triangle-free graph into 4-stars (degree-three ver…
▽ More
We formally study two methods for data sanitation that have been used extensively in the database community: k-anonymity and l-diversity. We settle several open problems concerning the difficulty of applying these methods optimally, proving both positive and negative results:
1. 2-anonymity is in P.
2. The problem of partitioning the edges of a triangle-free graph into 4-stars (degree-three vertices) is NP-hard. This yields an alternative proof that 3-anonymity is NP-hard even when the database attributes are all binary.
3. 3-anonymity with only 27 attributes per record is MAX SNP-hard.
4. For databases with n rows, k-anonymity is in O(4^n poly(n)) time for all k > 1.
5. For databases with n rows and l <= log_{2c+2} log n attributes over an alphabet of cardinality c = O(1), k-anonymity is in P. Assuming c, l = O(1), k-anonymity is in O(n).
6. 3-diversity with binary attributes is NP-hard, with one sensitive attribute.
7. 2-diversity with binary attributes is NP-hard, with three sensitive attributes.
△ Less
Submitted 23 April, 2010; v1 submitted 21 April, 2010;
originally announced April 2010.