-
Enabling Cost-Benefit Analysis of Data Sync Protocols
Authors:
Novak Boškov,
Ari Trachtenberg,
David Starobinski
Abstract:
The problem of data synchronization arises in networked applications that require some measure of consistency. Indeed, data synchronization approaches have demonstrated a significant potential for improving performance in various applications ranging from distributed ledgers to fog-enabled storage offloading for IoT. Although several protocols for data set synchronization have been proposed over the years, there is currently no widespread utility implementing them, unlike the popular Rsync utility available for file synchronization. To that end, we describe a new middleware called GenSync that abstracts the subtleties of the state-of-the-art data synchronization protocols, allows users to choose protocols based on a comparative evaluation under realistic system conditions, and seamlessly integrate protocols in existing applications through a public API. We showcase GenSync through a case study, in which we integrate it into one of the world's largest wireless emulators and compare the performance of its included protocols.
Submitted 30 March, 2023;
originally announced March 2023.
-
SREP: Out-Of-Band Sync of Transaction Pools for Large-Scale Blockchains
Authors:
Novak Boškov,
Şevval Şimşek,
Ari Trachtenberg,
David Starobinski
Abstract:
Synchronization of transaction pools (mempools) has shown potential for improving the performance and block propagation delay of state-of-the-art blockchains. Indeed, various heuristics have been proposed in the literature to this end, all of which incorporate exchanges of unconfirmed transactions into their block propagation protocol. In this work, we take a different approach, maintaining transaction synchronization outside (and independently) of the block propagation channel. In the process, we formalize the synchronization problem within a graph theoretic framework and introduce a novel algorithm (SREP - Set Reconciliation-Enhanced Propagation) with quantifiable guarantees. We analyze the algorithm's performance for various realistic network topologies, and show that it converges on any connected graph in a number of steps that is bounded by the diameter of the graph. We confirm our analytical findings through extensive simulations that include comparison with MempoolSync, a recent approach from the literature. Our simulations show that SREP incurs reasonable overall bandwidth overhead and, unlike MempoolSync, scales gracefully with the size of the network.
Submitted 29 March, 2023;
originally announced March 2023.
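The diameter bound stated in the SREP abstract above can be illustrated with a deliberately idealized model. The sketch below is not the SREP algorithm: it assumes synchronous rounds in which every node merges its pool with the pools of all its neighbors, and the function names (`diameter`, `rounds_to_sync`) are hypothetical.

```python
from collections import deque

def diameter(adj):
    """Longest shortest-path distance in a connected, undirected graph."""
    def eccentricity(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return max(dist.values())
    return max(eccentricity(u) for u in adj)

def rounds_to_sync(adj, pools):
    """Count synchronous rounds until all pools are equal.

    Assumes the graph is connected; otherwise this loop never ends.
    """
    pools = {u: set(p) for u, p in pools.items()}
    rounds = 0
    while len({frozenset(p) for p in pools.values()}) > 1:
        # Every node reads its neighbors' pools as they were at round start.
        snapshot = {u: set(p) for u, p in pools.items()}
        for u in adj:
            for v in adj[u]:
                pools[u] |= snapshot[v]
        rounds += 1
    return rounds

# Path graph 0 - 1 - 2 - 3, each node starting with one unique transaction.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
start = {u: {f"tx{u}"} for u in path}
print(rounds_to_sync(path, start), diameter(path))
```

In this toy model a transaction travels one hop per round, which is why the number of rounds cannot exceed the graph diameter.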
-
Anonymous Collocation Discovery: Harnessing Privacy to Tame the Coronavirus
Authors:
Ran Canetti,
Ari Trachtenberg,
Mayank Varia
Abstract:
Successful containment of the Coronavirus pandemic rests on the ability to quickly and reliably identify those who have been in close proximity to a contagious individual. Existing tools for doing so rely on the collection of exact location information of individuals over lengthy time periods, and combining this information with other personal information. This unprecedented encroachment on individual privacy at national scales has created an outcry and risks rejection of these tools.
We propose an alternative: an extremely simple scheme for providing fine-grained and timely alerts to users who have been in the close vicinity of an infected individual. Crucially, this is done while preserving the anonymity of all individuals, and without collecting or storing any personal information or location history. Our approach is based on using short-range communication mechanisms, like Bluetooth, that are available in all modern cell phones. It can be deployed with very little infrastructure, and incurs a relatively low false-positive rate compared to other collocation methods. We also describe a number of extensions and tradeoffs.
We believe that the privacy guarantees provided by the scheme will encourage quick and broad voluntary adoption. When combined with sufficient testing capacity and existing best practices from healthcare professionals, we hope that this may significantly reduce the infection rate.
Submitted 3 April, 2020; v1 submitted 30 March, 2020;
originally announced March 2020.
-
Characterizing Orphan Transactions in the Bitcoin Network
Authors:
Muhammad Anas Imtiaz,
David Starobinski,
Ari Trachtenberg
Abstract:
Orphan transactions are those whose parental income-sources are missing at the time that they are processed. These transactions are not propagated to other nodes until all of their missing parents are received, and they thus end up languishing in a local buffer until evicted or their parents are found. Although there has been little work in the literature on characterizing the nature and impact of such orphans, it is intuitive that they may affect throughput on the Bitcoin network. This work thus seeks to methodically research such effects through a measurement campaign of orphan transactions on live Bitcoin nodes. Our data show that, surprisingly, orphan transactions tend to have fewer parents on average than non-orphan transactions. Moreover, the salient features of their missing parents are a lower fee and larger size than their non-orphan counterparts, resulting in a lower transaction fee per byte. Finally, we note that the network overhead incurred by these orphan transactions can be significant, exceeding 17% when using the default orphan memory pool size (100 transactions). However, this overhead can be made negligible, without significant computational or memory demands, if the pool size is merely increased to 1000 transactions.
Submitted 11 March, 2020; v1 submitted 23 December, 2019;
originally announced December 2019.
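The buffering behavior described in the abstract above (orphans wait in a bounded local pool until their parents arrive or they are evicted) can be sketched as follows. This is a minimal illustration, not Bitcoin's implementation: the class name `OrphanPool`, its methods, and the oldest-first eviction policy are all assumptions made for the example; only the default capacity of 100 comes from the abstract.

```python
from collections import OrderedDict

class OrphanPool:
    """Bounded buffer of transactions that are still missing parents."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        # txid -> set of parent txids not yet seen, in insertion order
        self.orphans = OrderedDict()

    def add(self, txid, missing_parents):
        """Buffer an orphan; return the txid evicted to make room, if any."""
        evicted = None
        if txid not in self.orphans and len(self.orphans) >= self.capacity:
            # Assumption: evict the oldest orphan (illustrative policy only).
            evicted, _ = self.orphans.popitem(last=False)
        self.orphans[txid] = set(missing_parents)
        return evicted

    def parent_arrived(self, parent_txid):
        """Remove and return orphans whose parents are now all known."""
        ready = []
        for txid, missing in list(self.orphans.items()):
            missing.discard(parent_txid)
            if not missing:
                ready.append(txid)
                del self.orphans[txid]
        return ready

pool = OrphanPool(capacity=2)
pool.add("c1", {"p1"})
pool.add("c2", {"p1", "p2"})
print(pool.add("c3", {"p3"}))      # pool is full, so the oldest orphan is evicted
print(pool.parent_arrived("p2"))   # c2 still waits for p1
print(pool.parent_arrived("p1"))   # c2 can now be processed
```

A larger `capacity` means fewer evictions, so fewer orphans have to be fetched again later, which is the trade-off the abstract quantifies.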
-
Scalable String Reconciliation by Recursive Content-Dependent Shingling
Authors:
Bowen Song,
Ari Trachtenberg
Abstract:
We consider the problem of reconciling similar, but remote, strings with minimum communication complexity. This "string reconciliation" problem is a fundamental building block for a variety of networking applications, including those that maintain large-scale distributed networks and perform remote file synchronization. We present the novel Recursive Content-Dependent Shingling (RCDS) protocol that is computationally practical for large strings and scales linearly with the edit distance between the remote strings. We provide comparisons to the performance of Rsync, one of the most popular file synchronization tools in active use. Our experiments show that, with minimal engineering, RCDS outperforms the heavily optimized Rsync in reconciling release revisions for about 51% of the 5000 top starred git repositories on GitHub. The improvement is particularly evident for repositories that see frequent, but small, updates.
Submitted 1 October, 2019;
originally announced October 2019.
-
Case Study: Disclosure of Indirect Device Fingerprinting in Privacy Policies
Authors:
Julissa Milligan,
Sarah Scheffler,
Andrew Sellars,
Trishita Tiwari,
Ari Trachtenberg,
Mayank Varia
Abstract:
Recent developments in online tracking make it harder for individuals to detect and block trackers. Some sites have deployed indirect tracking methods, which attempt to uniquely identify a device by asking the browser to perform a seemingly-unrelated task. One type of indirect tracking, Canvas fingerprinting, causes the browser to render a graphic recording rendering statistics as a unique identifier. In this work, we observe how indirect device fingerprinting methods are disclosed in privacy policies, and consider whether the disclosures are sufficient to enable website visitors to block the tracking methods. We compare these disclosures to the disclosure of direct fingerprinting methods on the same websites.
Our case study analyzes one indirect fingerprinting technique, Canvas fingerprinting. We use an existing automated detector of this fingerprinting technique to conservatively detect its use on Alexa Top 500 websites that cater to United States consumers, and we examine the privacy policies of the resulting 28 websites. Disclosures of indirect fingerprinting vary in specificity. None described the specific methods with enough granularity to know the website used Canvas fingerprinting. Conversely, many sites did provide enough detail about usage of direct fingerprinting methods to allow a website visitor to reliably detect and block those techniques.
We conclude that indirect fingerprinting methods are often difficult to detect and are not identified with specificity in privacy policies. This makes indirect fingerprinting more difficult to block, and therefore risks disturbing the tentative armistice between individuals and websites currently in place for direct fingerprinting. This paper illustrates differences in fingerprinting approaches, and explains why technologists, technology lawyers, and policymakers need to appreciate the challenges of indirect fingerprinting.
Submitted 21 August, 2019;
originally announced August 2019.
-
Collaborative Privacy for Web Applications
Authors:
Yihao Hu,
Ari Trachtenberg,
Prakash Ishwar
Abstract:
Real-time, online-editing web apps provide free and convenient services for collaboratively editing, sharing and storing files. The benefits of these web applications do not come for free: not only do service providers have full access to the users' files, but they also control access, transmission, and storage mechanisms for them. As a result, user data may be at risk of data mining, third-party interception, or even manipulation. To combat this, we propose a new system for helping to preserve the privacy of user data within collaborative environments. There are several distinct challenges in producing such a system, including developing an encryption mechanism that does not interfere with the back-end (and often proprietary) control mechanisms utilized by the service, and identifying transparent code hooks through which to obfuscate user data. Toward the first challenge, we develop a character-level encryption scheme that is more resilient to the types of attacks that plague classical substitution ciphers. For the second challenge, we design a browser extension that robustly demonstrates the feasibility of our approach, and show a concrete implementation for Google Chrome and the widely-used Google Docs platform. Our example tangibly demonstrates how several users with a shared key can collaboratively and transparently edit a Google Docs document without revealing the plaintext directly to Google.
Submitted 17 November, 2019; v1 submitted 10 January, 2019;
originally announced January 2019.
-
Page Cache Attacks
Authors:
Daniel Gruss,
Erik Kraft,
Trishita Tiwari,
Michael Schwarz,
Ari Trachtenberg,
Jason Hennessey,
Alex Ionescu,
Anders Fogh
Abstract:
We present a new hardware-agnostic side-channel attack that targets one of the most fundamental software caches in modern computer systems: the operating system page cache. The page cache is a pure software cache that contains all disk-backed pages, including program binaries, shared libraries, and other files, and our attacks thus work across cores and CPUs. Our side-channel permits unprivileged monitoring of some memory accesses of other processes, with a spatial resolution of 4KB and a temporal resolution of 2 microseconds on Linux (restricted to 6.7 measurements per second) and 466 nanoseconds on Windows (restricted to 223 measurements per second); this is roughly the same order of magnitude as the current state-of-the-art cache attacks. We systematically analyze our side channel by demonstrating different local attacks, including a sandbox bypassing high-speed covert channel, timed user-interface redressing attacks, and an attack recovering automatically generated temporary passwords. We further show that we can trade off the side channel's hardware agnostic property for remote exploitability. We demonstrate this via a low profile remote covert channel that uses this page-cache side-channel to exfiltrate information from a malicious sender process through innocuous server requests. Finally, we propose mitigations for some of our attacks, which have been acknowledged by operating system vendors and slated for future security patches.
Submitted 4 January, 2019;
originally announced January 2019.
-
Nothing But Net: Invading Android User Privacy Using Only Network Access Patterns
Authors:
Mikhail Andreev,
Avi Klausner,
Trishita Tiwari,
Ari Trachtenberg,
Arkady Yerukhimovich
Abstract:
We evaluate the power of simple network side-channels to violate user privacy on Android devices. Specifically, we show that, using blackbox network metadata alone (i.e., traffic statistics such as transmission time and size of packets), it is possible to infer several elements of a user's location and also identify their web browsing history (i.e., which sites they visited). We do this with relatively simple learning and classification methods and basic network statistics. For most Android phones currently on the market, such process-level traffic statistics are available for any running process, without any permissions control and at fine-grained details, although, as we demonstrate, even device-level statistics are sufficient for some of our attacks. In effect, it may be possible for any application running on these phones to identify privacy-revealing elements of a user's location, for example, correlating travel with places of worship, point-of-care medical establishments, or political activity.
Submitted 7 July, 2018;
originally announced July 2018.
-
Improving Bitcoin's Resilience to Churn
Authors:
Nabeel Younis,
Muhammad Anas Imtiaz,
David Starobinski,
Ari Trachtenberg
Abstract:
Efficient and reliable block propagation on the Bitcoin network is vital for ensuring the scalability of this peer-to-peer network. To this end, several schemes have been proposed over the last few years to speed up the block propagation, most notably the compact block protocol (BIP 152). Despite this, we show experimental evidence that nodes that have recently joined the network may need about ten days until this protocol becomes 90% effective. This problem is endemic for nodes that do not have persistent network connectivity. We propose to mitigate this ineffectiveness by maintaining mempool synchronization among Bitcoin nodes. For this purpose, we design and implement into Bitcoin a new prioritized data synchronization protocol, called FalafelSync. Our experiments show that FalafelSync helps intermittently connected nodes to maintain better consistency with more stable nodes, thereby showing promise for improving block propagation in the broader network. In the process, we have also developed an effective logging mechanism for Bitcoin nodes, which we release for public use.
Submitted 17 March, 2018;
originally announced March 2018.
-
Fountain Codes with Nonuniform Selection Distributions through Feedback
Authors:
Morteza Hashemi,
Yuval Cassuto,
Ari Trachtenberg
Abstract:
One key requirement for fountain (rateless) coding schemes is to achieve a high intermediate symbol recovery rate. Recent coding schemes have incorporated the use of a feedback channel to improve intermediate performance of traditional rateless codes; however, these codes with feedback are designed based on uniformly random selection of input symbols. In this paper, on the other hand, we develop feedback-based fountain codes with dynamically-adjusted nonuniform symbol selection distributions, and show that this characteristic can enhance the intermediate decoding rate. We provide an analysis of our codes, including bounds on computational complexity and failure probability for a maximum likelihood decoder; the latter are tighter than bounds known for classical rateless codes. Through numerical simulations, we also show that feedback information paired with a nonuniform selection distribution can substantially improve the symbol recovery rate, and that the amount of feedback sent can be tuned to the specific transmission properties of a given feedback channel.
Submitted 7 April, 2015;
originally announced April 2015.
-
Securing Smartphones: A Micro-TCB Approach
Authors:
Yossi Gilad,
Amir Herzberg,
Ari Trachtenberg
Abstract:
As mobile phones have evolved into 'smartphones', with complex operating systems running third-party software, they have become increasingly vulnerable to malicious applications (malware). We introduce a new design for mitigating malware attacks against smartphone users, based on a small trusted computing base module, denoted uTCB. The uTCB manages sensitive data and sensors, and provides core services to applications, independently of the operating system. The user invokes uTCB using a simple secure attention key, which is pressed in order to validate physical possession of the device and authorize a sensitive action; this protects private information even if the device is infected with malware. We present a proof-of-concept implementation of uTCB based on ARM's TrustZone, a secure execution environment increasingly found in smartphones, and evaluate our implementation using simulations.
Submitted 29 January, 2014;
originally announced January 2014.
-
Efficiently decoding strings from their shingles
Authors:
Aryeh Kontorovich,
Ari Trachtenberg
Abstract:
Determining whether an unordered collection of overlapping substrings (called shingles) can be uniquely decoded into a consistent string is a problem that lies within the foundation of a broad assortment of disciplines ranging from networking and information theory through cryptography and even genetic engineering and linguistics. We present three perspectives on this problem: a graph theoretic framework due to Pevzner, an automata theoretic approach from our previous work, and a new insight that yields a time-optimal streaming algorithm for determining whether a string of $n$ characters over the alphabet $Σ$ can be uniquely decoded from its two-character shingles. Our algorithm achieves an overall time complexity $Θ(n)$ and space complexity $O(|Σ|)$. As an application, we demonstrate how this algorithm can be extended to larger shingles for efficient string reconciliation.
Submitted 15 April, 2012;
originally announced April 2012.
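To make the notion of two-character shingles in the abstract above concrete, here is a small brute-force sketch. It is not the linear-time streaming algorithm the abstract describes: it enumerates every string consistent with a shingle multiset by backtracking, which takes exponential time in the worst case. The function names are hypothetical, and the example assumes no character is fixed in advance as the start of the string.

```python
from collections import Counter

def shingles(s):
    """Multiset of overlapping two-character substrings of s."""
    return Counter(s[i:i + 2] for i in range(len(s) - 1))

def decodings(bag):
    """All strings whose two-character shingle multiset equals bag."""
    total = sum(bag.values())
    results = set()

    def extend(prefix, remaining, used):
        if used == total:
            results.add(prefix)
            return
        for gram in list(remaining):
            # A shingle can follow only if it overlaps the last character.
            if remaining[gram] > 0 and gram[0] == prefix[-1]:
                remaining[gram] -= 1
                extend(prefix + gram[1], remaining, used + 1)
                remaining[gram] += 1

    for gram in list(bag):
        remaining = Counter(bag)
        remaining[gram] -= 1
        extend(gram, remaining, 1)
    return results

def is_uniquely_decodable(s):
    """True if s is the only string with its shingle multiset."""
    return decodings(shingles(s)) == {s}

print(is_uniquely_decodable("abc"))        # only one way to chain ab, bc
print(sorted(decodings(shingles("abcbdb"))))  # two strings share these shingles
```

The second example shows the ambiguity: after `ab`, the decoder may continue with either `bc` or `bd`, and both choices consume every shingle.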
-
Unique decodability of bigram counts by finite automata
Authors:
Aryeh Kontorovich,
Ari Trachtenberg
Abstract:
We revisit the problem of deciding whether a given string is uniquely decodable from its bigram counts by means of a finite automaton. An efficient algorithm for constructing a polynomial-size nondeterministic finite automaton that decides unique decodability is given. Conversely, we show that the minimum deterministic finite automaton for deciding unique decodability has at least exponentially many states in alphabet size.
Submitted 28 November, 2011;
originally announced November 2011.
-
HLA and HIV Infection Progression: Application of the Minimum Description Length Principle to Statistical Genetics
Authors:
Peter T. Hraber,
Bette T. Korber,
Steven Wolinsky,
Henry A. Erlich,
Elizabeth A. Trachtenberg,
Thomas B. Kepler
Abstract:
The minimum description length (MDL) principle states that the best model to account for some data minimizes the sum of the lengths, in bits, of the descriptions of the model and the residual error. The description length is thus a criterion for model selection. Description-length analysis of HLA alleles from the Chicago MACS cohort enables classification of alleles associated with plasma HIV RNA, an indicator of infection progression. Progression variation is most strongly associated with HLA-B. Individuals without B58s supertype alleles average viral RNA levels 3.6-fold greater than individuals with them.
Submitted 26 May, 2005;
originally announced May 2005.
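The MDL criterion stated in the abstract above (model bits plus residual bits) can be illustrated on a toy problem unrelated to the HLA data. The sketch below compares two models of a 0/1 sequence: a fixed fair coin with no parameters, and a Bernoulli model with one fitted parameter. Charging $\frac{1}{2}\log_2 n$ bits for the fitted parameter is a common approximation and an assumption of this example, as are all function names.

```python
import math

def data_bits(data, p):
    """Bits needed to encode a 0/1 sequence under Bernoulli(p)."""
    bits = 0.0
    for x in data:
        q = p if x == 1 else 1.0 - p
        bits += -math.log2(q)
    return bits

def description_lengths(data):
    """Total description length (model + residual) for each candidate."""
    n = len(data)
    p_hat = sum(data) / n
    return {
        # Fair coin: nothing to describe about the model itself.
        "fair": 0.0 + data_bits(data, 0.5),
        # Fitted coin: assumed cost of 0.5 * log2(n) bits for one parameter.
        "fitted": 0.5 * math.log2(n) + data_bits(data, p_hat),
    }

def select_model(data):
    """Pick the candidate with the smallest total description length."""
    lengths = description_lengths(data)
    return min(lengths, key=lengths.get)

biased = [1] * 90 + [0] * 10
balanced = [1, 0] * 50
print(select_model(biased), select_model(balanced))
```

For the biased sequence the fitted model saves far more residual bits than its parameter costs, while for the balanced sequence the extra parameter buys nothing, so the simpler model is selected.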