Once is Never Enough: Foundations for Sound Statistical Inference in Tor Network Experimentation

Authors: Rob Jansen, Justin Tracey, Ian Goldberg

Abstract: Tor is a popular low-latency anonymous communication system that focuses on usability and performance: a faster network will attract more users, which in turn will improve the anonymity of everyone using the system. The standard practice for previous research attempting to enhance Tor performance is to draw conclusions from the observed results of a single simulation for standard Tor and for each research variant. But because the simulations are run in sampled Tor networks, it is possible that sampling error alone could cause the observed effects. Therefore, we call into question the practical meaning of any conclusions that are drawn without considering the statistical significance of the reported results. In this paper, we build foundations upon which we improve the Tor experimental method. First, we present a new Tor network modeling methodology that produces more representative Tor networks as well as new and improved experimentation tools that run Tor simulations faster and at a larger scale than was previously possible. We showcase these contributions by running simulations with 6,489 relays and 792k simultaneously active users, the largest known Tor network simulations and the first at a network scale of 100%. Second, we present new statistical methodologies through which we: (i) show that running multiple simulations in independently sampled networks is necessary in order to produce informative results; and (ii) show how to use the results from multiple simulations to conduct sound statistical inference. We present a case study using 420 simulations to demonstrate how to apply our methodologies to a concrete set of Tor experiments and how to analyze the results.

Submitted 24 March, 2021; v1 submitted 9 February, 2021; originally announced February 2021.

arXiv:2010.12112 [pdf, other]

Investigating Membership Inference Attacks under Data Dependencies

Authors: Thomas Humphries, Simon Oya, Lindsey Tulloch, Matthew Rafuse, Ian Goldberg, Urs Hengartner, Florian Kerschbaum

Abstract: Training machine learning models on privacy-sensitive data has become a popular practice, driving innovation in ever-expanding fields. This has opened the door to new attacks that can have serious privacy implications. One such attack, the Membership Inference Attack (MIA), exposes whether or not a particular data point was used to train a model. A growing body of literature uses Differentially Private (DP) training algorithms as a defence against such attacks. However, these works evaluate the defence under the restrictive assumption that all members of the training set, as well as non-members, are independent and identically distributed. This assumption does not hold for many real-world use cases in the literature. Motivated by this, we evaluate membership inference with statistical dependencies among samples and explain why DP does not provide meaningful protection (the privacy parameter $ε$ scales with the training set size $n$) in this more general case. We conduct a series of empirical evaluations with off-the-shelf MIAs using training sets built from real-world data showing different types of dependencies among samples. Our results reveal that training set dependencies can severely increase the performance of MIAs, and therefore assuming that data samples are statistically independent can significantly underestimate the performance of MIAs.

Submitted 14 June, 2023; v1 submitted 22 October, 2020; originally announced October 2020.

Comments: IEEE 36th Computer Security Foundations Symposium (CSF)

arXiv:1908.09165 [pdf, other]

Augmented Unlocking Techniques for Smartphones Using Pre-Touch Information

Authors: Matthew Lakier, Dimcho Karakashev, Yixin Wang, Ian Goldberg

Abstract: Smartphones store a significant amount of personal and private information, and are playing an increasingly important role in people's lives. It is important for authentication techniques to be more resistant against two known attacks called shoulder surfing and smudge attacks. In this work, we propose a new technique called 3D Pattern. Our 3D Pattern technique takes advantage of a new input paradigm called pre-touch, which could soon allow smartphones to sense a user's finger position at some distance from the screen. We implement the technique and evaluate it in a pilot study (n=6) by comparing it to PIN and pattern locks. Our results show that although our prototype takes about 8 seconds to authenticate, it is immune to smudge attacks and promises to be more resistant to shoulder surfing.

Submitted 24 August, 2019; originally announced August 2019.

arXiv:1710.00960 [pdf]

A Circulating Biomarker-based Framework for Diagnosis of Hepatocellular Carcinoma in a Clinically Relevant Model of Non-alcoholic Steatohepatitis; An OAD to NASH

Authors: ** Zhou, Anne Hwang, Christopher Shi, Edward Zhu, Farha Naaz, Zainab Rasheed, Michelle Liu, Lindsey S. Jung, **gsong Li, Kai Jiang, Latha Paka, Michael A. Yamin, Itzhak D. Goldberg, Prakash Narayan

Abstract: Although cirrhosis is a key risk factor for the development of hepatocellular carcinoma (HCC), mounting evidence indicates that in a subset of patients presenting with non-alcoholic steatohepatitis (NASH), HCC manifests in the absence of cirrhosis. Given the sheer size of the non-alcoholic fatty liver disease (NAFLD) epidemic, and the dismal prognosis associated with late-stage primary liver cancer, there is an urgent need for HCC surveillance in the NASH patient. In the present study, adult male mice randomized to control diet or a fast food diet (FFD) were followed for up to 14 mo, and serum levels of a panel of HCC-relevant biomarkers were compared with liver biopsies at 3 and 14 mo. Both NAFLD Activity Score (NAS) and hepatic hydroxyproline content were elevated at 3 and 14 mo on FFD. Picrosirius red staining of liver sections revealed a filigree pattern of fibrillar collagen deposition with no cirrhosis at 14 mo on FFD. Nevertheless, 46% of animals bore one or more tumors on their livers confirmed as HCC in hematoxylin-eosin-stained liver sections. Receiver operating characteristic (ROC) curve analysis for serum levels of the HCC biomarkers osteopontin (OPN), alpha-fetoprotein (AFP) and Dickkopf-1 (DKK1) returned a concordance statistic (area under the ROC curve) of > 0.89. These data suggest that serum levels of OPN (threshold, 218 ng/mL; sensitivity, 82%; specificity, 86%), AFP (136 ng/mL; 91%; 97%) and DKK1 (2.4 ng/mL; 82%; 81%) are diagnostic for HCC in a clinically relevant model of NASH.

Submitted 19 November, 2017; v1 submitted 2 October, 2017; originally announced October 2017.

arXiv:1709.05748 [pdf, other]

Settling Payments Fast and Private: Efficient Decentralized Routing for Path-Based Transactions

Authors: Stefanie Roos, Pedro Moreno-Sanchez, Aniket Kate, Ian Goldberg

Abstract: Path-based transaction (PBT) networks, which settle payments from one user to another via a path of intermediaries, are a growing area of research. They overcome the scalability and privacy issues in cryptocurrencies like Bitcoin and Ethereum by replacing expensive and slow on-chain blockchain operations with inexpensive and fast off-chain transfers. In the form of credit networks such as Ripple and Stellar, they also enable low-price real-time gross settlements across different currencies. For example, SilentWhispers is a recently proposed fully distributed credit network relying on path-based transactions for secure and in particular private payments without a public ledger. At the core of a decentralized PBT network is a routing algorithm that discovers transaction paths between payer and payee. During the last year, a number of routing algorithms have been proposed. However, the existing ad hoc efforts lack either efficiency or privacy. In this work, we first identify several efficiency concerns in SilentWhispers. Armed with this knowledge, we design and evaluate SpeedyMurmurs, a novel routing algorithm for decentralized PBT networks using efficient and flexible embedding-based path discovery and on-demand efficient stabilization to handle the dynamics of a PBT network. Our simulation study, based on real-world data from the currently deployed Ripple credit network, indicates that SpeedyMurmurs reduces the overhead of stabilization by up to two orders of magnitude and the overhead of routing a transaction by more than a factor of two. Furthermore, using SpeedyMurmurs maintains at least the same success ratio as decentralized landmark routing, while providing lower delays. Finally, SpeedyMurmurs achieves key privacy goals for routing in PBT networks.

Submitted 13 December, 2017; v1 submitted 17 September, 2017; originally announced September 2017.

Comments: 15 pages, 3 figures

arXiv:1702.06612 [pdf, ps, other]

Some results on the existence of t-all-or-nothing transforms over arbitrary alphabets

Authors: Navid Nasr Esfahani, Ian Goldberg, Douglas R. Stinson

Abstract: A $(t, s, v)$-all-or-nothing transform is a bijective mapping defined on $s$-tuples over an alphabet of size $v$, which satisfies the condition that the values of any $t$ input co-ordinates are completely undetermined, given only the values of any $s-t$ output co-ordinates. The main question we address in this paper is: for which choices of parameters does a $(t, s, v)$-all-or-nothing transform (AONT) exist? More specifically, if we fix $t$ and $v$, we want to determine the maximum integer $s$ such that a $(t, s, v)$-AONT exists. We mainly concentrate on the case $t=2$ for arbitrary values of $v$, where we obtain various necessary as well as sufficient conditions for existence of these objects. We consider both linear and general (linear or nonlinear) AONT. We also show some connections between AONT, orthogonal arrays and resilient functions.

Submitted 21 February, 2017; originally announced February 2017.

Comments: 13 pages

arXiv:1611.02317 [pdf]

Renal Parenchymal Area and Kidney Collagen Content

Authors: Jake A. Nieto, Janice Zhu, Bin Duan, **gsong Li, ** Zhou, Latha Paka, Michael A. Yamin, Itzhak D. Goldberg, Prakash Narayan

Abstract: The extent of renal scarring in chronic kidney disease (CKD) can only be ascertained by highly invasive, painful and sometimes risky tissue biopsy. Interestingly, CKD-related abnormalities in kidney size can often be visualized using ultrasound. Nevertheless, not only does the ellipsoid formula used today underestimate true renal size but also the relation governing renal size and collagen content remains unclear. We used coronal kidney sections from healthy mice and mice with renal disease to develop a new technique for estimating the renal parenchymal area. While treating the kidney as an ellipse with the major axis the polar distance, this technique involves extending the minor axis into the renal pelvis. The calculated renal parenchymal area is remarkably similar to the measured area. Biochemically determined kidney collagen content revealed a strong and positive correlation with the calculated renal parenchymal area. The extent of renal scarring, i.e. kidney collagen content, can now be computed by making just two renal axial measurements which can easily be accomplished via noninvasive imaging of this organ.

Submitted 10 November, 2016; v1 submitted 7 November, 2016; originally announced November 2016.

Comments: 17 pages, 6 figures, 3 equations

arXiv:1607.07359 [pdf]

doi 10.1371/journal.pone.0163063

An Empirical Biomarker-based Calculator for Autosomal Recessive Polycystic Kidney Disease - The Nieto-Narayan Formula

Authors: Jake A. Nieto, Michael A. Yamin, Itzhak D. Goldberg, Prakash Narayan

Abstract: Autosomal recessive polycystic kidney disease (ARPKD) is associated with progressive enlargement of the kidneys fuelled by the formation and expansion of fluid-filled cysts. The disease is congenital, and children who do not succumb to it during the neonatal period will, by age 10 years, more often than not require nephrectomy and renal replacement therapy for management of both pain and renal insufficiency. Since increasing cystic index (CI; percent of kidney occupied by cysts) drives both renal expansion and organ dysfunction, management of these patients, including decisions such as elective nephrectomy and prioritization on the transplant waitlist, could clearly benefit from serial determination of CI. Likewise, clinical trials in ARPKD evaluating the efficacy of novel drug candidates could benefit from serial determination of CI. Although ultrasound is currently the imaging modality of choice for diagnosis of ARPKD, its utilization for assessing disease progression is highly limited. Magnetic resonance imaging and computed tomography, although more reliable for determination of CI, are expensive, time-consuming and somewhat impractical in the pediatric population. Using a well-established mammalian model of ARPKD, we undertook a big data-like analysis of minimally- or non-invasive serum and urine biomarkers of renal injury/dysfunction to derive a family of equations for estimating CI. We then applied a signal averaging protocol to distill these equations to a single empirical formula for calculation of CI. Such a formula will eventually find use in identifying and monitoring patients at high risk for progressing to end-stage renal disease and aid in the conduct of clinical trials.

Submitted 26 July, 2016; v1 submitted 25 July, 2016; originally announced July 2016.

Comments: 3 tables and 8 figures

arXiv:1604.00223 [pdf, other]

Lower-Cost epsilon-Private Information Retrieval

Authors: Raphael R. Toledo, George Danezis, Ian Goldberg

Abstract: Private Information Retrieval (PIR), despite being well studied, is computationally costly and arduous to scale. We explore lower-cost relaxations of information-theoretic PIR, based on dummy queries, sparse vectors, and compositions with an anonymity system. We prove the security of each scheme using a flexible differentially private definition for private queries that can capture notions of imperfect privacy. We show that basic schemes are weak, but some of them can be made arbitrarily safe by composing them with large anonymity systems.

Submitted 1 April, 2016; originally announced April 2016.

arXiv:1412.1859 [pdf, other]

doi 10.1515/popets-2016-0030

Censorship Resistance: Let a Thousand Flowers Bloom?

Authors: Tariq Elahi, Steven J. Murdoch, Ian Goldberg

Abstract: This paper argues that one of the most important decisions in designing and deploying censorship resistance systems is whether one set of system options should be selected (the best), or whether there should be several sets of good ones. We model the problem of choosing these options as a cat-and-mouse game and show that the best strategy depends on the value the censor associates with total system censorship versus partial, and the tolerance of false positives. If the censor has a low tolerance to false positives then choosing one censorship resistance system is best. Otherwise choosing several systems is the better choice, but the way traffic should be distributed over the systems depends on the tolerance of the censor to false negatives. We demonstrate that establishing the censor's utility function is critical to discovering the best strategy for censorship resistance.

Submitted 4 December, 2014; originally announced December 2014.

arXiv:1107.1072 [pdf, other]

Adding Query Privacy to Robust DHTs

Authors: Michael Backes, Ian Goldberg, Aniket Kate, Tomas Toft

Abstract: Interest in anonymous communication over distributed hash tables (DHTs) has increased in recent years. However, almost all known solutions solely aim at achieving sender or requestor anonymity in DHT queries. In many application scenarios, it is crucial that the queried key remains secret from intermediate peers that (help to) route the queries towards their destinations. In this paper, we satisfy this requirement by presenting an approach for providing privacy for the keys in DHT queries. We use the concept of oblivious transfer (OT) in communication over DHTs to preserve query privacy without compromising spam resistance. Although our OT-based approach can work over any DHT, we concentrate on communication over robust DHTs that can tolerate Byzantine faults and resist spam. We choose the best-known robust DHT construction, and employ an efficient OT protocol well-suited for achieving our goal of obtaining query privacy over robust DHTs. Finally, we compare the performance of our privacy-preserving protocols with their more privacy-invasive counterparts. We observe that there is no increase in the message complexity and only a small overhead in the computational complexity.

Submitted 3 April, 2012; v1 submitted 6 July, 2011; originally announced July 2011.

Comments: To appear at ACM ASIACCS 2012

Showing 1–11 of 11 results for author: Goldberg, I