-
Taking Advice from (Dis)Similar Machines: The Impact of Human-Machine Similarity on Machine-Assisted Decision-Making
Authors:
Nina Grgić-Hlača,
Claude Castelluccia,
Krishna P. Gummadi
Abstract:
Machine learning algorithms are increasingly used to assist human decision-making. When the goal of machine assistance is to improve the accuracy of human decisions, it might seem appealing to design ML algorithms that complement human knowledge. While neither the algorithm nor the human are perfectly accurate, one could expect that their complementary expertise might lead to improved outcomes. In…
▽ More
Machine learning algorithms are increasingly used to assist human decision-making. When the goal of machine assistance is to improve the accuracy of human decisions, it might seem appealing to design ML algorithms that complement human knowledge. While neither the algorithm nor the human are perfectly accurate, one could expect that their complementary expertise might lead to improved outcomes. In this study, we demonstrate that in practice decision aids that are not complementary, but make errors similar to human ones may have their own benefits.
In a series of human-subject experiments with a total of 901 participants, we study how the similarity of human and machine errors influences human perceptions of and interactions with algorithmic decision aids. We find that (i) people perceive more similar decision aids as more useful, accurate, and predictable, and that (ii) people are more likely to take opposing advice from more similar decision aids, while (iii) decision aids that are less similar to humans have more opportunities to provide opposing advice, resulting in a higher influence on people's decisions overall.
△ Less
Submitted 8 September, 2022;
originally announced September 2022.
-
Towards a Governance Framework for Brain Data
Authors:
Marcello Ienca,
Joseph J. Fins,
Ralf J. Jox,
Fabrice Jotterand,
Silja Voeneky,
Roberto Andorno,
Tonio Ball,
Claude Castelluccia,
Ricardo Chavarriaga,
Hervé Chneiweiss,
Agata Ferretti,
Orsolya Friedrich,
Samia Hurst,
Grischa Merkel,
Fruzsina Molnar-Gabor,
Jean-Marc Rickli,
James Scheibner,
Effy Vayena,
Rafael Yuste,
Philipp Kellmeyer
Abstract:
The increasing availability of brain data within and outside the biomedical field, combined with the application of artificial intelligence (AI) to brain data analysis, poses a challenge for ethics and governance. We identify distinctive ethical implications of brain data acquisition and processing, and outline a multi-level governance framework. This framework is aimed at maximizing the benefits…
▽ More
The increasing availability of brain data within and outside the biomedical field, combined with the application of artificial intelligence (AI) to brain data analysis, poses a challenge for ethics and governance. We identify distinctive ethical implications of brain data acquisition and processing, and outline a multi-level governance framework. This framework is aimed at maximizing the benefits of facilitated brain data collection and further processing for science and medicine whilst minimizing risks and preventing harmful use. The framework consists of four primary areas of regulatory intervention: binding regulation, ethics and soft law, responsible innovation, and human rights.
△ Less
Submitted 28 September, 2021; v1 submitted 24 September, 2021;
originally announced September 2021.
-
Constrained Differentially Private Federated Learning for Low-bandwidth Devices
Authors:
Raouf Kerkouche,
Gergely Ács,
Claude Castelluccia,
Pierre Genevès
Abstract:
Federated learning becomes a prominent approach when different entities want to learn collaboratively a common model without sharing their training data. However, Federated learning has two main drawbacks. First, it is quite bandwidth inefficient as it involves a lot of message exchanges between the aggregating server and the participating entities. This bandwidth and corresponding processing cost…
▽ More
Federated learning becomes a prominent approach when different entities want to learn collaboratively a common model without sharing their training data. However, Federated learning has two main drawbacks. First, it is quite bandwidth inefficient as it involves a lot of message exchanges between the aggregating server and the participating entities. This bandwidth and corresponding processing costs could be prohibitive if the participating entities are, for example, mobile devices. Furthermore, although federated learning improves privacy by not sharing data, recent attacks have shown that it still leaks information about the training data. This paper presents a novel privacy-preserving federated learning scheme. The proposed scheme provides theoretical privacy guarantees, as it is based on Differential Privacy. Furthermore, it optimizes the model accuracy by constraining the model learning phase on few selected weights. Finally, as shown experimentally, it reduces the upstream and downstream bandwidth by up to 99.9% compared to standard federated learning, making it practical for mobile systems.
△ Less
Submitted 27 February, 2021;
originally announced March 2021.
-
Compression Boosts Differentially Private Federated Learning
Authors:
Raouf Kerkouche,
Gergely Ács,
Claude Castelluccia,
Pierre Genevès
Abstract:
Federated Learning allows distributed entities to train a common model collaboratively without sharing their own data. Although it prevents data collection and aggregation by exchanging only parameter updates, it remains vulnerable to various inference and reconstruction attacks where a malicious entity can learn private information about the participants' training data from the captured gradients…
▽ More
Federated Learning allows distributed entities to train a common model collaboratively without sharing their own data. Although it prevents data collection and aggregation by exchanging only parameter updates, it remains vulnerable to various inference and reconstruction attacks where a malicious entity can learn private information about the participants' training data from the captured gradients. Differential Privacy is used to obtain theoretically sound privacy guarantees against such inference attacks by noising the exchanged update vectors. However, the added noise is proportional to the model size which can be very large with modern neural networks. This can result in poor model quality. In this paper, compressive sensing is used to reduce the model size and hence increase model quality without sacrificing privacy. We show experimentally, using 2 datasets, that our privacy-preserving proposal can reduce the communication costs by up to 95% with only a negligible performance penalty compared to traditional non-private federated learning schemes.
△ Less
Submitted 10 November, 2020;
originally announced November 2020.
-
Federated Learning in Adversarial Settings
Authors:
Raouf Kerkouche,
Gergely Ács,
Claude Castelluccia
Abstract:
Federated Learning enables entities to collaboratively learn a shared prediction model while kee** their training data locally. It prevents data collection and aggregation and, therefore, mitigates the associated privacy risks. However, it still remains vulnerable to various security attacks where malicious participants aim at degrading the generated model, inserting backdoors, or inferring othe…
▽ More
Federated Learning enables entities to collaboratively learn a shared prediction model while kee** their training data locally. It prevents data collection and aggregation and, therefore, mitigates the associated privacy risks. However, it still remains vulnerable to various security attacks where malicious participants aim at degrading the generated model, inserting backdoors, or inferring other participants' training data. This paper presents a new federated learning scheme that provides different trade-offs between robustness, privacy, bandwidth efficiency, and model accuracy. Our scheme uses biased quantization of model updates and hence is bandwidth efficient. It is also robust against state-of-the-art backdoor as well as model degradation attacks even when a large proportion of the participant nodes are malicious. We propose a practical differentially private extension of this scheme which protects the whole dataset of participating entities. We show that this extension performs as efficiently as the non-private but robust scheme, even with stringent privacy requirements but are less robust against model degradation and backdoor attacks. This suggests a possible fundamental trade-off between Differential Privacy and robustness.
△ Less
Submitted 15 October, 2020;
originally announced October 2020.
-
DESIRE: A Third Way for a European Exposure Notification System Leveraging the best of centralized and decentralized systems
Authors:
Claude Castelluccia,
Nataliia Bielova,
Antoine Boutet,
Mathieu Cunche,
Cédric Lauradoux,
Daniel Le Métayer,
Vincent Roca
Abstract:
This document presents an evolution of the ROBERT protocol that decentralizes most of its operations on the mobile devices. DESIRE is based on the same architecture than ROBERT but implements major privacy improvements. In particular, it introduces the concept of Private Encounter Tokens, that are secret and cryptographically generated, to encode encounters. In the DESIRE protocol, the temporary I…
▽ More
This document presents an evolution of the ROBERT protocol that decentralizes most of its operations on the mobile devices. DESIRE is based on the same architecture than ROBERT but implements major privacy improvements. In particular, it introduces the concept of Private Encounter Tokens, that are secret and cryptographically generated, to encode encounters. In the DESIRE protocol, the temporary Identifiers that are broadcast on the Bluetooth interfaces are generated by the mobile devices providing more control to the users about which ones to disclose. The role of the server is merely to match PETs generated by diagnosed users with the PETs provided by requesting users. It stores minimal pseudonymous data. Finally, all data that are stored on the server are encrypted using keys that are stored on the mobile devices, protecting against data breach on the server. All these modifications improve the privacy of the scheme against malicious users and authority. However, as in the first version of ROBERT, risk scores and notifications are still managed and controlled by the server of the health authority, which provides high robustness, flexibility, and efficacy.
△ Less
Submitted 4 August, 2020;
originally announced August 2020.
-
Techniques d'anonymisation tabulaire : concepts et mise en oeuvre
Authors:
Benjamin Nguyen,
Claude Castelluccia
Abstract:
In this document, we present a state of the art of anonymization techniques for classical tabular datasets. This article is geared towards a general public having some knowledge of mathematics and computer science, but with no need for specific knowledge in anonymization. The objective of this document it to explain anonymization concepts in order to be able to sanitize a dataset and compute reind…
▽ More
In this document, we present a state of the art of anonymization techniques for classical tabular datasets. This article is geared towards a general public having some knowledge of mathematics and computer science, but with no need for specific knowledge in anonymization. The objective of this document it to explain anonymization concepts in order to be able to sanitize a dataset and compute reindentification risk. The document contains a large number of examples to help understand the calculations.
-----
Dans ce document, nous présentons l'état de l'art des techniques d'anonymisation pour des bases de données classiques (i.e. des tables), à destination d'un public technique ayant une formation universitaire de base en mathématiques et informatique, mais non spécialiste. L'objectif de ce document est d'expliquer les concepts permettant de réaliser une anonymisation de données tabulaires, et de calculer les risques de réidentification. Le document est largement composé d'exemples permettant au lecteur de comprendre comment mettre en oeuvre les calculs.
△ Less
Submitted 8 January, 2020;
originally announced January 2020.
-
To Extend or not to Extend: on the Uniqueness of Browser Extensions and Web Logins
Authors:
Gabor Gyorgy Gulyas,
Doliere Francis Some,
Nataliia Bielova,
Claude Castelluccia
Abstract:
Recent works showed that websites can detect browser extensions that users install and websites they are logged into. This poses significant privacy risks, since extensions and Web logins that reflect user's behavior, can be used to uniquely identify users on the Web. This paper reports on the first large-scale behavioral uniqueness study based on 16,393 users who visited our website. We test an…
▽ More
Recent works showed that websites can detect browser extensions that users install and websites they are logged into. This poses significant privacy risks, since extensions and Web logins that reflect user's behavior, can be used to uniquely identify users on the Web. This paper reports on the first large-scale behavioral uniqueness study based on 16,393 users who visited our website. We test and detect the presence of 16,743 Chrome extensions, covering 28% of all free Chrome extensions. We also detect whether the user is connected to 60 different websites.
We analyze how unique users are based on their behavior, and find out that 54.86% of users that have installed at least one detectable extension are unique; 19.53% of users are unique among those who have logged into one or more detectable websites; and 89.23% are unique among users with at least one extension and one login. We use an advanced fingerprinting algorithm and show that it is possible to identify a user in less than 625 milliseconds by selecting the most unique combinations of extensions.
Because privacy extensions contribute to the uniqueness of users, we study the trade-off between the amount of trackers blocked by such extensions and how unique the users of these extensions are. We have found that privacy extensions should be considered more useful than harmful. The paper concludes with possible countermeasures.
△ Less
Submitted 22 August, 2018;
originally announced August 2018.
-
Differentially Private Mixture of Generative Neural Networks
Authors:
Gergely Acs,
Luca Melis,
Claude Castelluccia,
Emiliano De Cristofaro
Abstract:
Generative models are used in a wide range of applications building on large amounts of contextually rich information. Due to possible privacy violations of the individuals whose data is used to train these models, however, publishing or sharing generative models is not always viable. In this paper, we present a novel technique for privately releasing generative models and entire high-dimensional…
▽ More
Generative models are used in a wide range of applications building on large amounts of contextually rich information. Due to possible privacy violations of the individuals whose data is used to train these models, however, publishing or sharing generative models is not always viable. In this paper, we present a novel technique for privately releasing generative models and entire high-dimensional datasets produced by these models. We model the generator distribution of the training data with a mixture of $k$ generative neural networks. These are trained together and collectively learn the generator distribution of a dataset. Data is divided into $k$ clusters, using a novel differentially private kernel $k$-means, then each cluster is given to separate generative neural networks, such as Restricted Boltzmann Machines or Variational Autoencoders, which are trained only on their own cluster using differentially private gradient descent. We evaluate our approach using the MNIST dataset, as well as call detail records and transit datasets, showing that it produces realistic synthetic samples, which can also be used to accurately compute arbitrary number of counting queries.
△ Less
Submitted 13 July, 2018; v1 submitted 13 September, 2017;
originally announced September 2017.
-
Near-Optimal Fingerprinting with Constraints
Authors:
Gabor Gyorgy Gulyas,
Gergely Acs,
Claude Castelluccia
Abstract:
Several recent studies have demonstrated that people show large behavioural uniqueness. This has serious privacy implications as most individuals become increasingly re-identifiable in large datasets or can be tracked while they are browsing the web using only a couple of their attributes, called as their fingerprints. Often, the success of these attacks depend on explicit constraints on the numbe…
▽ More
Several recent studies have demonstrated that people show large behavioural uniqueness. This has serious privacy implications as most individuals become increasingly re-identifiable in large datasets or can be tracked while they are browsing the web using only a couple of their attributes, called as their fingerprints. Often, the success of these attacks depend on explicit constraints on the number of attributes learnable about individuals, i.e., the size of their fingerprints. These constraints can be budget as well as technical constraints imposed by the data holder. For instance, Apple restricts the number of applications that can be called by another application on iOS in order to mitigate the potential privacy threats of leaking the list of installed applications on a device. In this work, we address the problem of identifying the attributes (e.g., smartphone applications) that can serve as a fingerprint of users given constraints on the size of the fingerprint. We give the best fingerprinting algorithms in general, and evaluate their effectiveness on several real-world datasets. Our results show that current privacy guards limiting the number of attributes that can be queried about individuals is insufficient to mitigate their potential privacy risks in many practical cases.
△ Less
Submitted 3 June, 2016; v1 submitted 27 May, 2016;
originally announced May 2016.
-
MobileAppScrutinator: A Simple yet Efficient Dynamic Analysis Approach for Detecting Privacy Leaks across Mobile OSs
Authors:
Jagdish Prasad Achara,
Vincent Roca,
Claude Castelluccia,
Aurelien Francillon
Abstract:
Smartphones, the devices we carry everywhere with us, are being heavily tracked and have undoubtedly become a major threat to our privacy. As "tracking the trackers" has become a necessity, various static and dynamic analysis tools have been developed in the past. However, today, we still lack suitable tools to detect, measure and compare the ongoing tracking across mobile OSs. To this end, we pro…
▽ More
Smartphones, the devices we carry everywhere with us, are being heavily tracked and have undoubtedly become a major threat to our privacy. As "tracking the trackers" has become a necessity, various static and dynamic analysis tools have been developed in the past. However, today, we still lack suitable tools to detect, measure and compare the ongoing tracking across mobile OSs. To this end, we propose MobileAppScrutinator, based on a simple yet efficient dynamic analysis approach, that works on both Android and iOS (the two most popular OSs today). To demonstrate the current trend in tracking, we select 140 most representative Apps available on both Android and iOS AppStores and test them with MobileAppScrutinator. In fact, choosing the same set of apps on both Android and iOS also enables us to compare the ongoing tracking on these two OSs. Finally, we also discuss the effectiveness of privacy safeguards available on Android and iOS. We show that neither Android nor iOS privacy safeguards in their present state are completely satisfying.
△ Less
Submitted 10 June, 2016; v1 submitted 26 May, 2016;
originally announced May 2016.
-
MyTrackingChoices: Pacifying the Ad-Block War by Enforcing User Privacy Preferences
Authors:
Jagdish Prasad Achara,
Javier Parra-Arnau,
Claude Castelluccia
Abstract:
Free content and services on the Web are often supported by ads. However, with the proliferation of intrusive and privacy-invasive ads, a significant proportion of users have started to use ad blockers. As existing ad blockers are radical (they block all ads) and are not designed taking into account their economic impact, ad-based economic model of the Web is in danger today. In this paper, we tar…
▽ More
Free content and services on the Web are often supported by ads. However, with the proliferation of intrusive and privacy-invasive ads, a significant proportion of users have started to use ad blockers. As existing ad blockers are radical (they block all ads) and are not designed taking into account their economic impact, ad-based economic model of the Web is in danger today. In this paper, we target privacy-sensitive users and provide them with fine-grained control over tracking. Our working assumption is that some categories of web pages (for example, related to health, religion, etc.) are more privacy-sensitive to users than others (education, science, etc.). Therefore, our proposed approach consists in providing users with an option to specify the categories of web pages that are privacy-sensitive to them and block trackers present on such web pages only. As tracking is prevented by blocking network connections of third-party domains, we avoid not only tracking but also third-party ads. Since users will continue receiving ads on web pages belonging to non-sensitive categories, our approach essentially provides a trade-off between privacy and economy. To test the viability of our solution, we implemented it as a Google Chrome extension, named MyTrackingChoices (available on Chrome Web Store). Our real-world experiments with MyTrackingChoices show that the economic impact of ad blocking exerted by privacy-sensitive users can be significantly reduced.
△ Less
Submitted 15 April, 2016;
originally announced April 2016.
-
MyAdChoices: Bringing Transparency and Control to Online Advertising
Authors:
Javier Parra-Arnau,
Jagdish Prasad Achara,
Claude Castelluccia
Abstract:
The intrusiveness and the increasing invasiveness of online advertising have, in the last few years, raised serious concerns regarding user privacy and Web usability. As a reaction to these concerns, we have witnessed the emergence of a myriad of ad-blocking and anti-tracking tools, whose aim is to return control to users over advertising. The problem with these technologies, however, is that they…
▽ More
The intrusiveness and the increasing invasiveness of online advertising have, in the last few years, raised serious concerns regarding user privacy and Web usability. As a reaction to these concerns, we have witnessed the emergence of a myriad of ad-blocking and anti-tracking tools, whose aim is to return control to users over advertising. The problem with these technologies, however, is that they are extremely limited and radical in their approach: users can only choose either to block or allow all ads. With around 200 million people regularly using these tools, the economic model of the Web ---in which users get content free in return for allowing advertisers to show them ads--- is at serious peril. In this paper, we propose a smart Web technology that aims at bringing transparency to online advertising, so that users can make an informed and equitable decision regarding ad blocking. The proposed technology is implemented as a Web-browser extension and enables users to exert fine-grained control over advertising, thus providing them with certain guarantees in terms of privacy and browsing experience, while preserving the Internet economic model. Experimental results in a real environment demonstrate the suitability and feasibility of our approach, and provide preliminary findings on behavioral targeting from real user browsing profiles.
△ Less
Submitted 5 February, 2016;
originally announced February 2016.
-
On the Unicity of Smartphone Applications
Authors:
Jagdish Prasad Achara,
Gergely Acs,
Claude Castelluccia
Abstract:
Prior works have shown that the list of apps installed by a user reveal a lot about user interests and behavior. These works rely on the semantics of the installed apps and show that various user traits could be learnt automatically using off-the-shelf machine-learning techniques. In this work, we focus on the re-identifiability issue and thoroughly study the unicity of smartphone apps on a datase…
▽ More
Prior works have shown that the list of apps installed by a user reveal a lot about user interests and behavior. These works rely on the semantics of the installed apps and show that various user traits could be learnt automatically using off-the-shelf machine-learning techniques. In this work, we focus on the re-identifiability issue and thoroughly study the unicity of smartphone apps on a dataset containing 54,893 Android users collected over a period of 7 months. Our study finds that any 4 apps installed by a user are enough (more than 95% times) for the re-identification of the user in our dataset. As the complete list of installed apps is unique for 99% of the users in our dataset, it can be easily used to track/profile the users by a service such as Twitter that has access to the whole list of installed apps of users. As our analyzed dataset is small as compared to the total population of Android users, we also study how unicity would vary with larger datasets. This work emphasizes the need of better privacy guards against collection, use and release of the list of installed apps.
△ Less
Submitted 29 October, 2015; v1 submitted 28 July, 2015;
originally announced July 2015.
-
Retargeting Without Tracking
Authors:
Minh-Dung Tran,
Gergely Acs,
Claude Castelluccia
Abstract:
Retargeting ads are increasingly prevalent on the Internet as their effectiveness has been shown to outperform conventional targeted ads. Retargeting ads are not only based on users' interests, but also on their intents, i.e. commercial products users have shown interest in. Existing retargeting systems heavily rely on tracking, as retargeting companies need to know not only the websites a user ha…
▽ More
Retargeting ads are increasingly prevalent on the Internet as their effectiveness has been shown to outperform conventional targeted ads. Retargeting ads are not only based on users' interests, but also on their intents, i.e. commercial products users have shown interest in. Existing retargeting systems heavily rely on tracking, as retargeting companies need to know not only the websites a user has visited but also the exact products on these sites. They are therefore very intrusive, and privacy threatening. Furthermore, these schemes are still sub-optimal since tracking is partial, and they often deliver ads that are obsolete (because, for example, the targeted user has already bought the advertised product).
This paper presents the first privacy-preserving retargeting ads system. In the proposed scheme, the retargeting algorithm is distributed between the user and the advertiser such that no systematic tracking is necessary, more control and transparency is provided to users, but still a lot of targeting flexibility is provided to advertisers. We show that our scheme, that relies on homomorphic encryption, can be efficiently implemented and trivially solves many problems of existing schemes, such as frequency cap** and ads freshness.
△ Less
Submitted 17 April, 2014;
originally announced April 2014.
-
When Privacy meets Security: Leveraging personal information for password cracking
Authors:
Claude Castelluccia,
Abdelberi Chaabane,
Markus Dürmuth,
Daniele Perito
Abstract:
Passwords are widely used for user authentication and, despite their weaknesses, will likely remain in use in the foreseeable future. Human-generated passwords typically have a rich structure, which makes them susceptible to guessing attacks. In this paper, we study the effectiveness of guessing attacks based on Markov models. Our contributions are two-fold. First, we propose a novel password crac…
▽ More
Passwords are widely used for user authentication and, despite their weaknesses, will likely remain in use in the foreseeable future. Human-generated passwords typically have a rich structure, which makes them susceptible to guessing attacks. In this paper, we study the effectiveness of guessing attacks based on Markov models. Our contributions are two-fold. First, we propose a novel password cracker based on Markov models, which builds upon and extends ideas used by Narayanan and Shmatikov (CCS 2005). In extensive experiments we show that it can crack up to 69% of passwords at 10 billion guesses, more than all probabilistic password crackers we compared again t. Second, we systematically analyze the idea that additional personal information about a user helps in speeding up password guessing. We find that, on average and by carefully choosing parameters, we can guess up to 5% more passwords, especially when the number of attempts is low. Furthermore, we show that the gain can go up to 30% for passwords that are actually based on personal attributes. These passwords are clearly weaker and should be avoided. Our cracker could be used by an organization to detect and reject them. To the best of our knowledge, we are the first to systematically study the relationship between chosen passwords and users' personal information. We test and validate our results over a wide collection of leaked password databases.
△ Less
Submitted 24 April, 2013;
originally announced April 2013.
-
DREAM: DiffeRentially privatE smArt Metering
Authors:
Gergely Acs,
Claude Castelluccia
Abstract:
This paper presents a new privacy-preserving smart metering system. Our scheme is private under the differential privacy model and therefore provides strong and provable guarantees. With our scheme, an (electricity) supplier can periodically collect data from smart meters and derive aggregated statistics while learning only limited information about the activities of individual households. For exa…
▽ More
This paper presents a new privacy-preserving smart metering system. Our scheme is private under the differential privacy model and therefore provides strong and provable guarantees. With our scheme, an (electricity) supplier can periodically collect data from smart meters and derive aggregated statistics while learning only limited information about the activities of individual households. For example, a supplier cannot tell from a user's trace when he watched TV or turned on heating. Our scheme is simple, efficient and practical. Processing cost is very limited: smart meters only have to add noise to their data and encrypt the results with an efficient stream cipher.
△ Less
Submitted 12 January, 2012;
originally announced January 2012.
-
One Bad Apple Spoils the Bunch: Exploiting P2P Applications to Trace and Profile Tor Users
Authors:
Stevens Le Blond,
Pere Manils,
Chaabane Abdelberi,
Mohamed Ali Dali Kaafar,
Claude Castelluccia,
Arnaud Legout,
Walid Dabbous
Abstract:
Tor is a popular low-latency anonymity network. However, Tor does not protect against the exploitation of an insecure application to reveal the IP address of, or trace, a TCP stream. In addition, because of the linkability of Tor streams sent together over a single circuit, tracing one stream sent over a circuit traces them all. Surprisingly, it is unknown whether this linkability allows in practi…
▽ More
Tor is a popular low-latency anonymity network. However, Tor does not protect against the exploitation of an insecure application to reveal the IP address of, or trace, a TCP stream. In addition, because of the linkability of Tor streams sent together over a single circuit, tracing one stream sent over a circuit traces them all. Surprisingly, it is unknown whether this linkability allows in practice to trace a significant number of streams originating from secure (i.e., proxied) applications. In this paper, we show that linkability allows us to trace 193% of additional streams, including 27% of HTTP streams possibly originating from "secure" browsers. In particular, we traced 9% of Tor streams carried by our instrumented exit nodes. Using BitTorrent as the insecure application, we design two attacks tracing BitTorrent users on Tor. We run these attacks in the wild for 23 days and reveal 10,000 IP addresses of Tor users. Using these IP addresses, we then profile not only the BitTorrent downloads but also the websites visited per country of origin of Tor users. We show that BitTorrent users on Tor are over-represented in some countries as compared to BitTorrent users outside of Tor. By analyzing the type of content downloaded, we then explain the observed behaviors by the higher concentration of pornographic content downloaded at the scale of a country. Finally, we present results suggesting the existence of an underground BitTorrent ecosystem on Tor.
△ Less
Submitted 8 March, 2011;
originally announced March 2011.
-
How Unique and Traceable are Usernames?
Authors:
Daniele Perito,
Claude Castelluccia,
Mohamed Ali Kaafar,
Pere Manils
Abstract:
Suppose you find the same username on different online services, what is the probability that these usernames refer to the same physical person? This work addresses what appears to be a fairly simple question, which has many implications for anonymity and privacy on the Internet. One possible way of estimating this probability would be to look at the public information associated to the two accoun…
▽ More
Suppose you find the same username on different online services, what is the probability that these usernames refer to the same physical person? This work addresses what appears to be a fairly simple question, which has many implications for anonymity and privacy on the Internet. One possible way of estimating this probability would be to look at the public information associated to the two accounts and try to match them. However, for most services, these information are chosen by the users themselves and are often very heterogeneous, possibly false and difficult to collect. Furthermore, several websites do not disclose any additional public information about users apart from their usernames (e.g., discus- sion forums or Blog comments), nonetheless, they might contain sensitive information about users. This paper explores the possibility of linking users profiles only by looking at their usernames. The intuition is that the probability that two usernames refer to the same physical person strongly depends on the "entropy" of the username string itself. Our experiments, based on crawls of real web services, show that a significant portion of the users' profiles can be linked using their usernames. To the best of our knowledge, this is the first time that usernames are considered as a source of information when profiling users on the Internet.
△ Less
Submitted 8 March, 2011; v1 submitted 28 January, 2011;
originally announced January 2011.
-
Compromising Tor Anonymity Exploiting P2P Information Leakage
Authors:
Pere Manils,
Chaabane Abdelberri,
Stevens Le Blond,
Mohamed Ali Kaafar,
Claude Castelluccia,
Arnaud Legout,
Walid Dabbous
Abstract:
Privacy of users in P2P networks goes far beyond their current usage and is a fundamental requirement to the adoption of P2P protocols for legal usage. In a climate of cold war between these users and anti-piracy groups, more and more users are moving to anonymizing networks in an attempt to hide their identity. However, when not designed to protect users information, a P2P protocol would leak inf…
▽ More
Privacy of users in P2P networks goes far beyond their current usage and is a fundamental requirement to the adoption of P2P protocols for legal usage. In a climate of cold war between these users and anti-piracy groups, more and more users are moving to anonymizing networks in an attempt to hide their identity. However, when not designed to protect users information, a P2P protocol would leak information that may compromise the identity of its users. In this paper, we first present three attacks targeting BitTorrent users on top of Tor that reveal their real IP addresses. In a second step, we analyze the Tor usage by BitTorrent users and compare it to its usage outside of Tor. Finally, we depict the risks induced by this de-anonymization and show that users' privacy violation goes beyond BitTorrent traffic and contaminates other protocols such as HTTP.
△ Less
Submitted 9 April, 2010;
originally announced April 2010.
-
EphPub: Toward Robust Ephemeral Publishing
Authors:
Claude Castelluccia,
Emiliano De Cristofaro,
Aurelien Francillon,
Mohamed-Ali Kaafar
Abstract:
The increasing amount of personal and sensitive information disseminated over the Internet prompts commensurately growing privacy concerns. Digital data often lingers indefinitely and users lose its control. This motivates the desire to restrict content availability to an expiration time set by the data owner. This paper presents and formalizes the notion of Ephemeral Publishing (EphPub), to preve…
▽ More
The increasing amount of personal and sensitive information disseminated over the Internet prompts commensurately growing privacy concerns. Digital data often lingers indefinitely and users lose its control. This motivates the desire to restrict content availability to an expiration time set by the data owner. This paper presents and formalizes the notion of Ephemeral Publishing (EphPub), to prevent the access to expired content. We propose an efficient and robust protocol that builds on the Domain Name System (DNS) and its caching mechanism. With EphPub, sensitive content is published encrypted and the key material is distributed, in a steganographic manner, to randomly selected and independent resolvers. The availability of content is then limited by the evanescence of DNS cache entries. The EphPub protocol is transparent to existing applications, and does not rely on trusted hardware, centralized servers, or user proactive actions. We analyze its robustness and show that it incurs a negligible overhead on the DNS infrastructure. We also perform a large-scale study of the caching behavior of 900K open DNS resolvers. Finally, we propose Firefox and Thunderbird extensions that provide ephemeral publishing capabilities, as well as a command-line tool to create ephemeral files.
△ Less
Submitted 18 October, 2011; v1 submitted 29 March, 2010;
originally announced March 2010.
-
Private Information Disclosure from Web Searches. (The case of Google Web History)
Authors:
Claude Castelluccia,
Emiliano De Cristofaro,
Daniele Perito
Abstract:
As the amount of personal information stored at remote service providers increases, so does the danger of data theft. When connections to remote services are made in the clear and authenticated sessions are kept using HTTP cookies, data theft becomes extremely easy to achieve. In this paper, we study the architecture of the world's largest service provider, i.e., Google. First, with the exception…
▽ More
As the amount of personal information stored at remote service providers increases, so does the danger of data theft. When connections to remote services are made in the clear and authenticated sessions are kept using HTTP cookies, data theft becomes extremely easy to achieve. In this paper, we study the architecture of the world's largest service provider, i.e., Google. First, with the exception of a few services that can only be accessed over HTTPS (e.g., Gmail), we find that many Google services are still vulnerable to simple session hijacking. Next, we present the Historiographer, a novel attack that reconstructs the web search history of Google users, i.e., Google's Web History, even though such a service is supposedly protected from session hijacking by a stricter access control policy. The Historiographer uses a reconstruction technique inferring search history from the personalized suggestions fed by the Google search engine. We validate our technique through experiments conducted over real network traffic and discuss possible countermeasures. Our attacks are general and not only specific to Google, and highlight privacy concerns of mixed architectures using both secure and insecure connections.
△ Less
Submitted 23 March, 2010; v1 submitted 16 March, 2010;
originally announced March 2010.
-
Code injection attacks on harvard-architecture devices
Authors:
Aurélien Francillon,
Claude Castelluccia
Abstract:
Harvard architecture CPU design is common in the embedded world. Examples of Harvard-based architecture devices are the Mica family of wireless sensors. Mica motes have limited memory and can process only very small packets. Stack-based buffer overflow techniques that inject code into the stack and then execute it are therefore not applicable. It has been a common belief that code injection is i…
▽ More
Harvard architecture CPU design is common in the embedded world. Examples of Harvard-based architecture devices are the Mica family of wireless sensors. Mica motes have limited memory and can process only very small packets. Stack-based buffer overflow techniques that inject code into the stack and then execute it are therefore not applicable. It has been a common belief that code injection is impossible on Harvard architectures. This paper presents a remote code injection attack for Mica sensors. We show how to exploit program vulnerabilities to permanently inject any piece of code into the program memory of an Atmel AVR-based sensor. To our knowledge, this is the first result that presents a code injection technique for such devices. Previous work only succeeded in injecting data or performing transient attacks. Injecting permanent code is more powerful since the attacker can gain full control of the target sensor. We also show that this attack can be used to inject a worm that can propagate through the wireless sensor network and possibly create a sensor botnet. Our attack combines different techniques such as return oriented programming and fake stack injection. We present implementation details and suggest some counter-measures.
△ Less
Submitted 22 January, 2009;
originally announced January 2009.