-
Designing for Passengers' Information Needs on Fellow Travelers: A Comparison of Day and Night Rides in Shared Automated Vehicles
Authors:
Lukas A. Flohr,
Martina Schuß,
Dieter P. Wallach,
Antonio Krüger,
Andreas Riener
Abstract:
Shared automated mobility-on-demand promises efficient, sustainable, and flexible transportation. Nevertheless, security concerns, resilience, and their mutual influence - especially at night - will likely be the most critical barriers to public adoption since passengers have to share rides with strangers without a human driver on board. As related work points out that information about fellow tra…
▽ More
Shared automated mobility-on-demand promises efficient, sustainable, and flexible transportation. Nevertheless, security concerns, resilience, and their mutual influence - especially at night - will likely be the most critical barriers to public adoption since passengers have to share rides with strangers without a human driver on board. As related work points out that information about fellow travelers might mitigate passengers' concerns, we designed two user interface variants to investigate the role of this information in an exploratory within-subjects user study (N = 24). Participants experienced four automated day and night rides with varying personal information about co-passengers in a simulated environment. The results of the mixed-method study indicate that having information about other passengers (e.g., photo, gender, and name) positively affects user experience at night. In contrast, it is less necessary during the day. Considering participants' simultaneously raised privacy demands poses a substantial challenge for resilient system design.
△ Less
Submitted 4 August, 2023;
originally announced August 2023.
-
The Design and Implementation of a Verified File System with End-to-End Data Integrity
Authors:
Daniel W. Song,
Konstantinos Mamouras,
Ang Chen,
Nathan Dautenhahn,
Dan S. Wallach
Abstract:
Despite significant research and engineering efforts, many of today's important computer systems suffer from bugs. To increase the reliability of software systems, recent work has applied formal verification to certify the correctness of such systems, with recent successes including certified file systems and certified cryptographic protocols, albeit using quite different proof tactics and toolcha…
▽ More
Despite significant research and engineering efforts, many of today's important computer systems suffer from bugs. To increase the reliability of software systems, recent work has applied formal verification to certify the correctness of such systems, with recent successes including certified file systems and certified cryptographic protocols, albeit using quite different proof tactics and toolchains. Unifying these concepts, we present the first certified file system that uses cryptographic primitives to protect itself against tampering. Our certified file system defends against adversaries that might wish to tamper with the raw disk. Such an "untrusted storage" threat model captures the behavior of storage devices that might silently return erroneous bits as well as adversaries who might have limited access to a disk, perhaps while in transit. In this paper, we present IFSCQ, a certified cryptographic file system with strong integrity guarantees. IFSCQ combines and extends work on cryptographic file systems and formally certified file systems to prove that our design is correct. It is the first certified file system that is secure against strong adversaries that can maliciously corrupt on-disk data and metadata, including attempting to roll back the disk to earlier versions of valid data. IFSCQ achieves this by constructing a Merkle hash tree of the whole disk, and by proving that tampered disk blocks will always be detected if they ever occur. We demonstrate that IFSCQ runs with reasonable overhead while detecting several kinds of attacks.
△ Less
Submitted 14 December, 2020;
originally announced December 2020.
-
Fairness and Decision-making in Collaborative Shift Scheduling Systems
Authors:
Alarith Uhde,
Nadine Schlicker,
Dieter P. Wallach,
Marc Hassenzahl
Abstract:
The strains associated with shift work decrease healthcare workers' well-being. However, shift schedules adapted to their individual needs can partially mitigate these problems. From a computing perspective, shift scheduling was so far mainly treated as an optimization problem with little attention given to the preferences, thoughts, and feelings of the healthcare workers involved. In the present…
▽ More
The strains associated with shift work decrease healthcare workers' well-being. However, shift schedules adapted to their individual needs can partially mitigate these problems. From a computing perspective, shift scheduling was so far mainly treated as an optimization problem with little attention given to the preferences, thoughts, and feelings of the healthcare workers involved. In the present study, we explore fairness as a central, human-oriented attribute of shift schedules as well as the scheduling process. Three in-depth qualitative interviews and a validating vignette study revealed that while on an abstract level healthcare workers agree on equality as the guiding norm for a fair schedule, specific scheduling conflicts should foremost be resolved by negotiating the importance of individual needs. We discuss elements of organizational fairness, including transparency and team spirit. Finally, we present a sketch for fair scheduling systems, summarizing key findings for designers in a readily usable way.
△ Less
Submitted 6 February, 2020; v1 submitted 16 January, 2020;
originally announced January 2020.
-
Investigating the effectiveness of web adblockers
Authors:
Clayton Drazner,
Nikola Đuza,
Hugo Jonker,
Dan S. Wallach
Abstract:
We investigate adblocking filters and the extent to which websites and advertisers react when their content is impacted by these filters. We collected data daily from the Alexa Top-5000 web sites for 120 days, and from specific sites that newly appeared in filter lists for 140 days. By evaluating how long a filter rule triggers on a website, we can gauge how long it remains effective. We matched w…
▽ More
We investigate adblocking filters and the extent to which websites and advertisers react when their content is impacted by these filters. We collected data daily from the Alexa Top-5000 web sites for 120 days, and from specific sites that newly appeared in filter lists for 140 days. By evaluating how long a filter rule triggers on a website, we can gauge how long it remains effective. We matched websites with both a regular adblocking filter list (EasyList) and with a specialized filter list that targets anti-adblocking logic (Nano Defender).
From our data, we observe that the effectiveness of the EasyList adblocking filter decays a modest 0.13\% per day, and after around 80 days seems to stabilize. We found no evidence for any significant decay in effectiveness of the more specialized, but less widely used, anti-adblocking removal filters.
△ Less
Submitted 12 December, 2019;
originally announced December 2019.
-
On the security of ballot marking devices
Authors:
Dan S. Wallach
Abstract:
A recent debate among election experts has considered whether electronic ballot marking devices (BMDs) have adequate security against the risks of malware. A malicious BMD might produce a printed ballot that disagrees with a voter's actual intent, with the hope that voters would be unlikely to detect this subterfuge. This essay considers how an election administrator can create reasonable auditing…
▽ More
A recent debate among election experts has considered whether electronic ballot marking devices (BMDs) have adequate security against the risks of malware. A malicious BMD might produce a printed ballot that disagrees with a voter's actual intent, with the hope that voters would be unlikely to detect this subterfuge. This essay considers how an election administrator can create reasonable auditing procedures to gain confidence that their fleet of BMDs is operating correctly, allowing voters to benefit from the usability and accessibility features of BMDs while the overall election still benefits from the same security and reliability properties we expect from hand-marked paper ballots.
△ Less
Submitted 12 December, 2019; v1 submitted 5 August, 2019;
originally announced August 2019.
-
An Historical Analysis of the SEAndroid Policy Evolution
Authors:
Bum** Im,
Ang Chen,
Dan Wallach
Abstract:
Android adopted SELinux's mandatory access control (MAC) mechanisms in 2013. Since then, billions of Android devices have benefited from mandatory access control security policies. These policies are expressed in a variety of rules, maintained by Google and extended by Android OEMs. Over the years, the rules have grown to be quite complex, making it challenging to properly understand or configure…
▽ More
Android adopted SELinux's mandatory access control (MAC) mechanisms in 2013. Since then, billions of Android devices have benefited from mandatory access control security policies. These policies are expressed in a variety of rules, maintained by Google and extended by Android OEMs. Over the years, the rules have grown to be quite complex, making it challenging to properly understand or configure these policies.
In this paper, we perform a measurement study on the SEAndroid repository to understand the evolution of these policies. We propose a new metric to measure the complexity of the policy by expanding policy rules, with their abstraction features such as macros and groups, into primitive "boxes", which we then use to show that the complexity of the SEAndroid policies has been growing exponentially over time. By analyzing the Git commits, snapshot by snapshot, we are also able to analyze the "age" of policy rules, the trend of changes, and the contributor composition. We also look at hallmark events in Android's history, such as the "Stagefright" vulnerability in Android's media facilities, pointing out how these events led to changes in the MAC policies. The growing complexity of Android's mandatory policies suggests that we will eventually hit the limits of our ability to understand these policies, requiring new tools and techniques.
△ Less
Submitted 3 December, 2018;
originally announced December 2018.
-
Public Evidence from Secret Ballots
Authors:
Matthew Bernhard,
Josh Benaloh,
J. Alex Halderman,
Ronald L. Rivest,
Peter Y. A. Ryan,
Philip B. Stark,
Vanessa Teague,
Poorvi L. Vora,
Dan S. Wallach
Abstract:
Elections seem simple---aren't they just counting? But they have a unique, challenging combination of security and privacy requirements. The stakes are high; the context is adversarial; the electorate needs to be convinced that the results are correct; and the secrecy of the ballot must be ensured. And they have practical constraints: time is of the essence, and voting systems need to be affordabl…
▽ More
Elections seem simple---aren't they just counting? But they have a unique, challenging combination of security and privacy requirements. The stakes are high; the context is adversarial; the electorate needs to be convinced that the results are correct; and the secrecy of the ballot must be ensured. And they have practical constraints: time is of the essence, and voting systems need to be affordable and maintainable, and usable by voters, election officials, and pollworkers. It is thus not surprising that voting is a rich research area spanning theory, applied cryptography, practical systems analysis, usable security, and statistics. Election integrity involves two key concepts: convincing evidence that outcomes are correct and privacy, which amounts to convincing assurance that there is no evidence about how any given person voted. These are obviously in tension. We examine how current systems walk this tightrope.
△ Less
Submitted 4 August, 2017; v1 submitted 26 July, 2017;
originally announced July 2017.
-
Verification of STAR-Vote and Evaluation of FDR and ProVerif
Authors:
Murat Moran,
Dan S. Wallach
Abstract:
We present the first automated privacy analysis of STAR-Vote, a real world voting system design with sophisticated "end-to-end" cryptography, using FDR and ProVerif. We also evaluate the effectiveness of these tools. Despite the complexity of the voting system, we were able to verify that our abstracted formal model of STAR-Vote provides ballot-secrecy using both formal approaches. Notably, ProVer…
▽ More
We present the first automated privacy analysis of STAR-Vote, a real world voting system design with sophisticated "end-to-end" cryptography, using FDR and ProVerif. We also evaluate the effectiveness of these tools. Despite the complexity of the voting system, we were able to verify that our abstracted formal model of STAR-Vote provides ballot-secrecy using both formal approaches. Notably, ProVerif is radically faster than FDR, making it more suitable for rapid iteration and refinement of the formal model.
△ Less
Submitted 1 May, 2017;
originally announced May 2017.
-
Sub-Linear Privacy-Preserving Near-Neighbor Search
Authors:
M. Sadegh Riazi,
Beidi Chen,
Anshumali Shrivastava,
Dan Wallach,
Farinaz Koushanfar
Abstract:
In Near-Neighbor Search (NNS), a new client queries a database (held by a server) for the most similar data (near-neighbors) given a certain similarity metric. The Privacy-Preserving variant (PP-NNS) requires that neither server nor the client shall learn information about the other party's data except what can be inferred from the outcome of NNS. The overwhelming growth in the size of current dat…
▽ More
In Near-Neighbor Search (NNS), a new client queries a database (held by a server) for the most similar data (near-neighbors) given a certain similarity metric. The Privacy-Preserving variant (PP-NNS) requires that neither server nor the client shall learn information about the other party's data except what can be inferred from the outcome of NNS. The overwhelming growth in the size of current datasets and the lack of a truly secure server in the online world render the existing solutions impractical; either due to their high computational requirements or non-realistic assumptions which potentially compromise privacy. PP-NNS having query time {\it sub-linear} in the size of the database has been suggested as an open research direction by Li et al. (CCSW'15). In this paper, we provide the first such algorithm, called Secure Locality Sensitive Indexing (SLSI) which has a sub-linear query time and the ability to handle honest-but-curious parties. At the heart of our proposal lies a secure binary embedding scheme generated from a novel probabilistic transformation over locality sensitive hashing family. We provide information theoretic bound for the privacy guarantees and support our theoretical claims using substantial empirical evidence on real-world datasets.
△ Less
Submitted 17 October, 2019; v1 submitted 6 December, 2016;
originally announced December 2016.
-
Finding Tizen security bugs through whole-system static analysis
Authors:
Daniel Song,
Jisheng Zhao,
Michael Burke,
Dragoş Sbîrlea,
Dan Wallach,
Vivek Sarkar
Abstract:
Tizen is a new Linux-based open source platform for consumer devices including smartphones, televisions, vehicles, and wearables. While Tizen provides kernel-level mandatory policy enforcement, it has a large collection of libraries, implemented in a mix of C and C++, which make their own security checks. In this research, we describe the design and engineering of a static analysis engine which dr…
▽ More
Tizen is a new Linux-based open source platform for consumer devices including smartphones, televisions, vehicles, and wearables. While Tizen provides kernel-level mandatory policy enforcement, it has a large collection of libraries, implemented in a mix of C and C++, which make their own security checks. In this research, we describe the design and engineering of a static analysis engine which drives a full information flow analysis for apps and a control flow analysis for the full library stack. We implemented these static analyses as extensions to LLVM, requiring us to improve LLVM's native analysis features to get greater precision and scalability, including knotty issues like the coexistence of C++ inheritance with C function pointer use. With our tools, we found several unexpected behaviors in the Tizen system, including paths through the system libraries that did not have inline security checks. We show how our tools can help the Tizen app store to verify important app properties as well as hel** the Tizen development process avoid the accidental introduction of subtle vulnerabilities.
△ Less
Submitted 22 April, 2015;
originally announced April 2015.
-
An Empirical Study of Mobile Ad Targeting
Authors:
Theodore Book,
Dan S. Wallach
Abstract:
Advertising, long the financial mainstay of the web ecosystem, has become nearly ubiquitous in the world of mobile apps. While ad targeting on the web is fairly well understood, mobile ad targeting is much less studied. In this paper, we use empirical methods to collect a database of over 225,000 ads on 32 simulated devices hosting one of three distinct user profiles. We then analyze how the ads a…
▽ More
Advertising, long the financial mainstay of the web ecosystem, has become nearly ubiquitous in the world of mobile apps. While ad targeting on the web is fairly well understood, mobile ad targeting is much less studied. In this paper, we use empirical methods to collect a database of over 225,000 ads on 32 simulated devices hosting one of three distinct user profiles. We then analyze how the ads are targeted by correlating ads to potential targeting profiles using Bayes' rule and Pearson's chi squared test. This enables us to measure the prevalence of different forms of targeting. We find that nearly all ads show the effects of application- and time-based targeting, while we are able to identify location-based targeting in 43% of the ads and user-based targeting in 39%.
△ Less
Submitted 23 February, 2015;
originally announced February 2015.
-
Glider: A GPU Library Driver for Improved System Security
Authors:
Ardalan Amiri Sani,
Lin Zhong,
Dan S. Wallach
Abstract:
Legacy device drivers implement both device resource management and isolation. This results in a large code base with a wide high-level interface making the driver vulnerable to security attacks. This is particularly problematic for increasingly popular accelerators like GPUs that have large, complex drivers. We solve this problem with library drivers, a new driver architecture. A library driver i…
▽ More
Legacy device drivers implement both device resource management and isolation. This results in a large code base with a wide high-level interface making the driver vulnerable to security attacks. This is particularly problematic for increasingly popular accelerators like GPUs that have large, complex drivers. We solve this problem with library drivers, a new driver architecture. A library driver implements resource management as an untrusted library in the application process address space, and implements isolation as a kernel module that is smaller and has a narrower lower-level interface (i.e., closer to hardware) than a legacy driver. We articulate a set of device and platform hardware properties that are required to retrofit a legacy driver into a library driver. To demonstrate the feasibility and superiority of library drivers, we present Glider, a library driver implementation for two GPUs of popular brands, Radeon and Intel. Glider reduces the TCB size and attack surface by about 35% and 84% respectively for a Radeon HD 6450 GPU and by about 38% and 90% respectively for an Intel Ivy Bridge GPU. Moreover, it incurs no performance cost. Indeed, Glider outperforms a legacy driver for applications requiring intensive interactions with the device driver, such as applications using the OpenGL immediate mode API.
△ Less
Submitted 13 November, 2014;
originally announced November 2014.
-
The Mason Test: A Defense Against Sybil Attacks in Wireless Networks Without Trusted Authorities
Authors:
Yue Liu,
David R. Bild,
Robert P. Dick,
Z. Morley Mao,
Dan S. Wallach
Abstract:
Wireless networks are vulnerable to Sybil attacks, in which a malicious node poses as many identities in order to gain disproportionate influence. Many defenses based on spatial variability of wireless channels exist, but depend either on detailed, multi-tap channel estimation - something not exposed on commodity 802.11 devices - or valid RSSI observations from multiple trusted sources, e.g., corp…
▽ More
Wireless networks are vulnerable to Sybil attacks, in which a malicious node poses as many identities in order to gain disproportionate influence. Many defenses based on spatial variability of wireless channels exist, but depend either on detailed, multi-tap channel estimation - something not exposed on commodity 802.11 devices - or valid RSSI observations from multiple trusted sources, e.g., corporate access points - something not directly available in ad hoc and delay-tolerant networks with potentially malicious neighbors. We extend these techniques to be practical for wireless ad hoc networks of commodity 802.11 devices. Specifically, we propose two efficient methods for separating the valid RSSI observations of behaving nodes from those falsified by malicious participants. Further, we note that prior signalprint methods are easily defeated by mobile attackers and develop an appropriate challenge-response defense. Finally, we present the Mason test, the first implementation of these techniques for ad hoc and delay-tolerant networks of commodity 802.11 devices. We illustrate its performance in several real-world scenarios.
△ Less
Submitted 24 March, 2014;
originally announced March 2014.
-
Performance Analysis of Location Profile Routing
Authors:
David R. Bild,
Yue Liu,
Robert P. Dick,
Z. Morley Mao,
Dan S. Wallach
Abstract:
We propose using the predictability of human motion to eliminate the overhead of distributed location services in human-carried MANETs, dubbing the technique location profile routing. This method outperforms the Geographic Hashing Location Service when nodes change locations 2x more frequently than they initiate connections (e.g., start new TCP streams), as in applications like text- and instant-m…
▽ More
We propose using the predictability of human motion to eliminate the overhead of distributed location services in human-carried MANETs, dubbing the technique location profile routing. This method outperforms the Geographic Hashing Location Service when nodes change locations 2x more frequently than they initiate connections (e.g., start new TCP streams), as in applications like text- and instant-messaging. Prior characterizations of human mobility are used to show that location profile routing achieves a 93% delivery ratio with a 1.75x first-packet latency increase relative to an oracle location service.
△ Less
Submitted 18 March, 2014;
originally announced March 2014.
-
Aggregate Characterization of User Behavior in Twitter and Analysis of the Retweet Graph
Authors:
David R. Bild,
Yue Liu,
Robert P. Dick,
Z. Morley Mao,
Dan S. Wallach
Abstract:
Most previous analysis of Twitter user behavior is focused on individual information cascades and the social followers graph. We instead study aggregate user behavior and the retweet graph with a focus on quantitative descriptions. We find that the lifetime tweet distribution is a type-II discrete Weibull stemming from a power law hazard function, the tweet rate distribution, although asymptotical…
▽ More
Most previous analysis of Twitter user behavior is focused on individual information cascades and the social followers graph. We instead study aggregate user behavior and the retweet graph with a focus on quantitative descriptions. We find that the lifetime tweet distribution is a type-II discrete Weibull stemming from a power law hazard function, the tweet rate distribution, although asymptotically power law, exhibits a lognormal cutoff over finite sample intervals, and the inter-tweet interval distribution is power law with exponential cutoff. The retweet graph is small-world and scale-free, like the social graph, but is less disassortative and has much stronger clustering. These differences are consistent with it better capturing the real-world social relationships of and trust between users. Beyond just understanding and modeling human communication patterns and social networks, applications for alternative, decentralized microblogging systems-both predicting real-word performance and detecting spam-are discussed.
△ Less
Submitted 11 February, 2014;
originally announced February 2014.
-
A Case of Collusion: A Study of the Interface Between Ad Libraries and their Apps
Authors:
Theodore Book,
Dan S. Wallach
Abstract:
A growing concern with advertisement libraries on Android is their ability to exfiltrate personal information from their host applications. While previous work has looked at the libraries' abilities to measure private information on their own, advertising libraries also include APIs through which a host application can deliberately leak private information about the user. This study considers a co…
▽ More
A growing concern with advertisement libraries on Android is their ability to exfiltrate personal information from their host applications. While previous work has looked at the libraries' abilities to measure private information on their own, advertising libraries also include APIs through which a host application can deliberately leak private information about the user. This study considers a corpus of 114,000 apps. We reconstruct the APIs for 103 ad libraries used in the corpus, and study how the privacy leaking APIs from the top 20 ad libraries are used by the applications. Notably, we have found that app popularity correlates with privacy leakage; the marginal increase in advertising revenue, multiplied over a larger user base, seems to incentivize these app vendors to violate their users' privacy.
△ Less
Submitted 23 July, 2013;
originally announced July 2013.
-
Automated generation of web server fingerprints
Authors:
Theodore Book,
Martha Witick,
Dan S. Wallach
Abstract:
In this paper, we demonstrate that it is possible to automatically generate fingerprints for various web server types using multifactor Bayesian inference on randomly selected servers on the Internet, without building an a priori catalog of server features or behaviors. This makes it possible to conclusively study web server distribution without relying on reported (and variable) version strings.…
▽ More
In this paper, we demonstrate that it is possible to automatically generate fingerprints for various web server types using multifactor Bayesian inference on randomly selected servers on the Internet, without building an a priori catalog of server features or behaviors. This makes it possible to conclusively study web server distribution without relying on reported (and variable) version strings. We gather data by sending a collection of specialized requests to 110,000 live web servers. Using only the server response codes, we then train an algorithm to successfully predict server types independently of the server version string. In the process, we note several distinguishing features of current web infrastructure.
△ Less
Submitted 1 May, 2013;
originally announced May 2013.
-
Longitudinal Analysis of Android Ad Library Permissions
Authors:
Theodore Book,
Adam Pridgen,
Dan S. Wallach
Abstract:
This paper investigates changes over time in the behavior of Android ad libraries. Taking a sample of 100,000 apps, we extract and classify the ad libraries. By considering the release dates of the applications that use a specific ad library version, we estimate the release date for the library, and thus build a chronological map of the permissions used by various ad libraries over time. We find t…
▽ More
This paper investigates changes over time in the behavior of Android ad libraries. Taking a sample of 100,000 apps, we extract and classify the ad libraries. By considering the release dates of the applications that use a specific ad library version, we estimate the release date for the library, and thus build a chronological map of the permissions used by various ad libraries over time. We find that the use of most permissions has increased over the last several years, and that more libraries are able to use permissions that pose particular risks to user privacy and security.
△ Less
Submitted 18 April, 2013; v1 submitted 4 March, 2013;
originally announced March 2013.
-
The Velocity of Censorship: High-Fidelity Detection of Microblog Post Deletions
Authors:
Tao Zhu,
David Phipps,
Adam Pridgen,
Jedidiah R. Crandall,
Dan S. Wallach
Abstract:
Weibo and other popular Chinese microblogging sites are well known for exercising internal censorship, to comply with Chinese government requirements. This research seeks to quantify the mechanisms of this censorship: how fast and how comprehensively posts are deleted.Our analysis considered 2.38 million posts gathered over roughly two months in 2012, with our attention focused on repeatedly visit…
▽ More
Weibo and other popular Chinese microblogging sites are well known for exercising internal censorship, to comply with Chinese government requirements. This research seeks to quantify the mechanisms of this censorship: how fast and how comprehensively posts are deleted.Our analysis considered 2.38 million posts gathered over roughly two months in 2012, with our attention focused on repeatedly visiting "sensitive" users. This gives us a view of censorship events within minutes of their occurrence, albeit at a cost of our data no longer representing a random sample of the general Weibo population. We also have a larger 470 million post sampling from Weibo's public timeline, taken over a longer time period, that is more representative of a random sample.
We found that deletions happen most heavily in the first hour after a post has been submitted. Focusing on original posts, not reposts/retweets, we observed that nearly 30% of the total deletion events occur within 5- 30 minutes. Nearly 90% of the deletions happen within the first 24 hours. Leveraging our data, we also considered a variety of hypotheses about the mechanisms used by Weibo for censorship, such as the extent to which Weibo's censors use retrospective keyword-based censorship, and how repost/retweet popularity interacts with censorship. We also used natural language processing techniques to analyze which topics were more likely to be censored.
△ Less
Submitted 9 July, 2013; v1 submitted 3 March, 2013;
originally announced March 2013.
-
Language Without Words: A Pointillist Model for Natural Language Processing
Authors:
Peiyou Song,
Anhei Shu,
David Phipps,
Dan Wallach,
Mohit Tiwari,
Jedidiah Crandall,
George Luger
Abstract:
This paper explores two separate questions: Can we perform natural language processing tasks without a lexicon?; and, Should we? Existing natural language processing techniques are either based on words as units or use units such as grams only for basic classification tasks. How close can a machine come to reasoning about the meanings of words and phrases in a corpus without using any lexicon, bas…
▽ More
This paper explores two separate questions: Can we perform natural language processing tasks without a lexicon?; and, Should we? Existing natural language processing techniques are either based on words as units or use units such as grams only for basic classification tasks. How close can a machine come to reasoning about the meanings of words and phrases in a corpus without using any lexicon, based only on grams?
Our own motivation for posing this question is based on our efforts to find popular trends in words and phrases from online Chinese social media. This form of written Chinese uses so many neologisms, creative character placements, and combinations of writing systems that it has been dubbed the "Martian Language." Readers must often use visual queues, audible queues from reading out loud, and their knowledge and understanding of current events to understand a post. For analysis of popular trends, the specific problem is that it is difficult to build a lexicon when the invention of new ways to refer to a word or concept is easy and common. For natural language processing in general, we argue in this paper that new uses of language in social media will challenge machines' abilities to operate with words as the basic unit of understanding, not only in Chinese but potentially in other languages.
△ Less
Submitted 11 December, 2012;
originally announced December 2012.
-
Tracking and Quantifying Censorship on a Chinese Microblogging Site
Authors:
Tao Zhu,
David Phipps,
Adam Pridgen,
Jedidiah R. Crandall,
Dan S. Wallach
Abstract:
We present measurements and analysis of censorship on Weibo, a popular microblogging site in China. Since we were limited in the rate at which we could download posts, we identified users likely to participate in sensitive topics and recursively followed their social contacts. We also leveraged new natural language processing techniques to pick out trending topics despite the use of neologisms, na…
▽ More
We present measurements and analysis of censorship on Weibo, a popular microblogging site in China. Since we were limited in the rate at which we could download posts, we identified users likely to participate in sensitive topics and recursively followed their social contacts. We also leveraged new natural language processing techniques to pick out trending topics despite the use of neologisms, named entities, and informal language usage in Chinese social media. We found that Weibo dynamically adapts to the changing interests of its users through multiple layers of filtering. The filtering includes both retroactively searching posts by keyword or repost links to delete them, and rejecting posts as they are posted. The trend of sensitive topics is short-lived, suggesting that the censorship is effective in stop** the "viral" spread of sensitive issues. We also give evidence that sensitive topics in Weibo only scarcely propagate beyond a core of sensitive posters.
△ Less
Submitted 26 November, 2012;
originally announced November 2012.
-
STAR-Vote: A Secure, Transparent, Auditable, and Reliable Voting System
Authors:
Josh Benaloh,
Mike Byrne,
Philip Kortum,
Neal McBurnett,
Olivier Pereira,
Philip B. Stark,
Dan S. Wallach
Abstract:
In her 2011 EVT/WOTE keynote, Travis County, Texas County Clerk Dana DeBeauvoir described the qualities she wanted in her ideal election system to replace their existing DREs. In response, in April of 2012, the authors, working with DeBeauvoir and her staff, jointly architected STAR-Vote, a voting system with a DRE-style human interface and a "belt and suspenders" approach to verifiability. It pro…
▽ More
In her 2011 EVT/WOTE keynote, Travis County, Texas County Clerk Dana DeBeauvoir described the qualities she wanted in her ideal election system to replace their existing DREs. In response, in April of 2012, the authors, working with DeBeauvoir and her staff, jointly architected STAR-Vote, a voting system with a DRE-style human interface and a "belt and suspenders" approach to verifiability. It provides both a paper trail and end-to-end cryptography using COTS hardware. It is designed to support both ballot-level risk-limiting audits, and auditing by individual voters and observers. The human interface and process flow is based on modern usability research. This paper describes the STAR-Vote architecture, which could well be the next-generation voting system for Travis County and perhaps elsewhere.
△ Less
Submitted 8 November, 2012;
originally announced November 2012.
-
A Pointillism Approach for Natural Language Processing of Social Media
Authors:
Peiyou Song,
Anhei Shu,
Anyu Zhou,
Dan Wallach,
Jedidiah R. Crandall
Abstract:
The Chinese language poses challenges for natural language processing based on the unit of a word even for formal uses of the Chinese language, social media only makes word segmentation in Chinese even more difficult. In this document we propose a pointillism approach to natural language processing. Rather than words that have individual meanings, the basic unit of a pointillism approach is trigra…
▽ More
The Chinese language poses challenges for natural language processing based on the unit of a word even for formal uses of the Chinese language, social media only makes word segmentation in Chinese even more difficult. In this document we propose a pointillism approach to natural language processing. Rather than words that have individual meanings, the basic unit of a pointillism approach is trigrams of characters. These grams take on meaning in aggregate when they appear together in a way that is correlated over time.
Our results from three kinds of experiments show that when words and topics do have a meme-like trend, they can be reconstructed from only trigrams. For example, for 4-character idioms that appear at least 99 times in one day in our data, the unconstrained precision (that is, precision that allows for deviation from a lexicon when the result is just as correct as the lexicon version of the word or phrase) is 0.93. For longer words and phrases collected from Wiktionary, including neologisms, the unconstrained precision is 0.87. We consider these results to be very promising, because they suggest that it is feasible for a machine to reconstruct complex idioms, phrases, and neologisms with good precision without any notion of words. Thus the colorful and baroque uses of language that typify social media in challenging languages such as Chinese may in fact be accessible to machines.
△ Less
Submitted 21 June, 2012;
originally announced June 2012.
-
AdSplit: Separating smartphone advertising from applications
Authors:
Shashi Shekhar,
Michael Dietz,
Dan S. Wallach
Abstract:
A wide variety of smartphone applications today rely on third-party advertising services, which provide libraries that are linked into the hosting application. This situation is undesirable for both the application author and the advertiser. Advertising libraries require additional permissions, resulting in additional permission requests to users. Likewise, a malicious application could simulate t…
▽ More
A wide variety of smartphone applications today rely on third-party advertising services, which provide libraries that are linked into the hosting application. This situation is undesirable for both the application author and the advertiser. Advertising libraries require additional permissions, resulting in additional permission requests to users. Likewise, a malicious application could simulate the behavior of the advertising library, forging the user's interaction and effectively stealing money from the advertiser. This paper describes AdSplit, where we extended Android to allow an application and its advertising to run as separate processes, under separate user-ids, eliminating the need for applications to request permissions on behalf of their advertising libraries.
We also leverage mechanisms from Quire to allow the remote server to validate the authenticity of client-side behavior. In this paper, we quantify the degree of permission bloat caused by advertising, with a study of thousands of downloaded apps. AdSplit automatically recompiles apps to extract their ad services, and we measure minimal runtime overhead. We also observe that most ad libraries just embed an HTML widget within and describe how AdSplit can be designed with this in mind to avoid any need for ads to have native code.
△ Less
Submitted 17 February, 2012;
originally announced February 2012.
-
#h00t: Censorship Resistant Microblogging
Authors:
Dustin Bachrach,
Christopher Nunu,
Dan S. Wallach,
Matthew Wright
Abstract:
Microblogging services such as Twitter are an increasingly important way to communicate, both for individuals and for groups through the use of hashtags that denote topics of conversation. However, groups can be easily blocked from communicating through blocking of posts with the given hashtags. We propose #h00t, a system for censorship resistant microblogging. #h00t presents an interface that is…
▽ More
Microblogging services such as Twitter are an increasingly important way to communicate, both for individuals and for groups through the use of hashtags that denote topics of conversation. However, groups can be easily blocked from communicating through blocking of posts with the given hashtags. We propose #h00t, a system for censorship resistant microblogging. #h00t presents an interface that is much like Twitter, except that hashtags are replaced with very short hashes (e.g., 24 bits) of the group identifier. Naturally, with such short hashes, hashtags from different groups may collide and #h00t users will actually seek to create collisions. By encrypting all posts with keys derived from the group identifiers, #h00t client software can filter out other groups' posts while making such filtering difficult for the adversary. In essence, by leveraging collisions, groups can tunnel their posts in other groups' posts. A censor could not block a given group without also blocking the other groups with colliding hashtags. We evaluate the feasibility of #h00t through traces collected from Twitter, showing that a single modern computer has enough computational throughput to encrypt every tweet sent through Twitter in real time. We also use these traces to analyze the bandwidth and anonymity tradeoffs that would come with different variations on how group identifiers are encoded and hashtags are selected to purposefully collide with one another.
△ Less
Submitted 30 September, 2011;
originally announced September 2011.
-
The BitTorrent Anonymity Marketplace
Authors:
Seth James Nielson,
Dan S. Wallach
Abstract:
The very nature of operations in peer-to-peer systems such as BitTorrent exposes information about participants to their peers. Nodes desiring anonymity, therefore, often chose to route their peer-to-peer traffic through anonymity relays, such as Tor. Unfortunately, these relays have little incentive for contribution and struggle to scale with the high loads that P2P traffic foists upon them. We p…
▽ More
The very nature of operations in peer-to-peer systems such as BitTorrent exposes information about participants to their peers. Nodes desiring anonymity, therefore, often chose to route their peer-to-peer traffic through anonymity relays, such as Tor. Unfortunately, these relays have little incentive for contribution and struggle to scale with the high loads that P2P traffic foists upon them. We propose a novel modification for BitTorrent that we call the BitTorrent Anonymity Marketplace. Peers in our system trade in k swarms obscuring the actual intent of the participants. But because peers can cross-trade torrents, the k-1 cover traffic can actually serve a useful purpose. This creates a system wherein a neighbor cannot determine if a node actually wants a given torrent, or if it is only using it as leverage to get the one it really wants. In this paper, we present our design, explore its operation in simulation, and analyze its effectiveness. We demonstrate that the upload and download characteristics of cover traffic and desired torrents are statistically difficult to distinguish.
△ Less
Submitted 12 August, 2011;
originally announced August 2011.
-
Building Better Incentives for Robustness in BitTorrent
Authors:
Seth James Nielson,
Caleb E. Spare,
Dan S. Wallach
Abstract:
BitTorrent is a widely-deployed, peer-to-peer file transfer protocol engineered with a "tit for tat" mechanism that encourages cooperation. Unfortunately, there is little incentive for nodes to altruistically provide service to their peers after they finish downloading a file, and what altruism there is can be exploited by aggressive clients like Bit- Tyrant. This altruism, called seeding, is alwa…
▽ More
BitTorrent is a widely-deployed, peer-to-peer file transfer protocol engineered with a "tit for tat" mechanism that encourages cooperation. Unfortunately, there is little incentive for nodes to altruistically provide service to their peers after they finish downloading a file, and what altruism there is can be exploited by aggressive clients like Bit- Tyrant. This altruism, called seeding, is always beneficial and sometimes essential to BitTorrent's real-world performance. We propose a new long-term incentives mechanism in BitTorrent to encourage peers to seed and we evaluate its effectiveness via simulation. We show that when nodes running our algorithm reward one another for good behavior in previous swarms, they experience as much as a 50% improvement in download times over unrewarded nodes. Even when aggressive clients, such as BitTyrant, participate in the swarm, our rewarded nodes still outperform them, although by smaller margins.
△ Less
Submitted 12 August, 2011;
originally announced August 2011.
-
Attacks on Local Searching Tools
Authors:
Seth James Nielson,
Seth J. Fogarty,
Dan S. Wallach
Abstract:
The Google Desktop Search is an indexing tool, currently in beta testing, designed to allow users fast, intuitive, searching for local files. The principle interface is provided through a local web server which supports an interface similar to Google.com's normal web page. Indexing of local files occurs when the system is idle, and understands a number of common file types. A optional feature is t…
▽ More
The Google Desktop Search is an indexing tool, currently in beta testing, designed to allow users fast, intuitive, searching for local files. The principle interface is provided through a local web server which supports an interface similar to Google.com's normal web page. Indexing of local files occurs when the system is idle, and understands a number of common file types. A optional feature is that Google Desktop can integrate a short summary of a local search results with Google.com web searches. This summary includes 30-40 character snippets of local files. We have uncovered a vulnerability that would release private local data to an unauthorized remote entity. Using two different attacks, we expose the small snippets of private local data to a remote third party.
△ Less
Submitted 12 August, 2011;
originally announced August 2011.
-
An Analysis of Chinese Search Engine Filtering
Authors:
Tao Zhu,
Christopher Bronk,
Dan S. Wallach
Abstract:
The imposition of government mandates upon Internet search engine operation is a growing area of interest for both computer science and public policy. Users of these search engines often observe evidence of censorship, but the government policies that impose this censorship are not generally public. To better understand these policies, we conducted a set of experiments on major search engines empl…
▽ More
The imposition of government mandates upon Internet search engine operation is a growing area of interest for both computer science and public policy. Users of these search engines often observe evidence of censorship, but the government policies that impose this censorship are not generally public. To better understand these policies, we conducted a set of experiments on major search engines employed by Internet users in China, issuing queries against a variety of different words: some neutral, some with names of important people, some political, and some pornographic. We conducted these queries, in Chinese, against Baidu, Google (including google.cn, before it was terminated), Yahoo!, and Bing. We found remarkably aggressive filtering of pornographic terms, in some cases causing non-pornographic terms which use common characters to also be filtered. We also found that names of prominent activists and organizers as well as top political and military leaders, were also filtered in whole or in part. In some cases, we found search terms which we believe to be "blacklisted". In these cases, the only results that appeared, for any of them, came from a short "whitelist" of sites owned or controlled directly by the Chinese government. By repeating observations over a long observation period, we also found that the keyword blocking policies of the Great Firewall of China vary over time. While our results don't offer any fundamental insight into how to defeat or work around Chinese internet censorship, they are still helpful to understand the structure of how censorship duties are shared between the Great Firewall and Chinese search engines.
△ Less
Submitted 19 July, 2011;
originally announced July 2011.
-
Quire: Lightweight Provenance for Smart Phone Operating Systems
Authors:
Michael Dietz,
Shashi Shekhar,
Yuliy Pisetsky,
Anhei Shu,
Dan S. Wallach
Abstract:
Smartphone apps often run with full privileges to access the network and sensitive local resources, making it difficult for remote systems to have any trust in the provenance of network connections they receive. Even within the phone, different apps with different privileges can communicate with one another, allowing one app to trick another into improperly exercising its privileges (a Confused De…
▽ More
Smartphone apps often run with full privileges to access the network and sensitive local resources, making it difficult for remote systems to have any trust in the provenance of network connections they receive. Even within the phone, different apps with different privileges can communicate with one another, allowing one app to trick another into improperly exercising its privileges (a Confused Deputy attack). In Quire, we engineered two new security mechanisms into Android to address these issues. First, we track the call chain of IPCs, allowing an app the choice of operating with the diminished privileges of its callers or to act explicitly on its own behalf. Second, a lightweight signature scheme allows any app to create a signed statement that can be verified anywhere inside the phone. Both of these mechanisms are reflected in network RPCs, allowing remote systems visibility into the state of the phone when an RPC is made. We demonstrate the usefulness of Quire with two example applications. We built an advertising service, running distinctly from the app which wants to display ads, which can validate clicks passed to it from its host. We also built a payment service, allowing an app to issue a request which the payment service validates with the user. An app cannot not forge a payment request by directly connecting to the remote server, nor can the local payment service tamper with the request.
△ Less
Submitted 11 February, 2011;
originally announced February 2011.