-
Hopper: Modeling and Detecting Lateral Movement (Extended Report)
Authors:
Grant Ho,
Mayank Dhiman,
Devdatta Akhawe,
Vern Paxson,
Stefan Savage,
Geoffrey M. Voelker,
David Wagner
Abstract:
In successful enterprise attacks, adversaries often need to gain access to additional machines beyond their initial point of compromise, a set of internal movements known as lateral movement. We present Hopper, a system for detecting lateral movement based on commonly available enterprise logs. Hopper constructs a graph of login activity among internal machines and then identifies suspicious sequences of logins that correspond to lateral movement. To understand the larger context of each login, Hopper employs an inference algorithm to identify the broader path(s) of movement that each login belongs to and the causal user responsible for performing a path's logins. Hopper then leverages this path inference algorithm, in conjunction with a set of detection rules and a new anomaly scoring algorithm, to surface the login paths most likely to reflect lateral movement. On a 15-month enterprise dataset consisting of over 780 million internal logins, Hopper achieves a 94.5% detection rate across over 300 realistic attack scenarios, including one red team attack, while generating an average of <9 alerts per day. In contrast, to detect the same number of attacks, prior state-of-the-art systems would need to generate nearly 8x as many false positives.
Submitted 27 May, 2021;
originally announced May 2021.
-
Detecting and Characterizing Lateral Phishing at Scale
Authors:
Grant Ho,
Asaf Cidon,
Lior Gavish,
Marco Schweighauser,
Vern Paxson,
Stefan Savage,
Geoffrey M. Voelker,
David Wagner
Abstract:
We present the first large-scale characterization of lateral phishing attacks, based on a dataset of 113 million employee-sent emails from 92 enterprise organizations. In a lateral phishing attack, adversaries leverage a compromised enterprise account to send phishing emails to other users, benefitting from both the implicit trust and the information in the hijacked user's account. We develop a classifier that finds hundreds of real-world lateral phishing emails, while generating under four false positives per every one-million employee-sent emails. Drawing on the attacks we detect, as well as a corpus of user-reported incidents, we quantify the scale of lateral phishing, identify several thematic content and recipient targeting strategies that attackers follow, illuminate two types of sophisticated behaviors that attackers exhibit, and estimate the success rate of these attacks. Collectively, these results expand our mental models of the 'enterprise attacker' and shed light on the current state of enterprise phishing attacks.
Submitted 2 October, 2019;
originally announced October 2019.
-
A Bestiary of Blocking: The Motivations and Modes behind Website Unavailability
Authors:
Sadia Afroz,
Mobin Javed,
Vern Paxson,
Shoaib Asif Qazi,
Shaarif Sajid,
Michael Carl Tschantz
Abstract:
This paper examines different reasons that websites may vary in their availability by location. Prior works on availability mostly focus on censorship by nation states. We look at three forms of server-side blocking: blocking visitors from the EU to avoid GDPR compliance, blocking based upon the visitor's country, and blocking due to security concerns. We argue that these and other forms of blocking warrant more research.
Submitted 1 June, 2018;
originally announced June 2018.
-
Exploring Server-side Blocking of Regions
Authors:
Sadia Afroz,
Michael Carl Tschantz,
Shaarif Sajid,
Shoaib Asif Qazi,
Mobin Javed,
Vern Paxson
Abstract:
One of the Internet's greatest strengths is the degree to which it facilitates access to any of its resources from users anywhere in the world. However, users in the developing world have complained of websites blocking their countries. We explore this phenomenon using a measurement study. With a combination of automated page loads, manual checking, and traceroutes, we can say, with high confidence, that some websites do block users from some regions. We cannot say, with high confidence, why, or even based on what criteria, they do so except for in some cases where the website states a reason. We do report qualitative evidence that fears of abuse and the costs of serving requests to some regions may play a role.
Submitted 29 May, 2018;
originally announced May 2018.
-
Identifying Products in Online Cybercrime Marketplaces: A Dataset for Fine-grained Domain Adaptation
Authors:
Greg Durrett,
Jonathan K. Kummerfeld,
Taylor Berg-Kirkpatrick,
Rebecca S. Portnoff,
Sadia Afroz,
Damon McCoy,
Kirill Levchenko,
Vern Paxson
Abstract:
One weakness of machine-learned NLP models is that they typically perform poorly on out-of-domain data. In this work, we study the task of identifying products being bought and sold in online cybercrime forums, which exhibits particularly challenging cross-domain effects. We formulate a task that represents a hybrid of slot-filling information extraction and named entity recognition and annotate data from four different forums. Each of these forums constitutes its own "fine-grained domain" in that the forums cover different market sectors with different properties, even though all forums are in the broad domain of cybercrime. We characterize these domain differences in the context of a learning-based system: supervised models see decreased accuracy when applied to new forums, and standard techniques for semi-supervised learning and domain adaptation have limited effectiveness on this data, which suggests the need to improve these techniques. We release a dataset of 1,938 annotated posts from across the four forums.
Submitted 31 August, 2017;
originally announced August 2017.
-
A Multi-perspective Analysis of Carrier-Grade NAT Deployment
Authors:
Philipp Richter,
Florian Wohlfart,
Narseo Vallina-Rodriguez,
Mark Allman,
Randy Bush,
Anja Feldmann,
Christian Kreibich,
Nicholas Weaver,
Vern Paxson
Abstract:
As ISPs face IPv4 address scarcity they increasingly turn to network address translation (NAT) to accommodate the address needs of their customers. Recently, ISPs have moved beyond employing NATs only directly at individual customers and instead begun deploying Carrier-Grade NATs (CGNs) to apply address translation to many independent and disparate endpoints spanning physical locations, a phenomenon that so far has received little in the way of empirical assessment. In this work we present a broad and systematic study of the deployment and behavior of these middleboxes. We develop a methodology to detect the existence of hosts behind CGNs by extracting non-routable IP addresses from peer lists we obtain by crawling the BitTorrent DHT. We complement this approach with improvements to our Netalyzr troubleshooting service, enabling us to determine a range of indicators of CGN presence as well as detailed insights into key properties of CGNs. Combining the two data sources we illustrate the scope of CGN deployment on today's Internet, and report on characteristics of commonly deployed CGNs and their effect on end users.
Submitted 13 September, 2016; v1 submitted 18 May, 2016;
originally announced May 2016.
-
Haystack: A Multi-Purpose Mobile Vantage Point in User Space
Authors:
Abbas Razaghpanah,
Narseo Vallina-Rodriguez,
Srikanth Sundaresan,
Christian Kreibich,
Phillipa Gill,
Mark Allman,
Vern Paxson
Abstract:
Despite our growing reliance on mobile phones for a wide range of daily tasks, their operation remains largely opaque. A number of previous studies have addressed elements of this problem in a partial fashion, trading off analytic comprehensiveness and deployment scale. We overcome the barriers to large-scale deployment (e.g., requiring rooted devices) and comprehensiveness of previous efforts by taking a novel approach that leverages the VPN API on mobile devices to design Haystack, an in-situ mobile measurement platform that operates exclusively on the device, providing full access to the device's network traffic and local context without requiring root access. We present the design of Haystack and its implementation in an Android app that we deploy via standard distribution channels. Using data collected from 450 users of the app, we exemplify the advantages of Haystack over the state of the art and demonstrate its seamless experience even under demanding conditions. We also demonstrate its utility to users and researchers in characterizing mobile traffic and privacy risks.
Submitted 29 October, 2016; v1 submitted 5 October, 2015;
originally announced October 2015.
-
Exploring Privacy Preservation in Outsourced K-Nearest Neighbors with Multiple Data Owners
Authors:
Frank Li,
Richard Shin,
Vern Paxson
Abstract:
The k-nearest neighbors (k-NN) algorithm is a popular and effective classification algorithm. Due to its large storage and computational requirements, it is suitable for cloud outsourcing. However, k-NN is often run on sensitive data such as medical records, user images, or personal information. It is important to protect the privacy of data in an outsourced k-NN system.
Prior works have all assumed the data owners (who submit data to the outsourced k-NN system) are a single trusted party. However, we observe that in many practical scenarios, there may be multiple mutually distrusting data owners. In this work, we present the first framing and exploration of privacy preservation in an outsourced k-NN system with multiple data owners. We consider the various threat models introduced by this modification. We discover that under a particularly practical threat model that covers numerous scenarios, there exists a set of adaptive attacks that breach the data privacy of any exact k-NN system. The vulnerability is a result of the mathematical properties of k-NN and its output. Thus, we propose a privacy-preserving alternative system supporting kernel density estimation using a Gaussian kernel, a classification algorithm from the same family as k-NN. In many applications, this similar algorithm serves as a good substitute for k-NN. We additionally investigate solutions for other threat models, often through extensions on prior single data owner systems.
Submitted 29 July, 2015;
originally announced July 2015.
-
A Primer on IPv4 Scarcity
Authors:
Philipp Richter,
Mark Allman,
Randy Bush,
Vern Paxson
Abstract:
With the ongoing exhaustion of free address pools at the registries serving the global demand for IPv4 address space, scarcity has become reality. Networks in need of address space can no longer get more address allocations from their respective registries.
In this work we frame the fundamentals of the IPv4 address exhaustion phenomena and connected issues. We elaborate on how the current ecosystem of IPv4 address space has evolved since the standardization of IPv4, leading to the rather complex and opaque scenario we face today. We outline the evolution in address space management as well as address space use patterns, identifying key factors of the scarcity issues. We characterize the possible solution space to overcome these issues and open the perspective of address blocks as virtual resources, which involves issues such as differentiation between address blocks, the need for resource certification, and issues arising when transferring address space between networks.
Submitted 27 February, 2015; v1 submitted 10 November, 2014;
originally announced November 2014.
-
On Modeling the Costs of Censorship
Authors:
Michael Carl Tschantz,
Sadia Afroz,
Vern Paxson,
J. D. Tygar
Abstract:
We argue that the evaluation of censorship evasion tools should depend upon economic models of censorship. We illustrate our position with a simple model of the costs of censorship. We show how this model makes suggestions for how to evade censorship. In particular, from it, we develop evaluation criteria. We examine how our criteria compare to the traditional methods of evaluation employed in prior works.
Submitted 10 September, 2014;
originally announced September 2014.
-
Fast, Approximate Synthesis of Fractional Gaussian Noise for Generating Self-Similar Network Traffic
Authors:
Vern Paxson
Abstract:
Recent network traffic studies argue that network arrival processes are much more faithfully modeled using statistically self-similar processes instead of traditional Poisson processes [LTWW94,PF95]. One difficulty in dealing with self-similar models is how to efficiently synthesize traces (sample paths) corresponding to self-similar traffic. We present a fast Fourier transform method for synthesizing approximate self-similar sample paths for one type of self-similar process, Fractional Gaussian Noise, and assess its performance and validity. We find that the method is as fast as or faster than existing methods and appears to generate close approximations to true self-similar sample paths. We also discuss issues in using such synthesized sample paths for simulating network traffic, and how an approximation used by our method can dramatically speed up evaluation of Whittle's estimator for H, the Hurst parameter giving the strength of long-range dependence present in a self-similar time series.
Submitted 18 September, 1998;
originally announced September 1998.