Botcha: Detecting Malicious Non-Human Traffic in the Wild

Dhamnani, Sunny; Sinha, Ritwik; Vinay, Vishwa; Kumari, Lilly; Savova, Margarita

arXiv:2103.01428 (cs)

[Submitted on 2 Mar 2021]

Abstract: Malicious bots make up about a quarter of all traffic on the web, and they degrade the performance of personalization and recommendation algorithms that operate on e-commerce sites. Positive-Unlabeled learning (PU learning) provides the ability to train a binary classifier using only positive (P) and unlabeled (U) instances, where the unlabeled data comprises both positive and negative classes. It is possible to find labels for strict subsets of non-malicious actors, e.g., by assuming that only humans make purchases during web sessions or clear CAPTCHAs. However, finding signals of malicious behavior is almost impossible due to the ever-evolving and adversarial nature of bots. Such a set-up naturally lends itself to PU learning. Unfortunately, standard PU learning approaches assume that the labeled set of positives is a random sample of all positives, which is unlikely to hold in practice. In this work, we propose two modifications to PU learning that make it more robust to violations of the selected-completely-at-random assumption, leading to a system that can filter out malicious bots. On one public and one proprietary dataset, we show that the proposed approaches are better at identifying humans in web data than standard PU learning methods.
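
To make the baseline setting concrete, the following is a minimal sketch of a standard PU learning recipe in the style of the classical Elkan and Noto estimator, which relies on the selected-completely-at-random assumption that the abstract calls into question. It is not the authors' method and does not implement either of their two proposed modifications. The toy data, the function names (`fit_pu`, `human_proba`, `make_toy_sessions`), the two features, and the 30% label rate are all assumptions made for illustration.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def _add_bias(X):
    # Append a constant column so the model learns an intercept.
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_logistic(X, s, lr=0.1, epochs=5000):
    """Plain logistic regression fitted by batch gradient descent."""
    Xb = _add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        w -= lr * Xb.T @ (_sigmoid(Xb @ w) - s) / len(s)
    return w


def labeled_proba(w, X):
    """g(x): estimated probability that x is *labeled* (s = 1)."""
    return _sigmoid(_add_bias(X) @ w)


def fit_pu(X, s, holdout_frac=0.2, seed=0):
    """Train g on labeled-vs-unlabeled, then estimate c = P(s=1 | y=1).

    Under selected-completely-at-random, c is constant, so it can be
    estimated as the mean of g(x) over held-out labeled positives.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(s))
    n_hold = int(holdout_frac * len(s))
    hold, train = idx[:n_hold], idx[n_hold:]
    w = fit_logistic(X[train], s[train])
    held_out_pos = X[hold][s[hold] == 1]
    c = labeled_proba(w, held_out_pos).mean()
    return w, c


def human_proba(w, c, X):
    """Estimated P(y=1 | x) = g(x) / c, clipped to [0, 1]."""
    return np.clip(labeled_proba(w, X) / c, 0.0, 1.0)


def make_toy_sessions(n=500, label_rate=0.3, seed=0):
    """Hypothetical data: humans are the positive class, bots the negative.

    Only a random fraction of humans is labeled (e.g. sessions with a
    purchase); every other session, human or bot, is unlabeled (s = 0).
    """
    rng = np.random.default_rng(seed)
    humans = rng.normal(2.0, 1.0, size=(n, 2))
    bots = rng.normal(-2.0, 1.0, size=(n, 2))
    X = np.vstack([humans, bots])
    s = np.zeros(2 * n)
    s[:n] = rng.random(n) < label_rate
    return X, s


X, s = make_toy_sessions()
w, c = fit_pu(X, s)
print("estimated label frequency c:", round(float(c), 3))
print("P(human):", human_proba(w, c, np.array([[2.0, 2.0], [-2.0, -2.0]])))
```

In this sketch the labeled humans really are a uniform random sample of all humans, so dividing by a single constant c is justified. If labeling instead depends on the features (for example, only certain kinds of human sessions end in a purchase), c is no longer constant and this correction becomes biased, which is the failure mode the abstract describes.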

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2103.01428 [cs.LG]
	(or arXiv:2103.01428v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2103.01428
Journal reference:	OHARS'20: Workshop on Online Misinformation- and Harm-Aware Recommender Systems, September 25, 2020, OHARS@RecSys 2020: 51-59

Submission history

From: Sunny Dhamnani
[v1] Tue, 2 Mar 2021 02:49:49 UTC (145 KB)
