-
URLTran: Improving Phishing URL Detection Using Transformers
Authors:
Pranav Maneriker,
Jack W. Stokes,
Edir Garcia Lazo,
Diana Carutasu,
Farid Tajaddodianfar,
Arun Gururajan
Abstract:
Browsers often include security features to detect phishing web pages. In the past, some browsers checked whether an unknown URL was included in a list of known phishing pages. However, as the number of URLs and known phishing pages continued to increase at a rapid pace, browsers started to include one or more machine learning classifiers as part of their security services that aim to better protect end users from harm. While additional information could be used, browsers typically evaluate every unknown URL using some classifier in order to quickly detect these phishing pages. Early phishing detection used standard machine learning classifiers, but recent research has instead proposed the use of deep learning models for the phishing URL detection task. Concurrently, text embedding research using transformers has led to state-of-the-art results in many natural language processing tasks. In this work, we perform a comprehensive analysis of transformer models on the phishing URL detection task. We consider standard masked language model and additional domain-specific pre-training tasks, and compare these models to fine-tuned BERT and RoBERTa models. Combining the insights from these experiments, we propose URLTran, which uses transformers to significantly improve the performance of phishing URL detection over a wide range of very low false positive rates (FPRs) compared to other deep learning-based methods. For example, at an FPR of 0.01%, URLTran yields a true positive rate (TPR) of 86.80% compared to 71.20% for the next best baseline, a relative improvement of over 21.9%. Further, we consider some classical adversarial black-box phishing attacks, such as those based on homoglyphs and compound word splits, to improve the robustness of URLTran. We consider additional fine-tuning with these adversarial samples and demonstrate that URLTran can maintain low FPRs under these scenarios.
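The headline numbers combine two simple computations: a TPR read off at a fixed, very low FPR, and a relative improvement between two TPRs. The sketch below illustrates both under stated assumptions; it is not the authors' evaluation code. The helper names `relative_improvement` and `tpr_at_fpr` are hypothetical, and the thresholding rule (flag a URL when its score is strictly above a cut-off chosen from the benign scores) is one common convention among several.

```python
import numpy as np


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


def tpr_at_fpr(scores_pos, scores_neg, target_fpr: float) -> float:
    """True positive rate at the lowest threshold allowed by `target_fpr`.

    Hypothetical helper: a URL is flagged as phishing when its score is
    strictly greater than the threshold. The threshold is taken from the
    benign (negative) scores so that at most floor(target_fpr * n_neg)
    benign URLs exceed it.
    """
    neg = np.sort(np.asarray(scores_neg, dtype=float))[::-1]  # benign scores, descending
    max_fp = int(np.floor(target_fpr * len(neg)))  # false positives we may tolerate
    # Flagging only scores > neg[max_fp] lets through at most max_fp benign URLs.
    threshold = neg[max_fp] if max_fp < len(neg) else -np.inf
    pos = np.asarray(scores_pos, dtype=float)
    return float(np.mean(pos > threshold))


# Relative improvement quoted in the abstract: (86.80 - 71.20) / 71.20.
print(round(relative_improvement(86.80, 71.20), 2))  # 21.91
```

One practical consequence of this convention: at an FPR of 0.01%, at least 10,000 benign URLs are needed before even a single false positive is tolerated, so very-low-FPR comparisons depend heavily on the size of the benign test set.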
Submitted 27 August, 2021; v1 submitted 9 June, 2021;
originally announced June 2021.
-
Constraints induced delocalization
Authors:
Piotr Sierant,
Eduardo Gonzalez Lazo,
Marcello Dalmonte,
Antonello Scardicchio,
Jakub Zakrzewski
Abstract:
We study the impact of quenched disorder on the dynamics of locally constrained quantum spin chains, which describe 1D arrays of Rydberg atoms in both the frozen (Ising-type) and dressed (XY-type) regimes. Performing large-scale numerical experiments, we observe no trace of many-body localization, even at large disorder. Analyzing the role of quenched disorder terms in constrained systems, we show that they act in two distinct and competing ways: as an on-site disorder term for the basic excitations of the system, and as an interaction between excitations. The two contributions are of the same order, and since they compete (one favoring localization, the other opposing it), one never enters a truly strong-disorder, weak-interaction limit, where many-body localization occurs. This mechanism is further clarified in the case of XY-type constrained models: there, a term that would represent a bona fide local quenched disorder term acting on the excitations of the clean model must be written as a series of non-local terms in the unconstrained variables. Our observations provide a simple picture for interpreting the role of quenched disorder that could be immediately extended to other constrained models or quenched gauge theories.
Submitted 25 March, 2021;
originally announced March 2021.
-
Exact many-body scars and their stability in constrained quantum chains
Authors:
Federica Maria Surace,
Matteo Votto,
Eduardo Gonzalez Lazo,
Alessandro Silva,
Marcello Dalmonte,
Giuliano Giudici
Abstract:
Quantum scars are non-thermal eigenstates characterized by low entanglement entropy, initially detected in systems subject to nearest-neighbor Rydberg blockade, the so-called PXP model. While most of these special eigenstates elude an analytical description and seem to hybridize with nearby thermal eigenstates for large systems, some of them can be written as matrix product states (MPS) with size-independent bond dimension. We study the response of these exact quantum scars to perturbations by analysing the scaling of the fidelity susceptibility with system size. We find that some of them are anomalously stable at first order in perturbation theory, in sharp contrast to the eigenstate thermalization hypothesis. However, this stability seems to break down when all orders are taken into account. We further investigate models with a larger blockade radius and find a novel set of exact quantum scars, which we write down analytically and compare with the PXP exact eigenstates. We show that they exhibit the same robustness against perturbations at first order.
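For orientation, the two objects named above can be written in their textbook forms. This is a sketch of the standard definitions, assuming open or periodic chains with sites i, Pauli operator X_i, and P_i the projector onto the atomic ground state |0⟩; the paper's own normalization, boundary conditions, and treatment of degenerate eigenstates may differ. For a nondegenerate eigenstate |n⟩ of H(λ) = H_0 + λV, the fidelity susceptibility χ_n is the coefficient of the leading quadratic decay of the overlap with the perturbed eigenstate:

```latex
H_{\mathrm{PXP}} = \sum_{i} P_{i-1}\, X_i\, P_{i+1},
\qquad P_i = |0\rangle\langle 0|_i ,

\chi_n = \sum_{m \neq n} \frac{|\langle m | V | n \rangle|^2}{(E_n - E_m)^2},
\qquad
|\langle n(0) \,|\, n(\lambda) \rangle| = 1 - \frac{\lambda^2}{2}\,\chi_n + O(\lambda^3).
```

Under the eigenstate thermalization hypothesis, χ_n of a generic eigenstate is expected to grow exponentially with system size, because the level spacings E_n - E_m in the denominator shrink exponentially; a scar whose χ_n stays anomalously small is therefore unusually stable at first order.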
Submitted 23 March, 2021; v1 submitted 16 November, 2020;
originally announced November 2020.