-
Sound-skwatter (Did You Mean: Sound-squatter?) AI-powered Generator for Phishing Prevention
Authors:
Rodolfo Valentim,
Idilio Drago,
Marco Mellia,
Federico Cerutti
Abstract:
Sound-squatting is a phishing attack that tricks users into visiting malicious resources by exploiting similarities in the pronunciation of words. Proactive defense against sound-squatting candidates is complex, and existing solutions rely on manually curated lists of homophones. Here we introduce Sound-skwatter, a multi-language AI-based system that generates sound-squatting candidates for proactive defense. Sound-skwatter relies on an innovative multi-modal combination of Transformer networks and acoustic models to learn sound similarities. We show that Sound-skwatter can automatically list known homophones and thousands of high-quality candidates. In addition, it covers cross-language sound-squatting, i.e., when the reader and the listener speak different languages, supporting any combination of languages. We apply Sound-skwatter to network-centric phishing via squatted domain names. We find that ~10% of the generated domains exist in the wild, the vast majority of them unknown to protection solutions. Next, we show attacks on the PyPI package manager, where ~17% of the popular packages have at least one existing candidate. We believe Sound-skwatter is a crucial asset to proactively mitigate the sound-squatting phenomenon on the Internet. To increase its impact, we publish an online demo and release our models and code as open source.
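To make the idea of a "sound-squatting candidate" concrete, the sketch below uses a deliberately simple, rule-based stand-in: a small hypothetical table of sound-alike spellings (`SOUND_ALIKE`), applied one substitution at a time to a name. This is NOT the Sound-skwatter model, which the abstract describes as a Transformer plus acoustic-model combination; the helper names and the substitution table are illustrative assumptions. The second helper shows how the "fraction of generated candidates that exist" metric could be computed against some list of registered names.

```python
# Toy rule-based stand-in for sound-squatting candidate generation.
# NOT the Sound-skwatter model (Transformers + acoustic models per the abstract).
# The substitution table below is hypothetical and intentionally tiny.
SOUND_ALIKE: dict[str, list[str]] = {
    "ph": ["f"],
    "f": ["ph"],
    "qu": ["kw"],
    "c": ["k"],
    "k": ["c"],
    "oo": ["u"],
}


def sound_candidates(label: str) -> set[str]:
    """Return variants of `label` produced by one substitution at one position."""
    out: set[str] = set()
    for pattern, replacements in SOUND_ALIKE.items():
        start = label.find(pattern)
        while start != -1:
            for rep in replacements:
                out.add(label[:start] + rep + label[start + len(pattern):])
            start = label.find(pattern, start + 1)
    out.discard(label)  # never report the original name as a candidate
    return out


def existing_fraction(candidates: set[str], registered: set[str]) -> float:
    """Share of candidates found in a (hypothetical) set of registered names."""
    if not candidates:
        return 0.0
    return len(candidates & registered) / len(candidates)


if __name__ == "__main__":
    print(sorted(sound_candidates("squatter")))
```

With this table, `sound_candidates("squatter")` yields `{"skwatter"}`, mirroring the system's own name. A learned model is needed in practice because hand-written rules cannot capture cross-language pronunciation, which is the gap the abstract says Sound-skwatter addresses.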
Submitted 10 October, 2023;
originally announced October 2023.
-
LogPrécis: Unleashing Language Models for Automated Malicious Log Analysis
Authors:
Matteo Boffa,
Rodolfo Vieira Valentim,
Luca Vassio,
Danilo Giordano,
Idilio Drago,
Marco Mellia,
Zied Ben Houidi
Abstract:
The collection of security-related logs holds the key to understanding attack behaviors and diagnosing vulnerabilities. Still, their analysis remains a daunting challenge. Recently, Language Models (LMs) have demonstrated unmatched potential in understanding natural and programming languages. The question arises whether and how LMs could also be useful for security experts, since their logs contain intrinsically confused and obfuscated information. In this paper, we systematically study how to benefit from state-of-the-art LMs to automatically analyze text-like Unix shell attack logs. We present a thorough design methodology that leads to LogPrécis. It receives raw shell sessions as input and automatically identifies and assigns the attacker tactic to each portion of the session, i.e., unveiling the sequence of the attacker's goals. We demonstrate LogPrécis's capability to support the analysis of two large datasets containing about 400,000 unique Unix shell attacks. LogPrécis reduces them to about 3,000 fingerprints, each grouping sessions with the same sequence of tactics. The abstraction it provides lets the analyst better understand attacks, identify fingerprints, detect novelty, link similar attacks, and track families and mutations. Overall, LogPrécis, released as open source, paves the way for better and more responsive defense against cyberattacks.
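The fingerprinting step described above (grouping sessions that share the same sequence of tactics) can be sketched as a plain group-by. This assumes the LM stage has already split each session into portions and labelled each portion with one tactic; the tactic names in the example are illustrative ATT&CK-style labels, and the function names are hypothetical, not LogPrécis's code.

```python
from collections import defaultdict

# A session is assumed to be already segmented and labelled by the model:
# a list of (command portion, tactic label) pairs.
Session = list[tuple[str, str]]


def fingerprint(session: Session) -> tuple[str, ...]:
    """Fingerprint = the ordered sequence of tactic labels of the session."""
    return tuple(tactic for _, tactic in session)


def group_by_fingerprint(sessions: dict[str, Session]) -> dict[tuple[str, ...], list[str]]:
    """Map each fingerprint to the IDs of the sessions that share it."""
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for session_id, session in sessions.items():
        groups[fingerprint(session)].append(session_id)
    return dict(groups)


# Illustrative input: two sessions with different commands but the same goals.
SESSIONS: dict[str, Session] = {
    "s1": [("uname -a", "Discovery"), ("echo key >> ~/.ssh/authorized_keys", "Persistence")],
    "s2": [("cat /proc/cpuinfo", "Discovery"), ("echo k > .ssh/authorized_keys", "Persistence")],
    "s3": [("./payload.sh", "Execution")],
}
```

Here `s1` and `s2` collapse into the single fingerprint `("Discovery", "Persistence")` even though their commands differ, which illustrates how hundreds of thousands of distinct sessions can shrink to a few thousand groups.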
Submitted 22 March, 2024; v1 submitted 17 July, 2023;
originally announced July 2023.
-
Tracking Knowledge Propagation Across Wikipedia Languages
Authors:
Roldolfo Valentim,
Giovanni Comarela,
Souneil Park,
Diego Saez-Trumper
Abstract:
In this paper, we present a dataset of inter-language knowledge propagation in Wikipedia. Covering all 309 language editions and 33M articles, the dataset aims to track the full propagation history of Wikipedia concepts and to enable follow-up research on building predictive models of them. For this purpose, we align all Wikipedia articles in a language-agnostic manner according to the concept they cover, which results in 13M propagation instances. To the best of our knowledge, this dataset is the first to explore full inter-language propagation at a large scale. Together with the dataset, we provide a holistic overview of the propagation and key insights about the underlying structural factors to aid future research. For example, we find that although long cascades are unusual, propagation tends to continue further once it reaches more than four language editions. We also find that the size of language editions is associated with the speed of propagation. We believe the dataset not only contributes to the prior literature on Wikipedia growth but also enables new use cases such as edit recommendation for addressing knowledge gaps, detection of disinformation, and cultural relationship analysis.
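The alignment idea (grouping articles by the concept they cover, then ordering language editions by when each article appeared) can be sketched as follows. The record layout and the use of a Wikidata-style item ID as the language-agnostic concept key are assumptions for illustration; the abstract does not specify the dataset's schema, and `propagation_paths` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date

# Hypothetical record: (concept_id, language edition, article creation date).
# concept_id stands in for a language-agnostic key such as a Wikidata item ID.
Record = tuple[str, str, date]


def propagation_paths(records: list[Record]) -> dict[str, list[str]]:
    """For each concept, list language editions in order of article creation.

    Ties on the same date are broken alphabetically by language code.
    """
    by_concept: dict[str, list[tuple[date, str]]] = defaultdict(list)
    for concept, lang, created in records:
        by_concept[concept].append((created, lang))
    return {c: [lang for _, lang in sorted(items)] for c, items in by_concept.items()}


RECORDS: list[Record] = [
    ("Q1", "en", date(2005, 1, 1)),
    ("Q1", "pt", date(2007, 3, 2)),
    ("Q1", "it", date(2006, 5, 1)),
    ("Q2", "de", date(2010, 1, 1)),
]
```

The length of each path is the cascade size, so a question such as "does propagation continue once it reaches more than four editions?" becomes a question about the distribution of `len(path)` across concepts.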
Submitted 30 March, 2021;
originally announced March 2021.
-
Automatic lesion segmentation and Pathological Myopia classification in fundus images
Authors:
Cefas Rodrigues Freire,
Julio Cesar da Costa Moura,
Daniele Montenegro da Silva Barros,
Ricardo Alexsandro de Medeiros Valentim
Abstract:
In this paper we present algorithms to diagnose Pathological Myopia (PM) and to detect retinal structures and lesions such as the Optic Disc (OD), Fovea, Atrophy, and Detachment. All these tasks were performed on fundus images from PM patients and are requirements to participate in the Pathologic Myopia Challenge (PALM). The challenge was organized as a half-day challenge, a Satellite Event of the IEEE International Symposium on Biomedical Imaging in Venice, Italy. Our method applies different Deep Learning techniques for each task. Transfer learning is applied in all tasks using Xception as the baseline model. Also, some key ideas of the YOLO architecture are used in the Optic Disc segmentation pipeline. We evaluated our model's performance according to the challenge rules in terms of AUC-ROC, F1-Score, Mean Dice Score, and Mean Euclidean Distance. In these initial activities, our method has shown satisfactory results.
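Two of the evaluation metrics named above have standard definitions that can be sketched directly: the Dice score of a predicted mask A against a ground-truth mask B is 2|A ∩ B| / (|A| + |B|), and fovea localization can be scored by the Euclidean distance between predicted and true coordinates. The sketch represents masks as sets of foreground pixel coordinates; the empty-mask convention and any normalization the PALM organizers may apply are assumptions not stated in the abstract.

```python
import math

Pixel = tuple[int, int]


def dice(pred: set[Pixel], truth: set[Pixel]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) over sets of foreground pixels."""
    if not pred and not truth:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def mean_dice(pairs: list[tuple[set[Pixel], set[Pixel]]]) -> float:
    """Average Dice over (prediction, ground truth) image pairs."""
    return sum(dice(p, t) for p, t in pairs) / len(pairs)


def mean_euclidean(preds: list[tuple[float, float]], truths: list[tuple[float, float]]) -> float:
    """Average Euclidean distance between predicted and true points (e.g. fovea)."""
    return sum(math.dist(p, t) for p, t in zip(preds, truths)) / len(preds)
```

For example, a predicted 2×2 block against a ground truth covering half of it gives 2·2 / (4 + 2) = 2/3, and point predictions at distances 5 and 0 from the truth give a mean distance of 2.5.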
Submitted 15 February, 2020;
originally announced February 2020.
-
RDNA Balance: Load Balancing by Isolation of Elephant Flows using Strict Source Routing
Authors:
Rodolfo V. Valentim,
Rodolfo S. Villaca,
Moises R. N. Ribeiro,
Magnos Martinello,
Cristina K. Dominicini,
Diego R. Mafioletti
Abstract:
Data center networks need load-balancing mechanisms to dynamically serve a large number of flows with different service requirements. However, traditional load-balancing approaches do not allow the full utilization of network resources in a simple, programmable, and scalable way. In this context, this paper proposes RDNA Balance, which exploits elephant flow isolation and source routing in core nodes. Flow classification operations are performed at the edge using features of the OpenFlow protocol. The results show that this approach can provide simple, scalable, and programmable load balancing for data centers.
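The division of labour described above (classification at the edge, stateless forwarding in the core) can be illustrated with a small simulation. Everything concrete here is a hypothetical choice: the byte threshold that makes a flow an "elephant", the two routes, and the encoding of a route as a plain list of output ports. The abstract does not describe RDNA's actual route encoding or its OpenFlow rules, so this shows only the general strict-source-routing pattern, not RDNA Balance itself.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: flows that have sent at least 10 MiB are elephants.
ELEPHANT_BYTES = 10 * 1024 * 1024

# Hypothetical source routes: one output port per core hop.
ELEPHANT_ROUTE = [3, 1]  # path reserved to isolate elephant flows
MICE_ROUTE = [2, 2]      # shared path for all other flows


@dataclass
class Packet:
    flow_id: str
    route: list[int] = field(default_factory=list)  # remaining ports, consumed per hop


def edge_classify(flow_bytes: int) -> list[int]:
    """Edge decision: choose the full source route from the flow's size."""
    return list(ELEPHANT_ROUTE if flow_bytes >= ELEPHANT_BYTES else MICE_ROUTE)


def core_forward(pkt: Packet) -> int:
    """Core node: no per-flow table, just take the next port from the header."""
    return pkt.route.pop(0)
```

Because core nodes only read the header, isolating elephants requires no per-flow state in the core; all flow-aware logic stays at the edge, which is the scalability argument the abstract makes.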
Submitted 11 April, 2019;
originally announced April 2019.