Search | arXiv e-print repository

The Performance of Sequential Deep Learning Models in Detecting Phishing Websites Using Contextual Features of URLs

Authors: Saroj Gopali, Akbar S. Namin, Faranak Abri, Keith S. Jones

Abstract: Cyber attacks continue to pose significant threats to individuals and organizations, stealing sensitive data such as personally identifiable information, financial information, and login credentials. Hence, detecting malicious websites before they cause any harm is critical to preventing fraud and monetary loss. To address the increasing number of phishing attacks, protective mechanisms must be highly responsive, adaptive, and scalable. Fortunately, advances in the field of machine learning, coupled with access to vast amounts of data, have led to the adoption of various deep learning models for timely detection of these cyber crimes. This study focuses on the detection of phishing websites using deep learning models such as Multi-Head Attention, Temporal Convolutional Network (TCN), BI-LSTM, and LSTM where URLs of the phishing websites are treated as a sequence. The results demonstrate that Multi-Head Attention and BI-LSTM model outperform some other deep learning-based algorithms such as TCN and LSTM in producing better precision, recall, and F1-scores.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2111.04145 [pdf, other]

doi 10.1103/PhysRevB.106.054114

Humble planar defects in SiGe nanopillars

Authors: Hongbin Yang, Shang Ren, Sobhit Singh, Emily M. Turner, Kevin S. Jones, Philip E. Batson, David Vanderbilt, Eric Garfunkel

Abstract: We report a new \{001\} planar defect found in SiGe nanopillars. The defect structure, determined by atomic resolution electron microscopy, matches the Humble defect model proposed for diamond. We also investigated several possible variants of the Humble structure using first principles calculations and found that the one lowest in energy was also in best agreement with the STEM images. The pillar composition has been analyzed with electron energy loss spectroscopy, which hints at how the defect is formed. Our results show that the structure and formation process of defects in nanostructured group IV semiconductors can be different from their bulk counterparts.

Submitted 7 November, 2021; originally announced November 2021.

arXiv:2106.01998 [pdf, other]

Toward Explainable Users: Using NLP to Enable AI to Understand Users' Perceptions of Cyber Attacks

Authors: Faranak Abri, Luis Felipe Gutierrez, Chaitra T. Kulkarni, Akbar Siami Namin, Keith S. Jones

Abstract: To understand how end-users conceptualize consequences of cyber security attacks, we performed a card sorting study, a well-known technique in Cognitive Sciences, where participants were free to group the given consequences of chosen cyber attacks into as many categories as they wished using rationales they see fit. The results of the open card sorting study showed a large amount of inter-participant variation making the research team wonder how the consequences of security attacks were comprehended by the participants. As an exploration of whether it is possible to explain user's mental model and behavior through Artificial Intelligence (AI) techniques, the research team compared the card sorting data with the outputs of a number of Natural Language Processing (NLP) techniques with the goal of understanding how participants perceived and interpreted the consequences of cyber attacks written in natural languages. The results of the NLP-based exploration methods revealed an interesting observation implying that participants had mostly employed checking individual keywords in each sentence to group cyber attack consequences together and less considered the semantics behind the description of consequences of cyber attacks. The results reported in this paper are seemingly useful and important for cyber attacks comprehension from user's perspectives. To the best of our knowledge, this paper is the first introducing the use of AI techniques in explaining and modeling users' behavior and their perceptions about a context.
The novel idea introduced here is about explaining users using AI.

Submitted 3 June, 2021; originally announced June 2021.

Comments: 20 pages, 3 figures, COMPSAC'21

arXiv:2012.14488 [pdf, other]

Phishing Detection through Email Embeddings

Authors: Luis Felipe Gutiérrez, Faranak Abri, Miriam Armstrong, Akbar Siami Namin, Keith S. Jones

Abstract: The problem of detecting phishing emails through machine learning techniques has been discussed extensively in the literature. Conventional and state-of-the-art machine learning algorithms have demonstrated the possibility of building classifiers with high accuracy. The existing research studies treat phishing and genuine emails through general indicators and thus it is not exactly clear what phishing features are contributing to variations of the classifiers. In this paper, we crafted a set of phishing and legitimate emails with similar indicators in order to investigate whether these cues are captured or disregarded by email embeddings, i.e., vectorizations. We then fed machine learning classifiers with the carefully crafted emails to find out about the performance of email embeddings developed. Our results show that using these indicators, email embeddings techniques is effective for classifying emails as phishing or legitimate.

Submitted 28 December, 2020; originally announced December 2020.

arXiv:2012.02643 [pdf, other]

Predicting Emotions Perceived from Sounds

Authors: Faranak Abri, Luis Felipe Gutiérrez, Akbar Siami Namin, David R. W. Sears, Keith S. Jones

Abstract: Sonification is the science of communication of data and events to users through sounds. Auditory icons, earcons, and speech are the common auditory display schemes utilized in sonification, or more specifically in the use of audio to convey information. Once the captured data are perceived, their meanings, and more importantly, intentions can be interpreted more easily and thus can be employed as a complement to visualization techniques. Through auditory perception it is possible to convey information related to temporal, spatial, or some other context-oriented information. An important research question is whether the emotions perceived from these auditory icons or earcons are predictable in order to build an automated sonification platform. This paper conducts an experiment through which several mainstream and conventional machine learning algorithms are developed to study the prediction of emotions perceived from sounds. To do so, the key features of sounds are captured and then are modeled using machine learning algorithms using feature reduction techniques. We observe that it is possible to predict perceived emotions with high accuracy. In particular, the regression based on Random Forest demonstrated its superiority compared to other machine learning algorithms.

Submitted 4 December, 2020; originally announced December 2020.

Comments: 10 pages

arXiv:2012.00648 [pdf, other]

Cyber-Attack Consequence Prediction

Authors: Prerit Datta, Natalie Lodinger, Akbar Siami Namin, Keith S. Jones

Abstract: Cyber-physical systems posit a complex number of security challenges due to interconnection of heterogeneous devices having limited processing, communication, and power capabilities. Additionally, the conglomeration of both physical and cyber-space further makes it difficult to devise a single security plan spanning both these spaces. Cyber-security researchers are often overloaded with a variety of cyber-alerts on a daily basis many of which turn out to be false positives. In this paper, we use machine learning and natural language processing techniques to predict the consequences of cyberattacks. The idea is to enable security researchers to have tools at their disposal that makes it easier to communicate the attack consequences with various stakeholders who may have little to no cybersecurity expertise. Additionally, with the proposed approach researchers' cognitive load can be reduced by automatically predicting the consequences of attacks in case new attacks are discovered. We compare the performance through various machine learning models employing word vectors obtained using both tf-idf and Doc2Vec models. In our experiments, an accuracy of 60% was obtained using tf-idf features and 57% using Doc2Vec method for models based on LinearSVC model.

Submitted 2 December, 2020; v1 submitted 1 December, 2020; originally announced December 2020.

Comments: 9 pages. The pre-print of a paper to appear in the proceedings of the 3rd Workshop on Big Data Engineering and Analytics in Cyber-Physical Systems (BigEACPS'20), IEEE BigData Conference 2020

arXiv:2010.04260 [pdf, other]

Fake Reviews Detection through Analysis of Linguistic Features

Authors: Faranak Abri, Luis Felipe Gutierrez, Akbar Siami Namin, Keith S. Jones, David R. W. Sears

Abstract: Online reviews play an integral part for success or failure of businesses. Prior to purchasing services or goods, customers first review the online comments submitted by previous customers. However, it is possible to superficially boost or hinder some businesses through posting counterfeit and fake reviews. This paper explores a natural language processing approach to identify fake reviews. We present a detailed analysis of linguistic features for distinguishing fake and trustworthy online reviews. We study 15 linguistic features and measure their significance and importance towards the classification schemes employed in this study. Our results indicate that fake reviews tend to include more redundant terms and pauses, and generally contain longer sentences. The application of several machine learning classification algorithms revealed that we were able to discriminate fake from real reviews with high accuracy using these linguistic features.

Submitted 8 October, 2020; originally announced October 2020.

Comments: The pre-print of a paper to appear in the proceedings of the IEEE International Conference on Machine Learning Applications (ICMLA 2020), 11 pages, 3 figures, 5 tables

arXiv:2006.07914 [pdf, other]

Cloud as an Attack Platform

Authors: Moitrayee Chatterjee, Prerit Datta, Faranak Abri, Akbar Siami Namin, Keith S. Jones

Abstract: We present an exploratory study of responses from $75$ security professionals and ethical hackers in order to understand how they abuse cloud platforms for attack purposes. The participants were recruited at the Black Hat and DEF CON conferences. We presented the participants' with various attack scenarios and asked them to explain the steps they would have carried out for launching the attack in each scenario. Participants' responses were studied to understand attackers' mental models, which would improve our understanding of necessary security controls and recommendations regarding precautionary actions to circumvent the exploitation of clouds for malicious activities. We observed that in 93.78% of the responses, participants are abusing cloud services to establish their attack environment and launch attacks.

Submitted 14 June, 2020; originally announced June 2020.

arXiv:2006.07912 [pdf, other]

Fake Reviews Detection through Ensemble Learning

Authors: Luis Gutierrez-Espinoza, Faranak Abri, Akbar Siami Namin, Keith S. Jones, David R. W. Sears

Abstract: Customers represent their satisfactions of consuming products by sharing their experiences through the utilization of online reviews. Several machine learning-based approaches can automatically detect deceptive and fake reviews. Recently, there have been studies reporting the performance of ensemble learning-based approaches in comparison to conventional machine learning techniques. Motivated by the recent trends in ensemble learning, this paper evaluates the performance of ensemble learning-based approaches to identify bogus online information. The application of a number of ensemble learning-based approaches to a collection of fake restaurant reviews that we developed show that these ensemble learning-based approaches detect deceptive information better than conventional machine learning algorithms.

Submitted 14 June, 2020; originally announced June 2020.

arXiv:2006.07908 [pdf, other]

Launching Stealth Attacks using Cloud

Authors: Moitrayee Chatterjee, Prerit Datta, Faranak Abri, Akbar Siami Namin, Keith S. Jones

Abstract: Cloud computing offers users scalable platforms and low resource cost. At the same time, the off-site location of the resources of this service model makes it more vulnerable to certain types of adversarial actions. Cloud computing has not only gained major user base, but also, it has the features that attackers can leverage to remain anonymous and stealth. With convenient access to data and technology, cloud has turned into an attack platform among other utilization. This paper reports our study to show that cyber attackers heavily abuse the public cloud platforms to setup their attack environments and launch stealth attacks. The paper first reviews types of attacks launched through cloud environment. It then reports case studies through which the processes of launching cyber attacks using clouds are demonstrated.

Submitted 14 June, 2020; originally announced June 2020.

arXiv:1805.08272 [pdf]

The Sounds of Cyber Threats

Authors: Akbar Siami Namin, Rattikorn Hewett, Keith S. Jones, Rona Pogrund

Abstract: The Internet enables users to access vast resources, but it can also expose users to harmful cyber-attacks. This paper investigates human factors issues concerning the use of sounds in a cyber-security domain. It describes a methodology, referred to as sonification, to effectively design and develop auditory cyber-security threat indicators to warn users about cyber-attacks. A case study is presented, along with the results, of various types of usability testing with a number of Internet users who are visually impaired. The paper concludes with a discussion of future steps to enhance this work.

Submitted 21 May, 2018; originally announced May 2018.

Comments: 5 pages, 3 figures, 1 table, A poster paper presented at the 12th Symposium on Usable Privacy and Security (SOUPS 2016)

ACM Class: H.5.2

arXiv:cmp-lg/9805011 [pdf, ps, other]

Automatic summarising: factors and directions

Authors: Karen Sparck Jones

Abstract: This position paper suggests that progress with automatic summarising demands a better research methodology and a carefully focussed research strategy. In order to develop effective procedures it is necessary to identify and respond to the context factors, i.e. input, purpose, and output factors, that bear on summarising and its evaluation. The paper analyses and illustrates these factors and their implications for evaluation. It then argues that this analysis, together with the state of the art and the intrinsic difficulty of summarising, imply a nearer-term strategy concentrating on shallow, but not surface, text analysis and on indicative summarising. This is illustrated with current work, from which a potentially productive research programme can be developed.

Submitted 29 May, 1998; originally announced May 1998.

arXiv:cmp-lg/9702011 [pdf, ps, other]

How much has information technology contributed to linguistics?

Authors: Karen Sparck Jones

Abstract: Information technology should have much to offer linguistics, not only through the opportunities offered by large-scale data analysis and the stimulus to develop formal computational models, but through the chance to use language in systems for automatic natural language processing. The paper discusses these possibilities in detail, and then examines the actual work that has been done. It is evident that this has so far been primarily research within a new field, computational linguistics, which is largely motivated by the demands, and interest, of practical processing systems, and that information technology has had rather little influence on linguistics at large. There are different reasons for this, and not all good ones: information technology deserves more attention from linguists.

Submitted 17 February, 1997; originally announced February 1997.

Comments: Prepared for a British Academy Symposium on Information Technology and Scholarly Disciplines

arXiv:cmp-lg/9512004 [pdf, ps, other]

Natural language processing: she needs something old and something new (maybe something borrowed and something blue, too)

Authors: Karen Sparck Jones

Abstract: Given the present state of work in natural language processing, this address argues first, that advance in both science and applications requires a revival of concern about what language is about, broadly speaking the world; and second, that an attack on the summarising task, which is made ever more important by the growth of electronic text resources and requires an understanding of the role of large-scale discourse structure in marking important text content, is a good way forward.

Submitted 21 December, 1995; originally announced December 1995.

Comments: Presidential Address, 1994, Association for Computational Linguistics

Showing 1–14 of 14 results for author: Jones, K S