Search | arXiv e-print repository

doi 10.1007/978-3-031-19097-1_23

Deception Detection with Feature-Augmentation by soft Domain Transfer

Authors: Sadat Shahriar, Arjun Mukherjee, Omprakash Gnawali

Abstract: In this era of information explosion, deceivers use different domains or mediums of information to exploit the users, such as News, Emails, and Tweets. Although much research has been done to detect deception in all these domains, information shortage in a new event necessitates these domains to associate with each other to battle deception. To form this association, we propose a feature augmentation method by harnessing the intermediate layer representation of neural models. Our approaches provide an improvement over the self-domain baseline models by up to 6.60%. We find Tweets to be the most helpful information provider for Fake News and Phishing Email detection, whereas News helps most in Tweet Rumor detection. Our analysis provides a useful insight for domain knowledge transfer which can help build a stronger deception detection system than the existing literature.

Submitted 1 May, 2023; originally announced May 2023.
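A minimal sketch of the feature-augmentation idea in the abstract above, assuming a frozen source-domain model (e.g. one trained on Tweets) whose hidden-layer activations are appended to the features of a target-domain sample. The weights, dimensions, and helper names (`hidden_layer`, `augment`) are hypothetical; the abstract does not specify the architecture or how the "soft" transfer is weighted.

```python
from typing import List, Sequence

Vector = List[float]


def hidden_layer(x: Sequence[float], weights: Sequence[Sequence[float]],
                 bias: Sequence[float]) -> Vector:
    """Intermediate representation of a source-domain model: ReLU(W x + b)."""
    if any(len(row) != len(x) for row in weights) or len(weights) != len(bias):
        raise ValueError("shape mismatch between x, weights and bias")
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def augment(target_features: Sequence[float], source_hidden: Sequence[float]) -> Vector:
    """Feature augmentation: concatenate target-domain features with the
    source-domain hidden representation of the same input."""
    return list(target_features) + list(source_hidden)


# Hypothetical frozen hidden-layer weights of a source-domain (e.g. Tweet) model.
W_SRC = [[0.5, -1.0, 0.0],
         [1.0, 1.0, 1.0]]
B_SRC = [0.0, -0.5]

x = [1.0, 0.25, 2.0]   # shared input encoding of one target-domain sample (toy values)
target = [0.1, 0.9]    # features from the target-domain (e.g. Fake News) model (toy values)

h = hidden_layer(x, W_SRC, B_SRC)
augmented = augment(target, h)
print(augmented)  # [0.1, 0.9, 0.25, 2.75]
```

In practice the source model would be a trained network applied to tokenized text; the toy numbers only illustrate the shapes involved in the concatenation.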

arXiv:2301.04781

Bug Hunters' Perspectives on the Challenges and Benefits of the Bug Bounty Ecosystem

Authors: Omer Akgul, Taha Eghtesad, Amit Elazari, Omprakash Gnawali, Jens Grossklags, Michelle L. Mazurek, Daniel Votipka, Aron Laszka

Abstract: Although researchers have characterized the bug-bounty ecosystem from the point of view of platforms and programs, minimal effort has been made to understand the perspectives of the main workers: bug hunters. To improve bug bounties, it is important to understand hunters' motivating factors, challenges, and overall benefits. We address this research gap with three studies: identifying key factors through a free listing survey (n=56), rating each factor's importance with a larger-scale factor-rating survey (n=159), and conducting semi-structured interviews to uncover details (n=24). Of 54 factors that bug hunters listed, we find that rewards and learning opportunities are the most important benefits. Further, we find scope to be the top differentiator between programs. Surprisingly, we find earning reputation to be one of the least important motivators for hunters. Of the challenges we identify, communication problems, such as unresponsiveness and disputes, are the most substantial. We present recommendations to make the bug-bounty ecosystem accommodating to more bug hunters and ultimately increase participation in an underutilized market.

Submitted 7 March, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

arXiv:2208.06792

Improving Phishing Detection Via Psychological Trait Scoring

Authors: Sadat Shahriar, Arjun Mukherjee, Omprakash Gnawali

Abstract: Phishing emails exhibit some unique psychological traits which are not present in legitimate emails. From empirical analysis and previous research, we find three psychological traits most dominant in Phishing emails - A Sense of Urgency, Inducing Fear by Threatening, and Enticement with Desire. We manually label 10% of all phishing emails in our training dataset for these three traits. We leverage that knowledge by training BERT, Sentence-BERT (SBERT), and Character-level-CNN models and capturing the nuances via the last layers that form the Phishing Psychological Trait (PPT) scores. For the phishing email detection task, we use the pretrained BERT and SBERT models, and concatenate the PPT scores to feed into a fully-connected neural network model. Our results show that the addition of PPT scores improves the model performance significantly, thus indicating the effectiveness of PPT scores in capturing the psychological nuances. Furthermore, to mitigate the effect of the imbalanced training dataset, we use the GPT-2 model to generate phishing emails (Radford et al., 2019). Our best model outperforms the current State-of-the-Art (SOTA) model's F1 score by 4.54%. Additionally, our analysis of individual PPTs suggests that Fear provides the strongest cue in detecting phishing emails.

Submitted 14 August, 2022; originally announced August 2022.
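The abstract above describes concatenating PPT scores with pretrained BERT/SBERT representations before a fully-connected classifier. A minimal sketch of that concatenation step, assuming each trait score is the sigmoid of a trait classifier's last-layer logit and using a single sigmoid unit as a stand-in for the fully-connected network; the embedding size, weights, and function names are hypothetical.

```python
import math
from typing import List, Sequence

TRAITS = ("urgency", "fear", "desire")  # the three traits named in the abstract


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ppt_scores(trait_logits: Sequence[float]) -> List[float]:
    """Map per-trait logits (one per entry in TRAITS) to scores in (0, 1)."""
    if len(trait_logits) != len(TRAITS):
        raise ValueError(f"expected {len(TRAITS)} trait logits")
    return [sigmoid(z) for z in trait_logits]


def phishing_probability(embedding: Sequence[float], scores: Sequence[float],
                         weights: Sequence[float], bias: float) -> float:
    """Single dense unit over the concatenation [embedding ; PPT scores]."""
    features = list(embedding) + list(scores)
    if len(features) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)


# Toy 2-d "sentence embedding" (real BERT/SBERT vectors are much larger).
embedding = [0.0, 0.0]
scores = ppt_scores([0.0, 0.0, 0.0])      # each trait score is 0.5
weights = [1.0, 1.0, 2.0, 2.0, 2.0]       # hypothetical trained weights
p = phishing_probability(embedding, scores, weights, bias=-3.0)
print(p)  # 0.5
```

Raising any trait logit (for example the fear logit) raises the corresponding PPT score and, with positive weights, the predicted phishing probability.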

arXiv:2111.10711

doi 10.26615/978-954-452-072-4_147

A Domain-Independent Holistic Approach to Deception Detection

Authors: Sadat Shahriar, Arjun Mukherjee, Omprakash Gnawali

Abstract: The deception in the text can be of different forms in different domains, including fake news, rumor tweets, and spam emails. Irrespective of the domain, the main intent of the deceptive text is to deceive the reader. Although domain-specific deception detection exists, domain-independent deception detection can provide a holistic picture, which can be crucial to understand how deception occurs in the text. In this paper, we detect deception in a domain-independent setting using deep learning architectures. Our method outperforms the State-of-the-Art (SOTA) performance of most benchmark datasets with an overall accuracy of 93.42% and F1-Score of 93.22%. The domain-independent training allows us to capture subtler nuances of deceptive writing style. Furthermore, we analyze how much in-domain data may be helpful to accurately detect deception, especially for the cases where data may not be readily available to train. Our results and analysis indicate that there may be a universal pattern of deception lying in-between the text independent of the domain, which can create a novel area of research and open up new avenues in the field of deception detection.

Submitted 20 November, 2021; originally announced November 2021.

arXiv:2108.00270

Opinion Prediction with User Fingerprinting

Authors: Kishore Tumarada, Yifan Zhang, Fan Yang, Eduard Dragut, Omprakash Gnawali, Arjun Mukherjee

Abstract: Opinion prediction is an emerging research area with diverse real-world applications, such as market research and situational awareness. We identify two lines of approaches to the problem of opinion prediction. One uses topic-based sentiment analysis with time-series modeling, while the other uses static embedding of text. The latter approaches seek user-specific solutions by generating user fingerprints. Such approaches are useful in predicting user's reactions to unseen content. In this work, we propose a novel dynamic fingerprinting method that leverages contextual embedding of user's comments conditioned on relevant user's reading history. We integrate BERT variants with a recurrent neural network to generate predictions. The results show up to 13% improvement in micro F1-score compared to previous approaches. Experimental results show novel insights that were previously unknown such as better predictions for an increase in dynamic history length, the impact of the nature of the article on performance, thereby laying the foundation for further research.

Submitted 10 September, 2021; v1 submitted 31 July, 2021; originally announced August 2021.

Comments: 10 pages, 6 figures, RANLP conference 2021
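The abstract above reports gains in micro F1-score. For reference, a minimal sketch of how micro-averaged F1 pools true positives, false positives, and false negatives over all classes before computing precision and recall once; the opinion labels in the example are hypothetical. When every item receives exactly one gold label and one predicted label, micro-F1 reduces to accuracy.

```python
from typing import Hashable, Sequence


def micro_f1(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Micro-averaged F1: sum TP/FP/FN over all classes, then compute F1 once."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for label in set(gold) | set(pred):
        for g, p in zip(gold, pred):
            if p == label and g == label:
                tp += 1
            elif p == label:
                fp += 1  # predicted this class, but the gold label differs
            elif g == label:
                fn += 1  # gold is this class, but the prediction differs
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


gold = ["pos", "neg", "neu", "pos"]
pred = ["pos", "neu", "neu", "neg"]
print(micro_f1(gold, pred))  # 0.5 (equal to accuracy: 2 of 4 correct)
```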

arXiv:2102.10260

Wireless sensor network for in situ soil moisture monitoring

Authors: Jianing Fang, Chuheng Hu, Nour Smaoui, Doug Carlson, Jayant Gupchup, Razvan Musaloiu-E., Chieh-Jan Mike Liang, Marcus Chang, Omprakash Gnawali, Tamas Budavari, Andreas Terzis, Katalin Szlavecz, Alexander S. Szalay

Abstract: We discuss the history and lessons learned from a series of deployments of environmental sensors measuring soil parameters and CO2 fluxes over the last fifteen years, in an outdoor environment. We present the hardware and software architecture of our current Gen-3 system, and then discuss how we are simplifying the user facing part of the software, to make it easier and friendlier for the environmental scientist to be in full control of the system. Finally, we describe the current effort to build a large-scale Gen-4 sensing platform consisting of hundreds of nodes to track the environmental parameters for urban green spaces in Baltimore, Maryland.

Submitted 20 February, 2021; originally announced February 2021.

Comments: 12 pages, 16 figures, Sensornets 2021 Conference

arXiv:2008.13064

doi 10.1145/3416506.3423580

Towards Demystifying Dimensions of Source Code Embeddings

Authors: Md Rafiqul Islam Rabin, Arjun Mukherjee, Omprakash Gnawali, Mohammad Amin Alipour

Abstract: Source code representations are key in applying machine learning techniques for processing and analyzing programs. A popular approach in representing source code is neural source code embeddings that represent programs with high-dimensional vectors computed by training deep neural networks on a large volume of programs. Although these embeddings are successful, little is known about the contents of these vectors and their characteristics. In this paper, we present our preliminary results towards better understanding the contents of code2vec neural source code embeddings. In particular, in a small case study, we use the code2vec embeddings to create binary SVM classifiers and compare their performance with the handcrafted features. Our results suggest that the handcrafted features can perform very close to the high-dimensional code2vec embeddings, and the information gains are more evenly distributed in the code2vec embeddings compared to the handcrafted features. We also find that the code2vec embeddings are more resilient to the removal of dimensions with low information gains than the handcrafted features. We hope our results serve as a stepping stone toward principled analysis and evaluation of these code representations.

Submitted 28 September, 2020; v1 submitted 29 August, 2020; originally announced August 2020.

Comments: 1st ACM SIGSOFT International Workshop on Representation Learning for Software Engineering and Program Languages, Co-located with ESEC/FSE (RL+SE&PL'20)
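The abstract above compares how information gain is distributed across code2vec dimensions and handcrafted features, and what happens when low-gain dimensions are removed. A minimal sketch of that kind of analysis, assuming each continuous dimension is binarized at its mean (the abstract does not say how features were discretized) and a binary class label; SVM training is omitted, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (bits) of a label sequence; 0.0 for an empty sequence."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values: Sequence[float], labels: Sequence[int]) -> float:
    """IG of splitting one dimension at its mean: H(y) - H(y | split)."""
    threshold = sum(values) / len(values)
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    conditional = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - conditional


def top_dimensions(X: Sequence[Sequence[float]], y: Sequence[int], keep: int) -> List[int]:
    """Indices of the `keep` highest-IG dimensions (the rest would be removed)."""
    gains = [information_gain([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)[:keep]


# Toy data: dimension 0 separates the classes, dimension 1 is constant.
X = [[0.1, 5.0], [0.2, 5.0], [0.9, 5.0], [0.8, 5.0]]
y = [0, 0, 1, 1]
print(information_gain([r[0] for r in X], y))  # 1.0
print(information_gain([r[1] for r in X], y))  # 0.0
print(top_dimensions(X, y, keep=1))            # [0]
```

Retraining a classifier on `X` restricted to `top_dimensions(...)` for decreasing `keep` is one way to probe the kind of resilience to dimension removal that the abstract reports.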

arXiv:2008.00087

Adaptive Bitrate Video Streaming for Wireless nodes: A Survey

Authors: Kamran Nishat, Omprakash Gnawali, Ahmed Abdelhadi

Abstract: In today's Internet, video is the most dominant application and in addition to this, wireless networks such as WiFi, Cellular, and Bluetooth have become ubiquitous. Hence, most of the Internet traffic is video over wireless nodes. There is a plethora of research to improve video streaming to achieve high Quality of Experience (QoE) over the Internet. Many of these studies focus on wireless nodes. Recent measurement studies often show QoE of video suffers in many wireless clients over the Internet. Recently, many research papers have presented models and schemes to optimize the Adaptive BitRate (ABR) based video streaming for wireless and mobile users. In this survey, we present a comprehensive overview of recent work in the area of Internet video specifically designed for wireless networks. Recent research has suggested that there are some new challenges added by the connectivity of clients through wireless. These challenges also become more difficult to handle when these nodes are mobile. This survey also discusses new potential areas of future research due to the increasing scarcity of wireless spectrum.

Submitted 27 July, 2020; originally announced August 2020.
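For readers new to ABR, a minimal sketch of the classic throughput-based (rate-based) bitrate selection rule: estimate bandwidth from recent segment downloads with a harmonic mean, discount it by a safety factor, and pick the highest bitrate that fits. This is a generic heuristic, not a scheme taken from the survey above; the bitrate ladder, window size, and safety factor are assumptions.

```python
from typing import Sequence

# Hypothetical bitrate ladder (kbps) for one video.
LADDER_KBPS = [300, 750, 1200, 2350, 4300]


def harmonic_mean(samples: Sequence[float]) -> float:
    """Harmonic mean; less sensitive to short throughput spikes than the arithmetic mean."""
    return len(samples) / sum(1.0 / s for s in samples)


def choose_bitrate(ladder_kbps: Sequence[int], throughput_kbps: Sequence[float],
                   safety: float = 0.8, window: int = 5) -> int:
    """Pick the highest bitrate not exceeding safety * estimated throughput."""
    recent = list(throughput_kbps)[-window:]
    if not recent:
        return min(ladder_kbps)  # no measurements yet: start conservatively
    if any(s <= 0 for s in recent):
        raise ValueError("throughput samples must be positive")
    budget = safety * harmonic_mean(recent)
    feasible = [b for b in ladder_kbps if b <= budget]
    return max(feasible) if feasible else min(ladder_kbps)


print(choose_bitrate(LADDER_KBPS, [3000, 1500, 3000]))  # 1200 (budget about 1800 kbps)
print(choose_bitrate(LADDER_KBPS, [10000, 12000]))      # 4300
print(choose_bitrate(LADDER_KBPS, []))                  # 300
```

Wireless-specific schemes typically refine this baseline, for example by also accounting for buffer occupancy or link-layer signals, which is where the challenges discussed in the abstract come in.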

arXiv:2006.13499

Less is More: Exploiting Social Trust to Increase the Effectiveness of a Deception Attack

Authors: Shahryar Baki, Rakesh M. Verma, Arjun Mukherjee, Omprakash Gnawali

Abstract: Cyber attacks such as phishing, IRS scams, etc., are still successful in fooling Internet users. Users are the last line of defense against these attacks since attackers seem to always find a way to bypass security systems. Understanding users' reasoning about the scams and frauds can help security providers to improve users' security hygiene practices. In this work, we study the users' reasoning and the effectiveness of several variables within the context of the company representative fraud. Some of the variables that we study are: 1) the effect of using LinkedIn as a medium for delivering the phishing message instead of using email, 2) the effectiveness of natural language generation techniques in generating phishing emails, and 3) how some simple customizations, e.g., adding sender's contact info to the email, affect participants' perception. The results obtained from the within-subject study show that participants are not prepared even for a well-known attack - company representative fraud. Findings include: approximately 65% mean detection rate and insights into how the success rate changes with the facade and correspondent (sender/receiver) information. A significant finding is that a smaller set of well-chosen strategies is better than a large 'mess' of strategies. We also find significant differences in how males and females approach the same company representative fraud. Insights from our work could help defenders in developing better strategies to evaluate their defenses and in devising better training strategies.

Submitted 24 June, 2020; originally announced June 2020.

Comments: 15 pages, 6 figures

ACM Class: H.5.m; I.2.7; J.4

arXiv:1906.10607

Newswire versus Social Media for Disaster Response and Recovery

Authors: Rakesh Verma, Samaneh Karimi, Daniel Lee, Omprakash Gnawali, Azadeh Shakery

Abstract: In a disaster situation, first responders need to quickly acquire situational awareness and prioritize response based on the need, resources available and impact. Can they do this based on digital media such as Twitter alone, or newswire alone, or some combination of the two? We examine this question in the context of the 2015 Nepal Earthquakes. Because newswire articles are longer, effective summaries can be helpful in saving time yet giving key content. We evaluate the effectiveness of several unsupervised summarization techniques in capturing key content. We propose a method to link tweets written by the public and newswire articles, so that we can compare their key characteristics: timeliness, whether tweets appear earlier than their corresponding news articles, and content. A novel idea is to view relevant tweets as a summary of the matching news article and evaluate these summaries. Whenever possible, we present both quantitative and qualitative evaluations. One of our main findings is that tweets and newswire articles provide complementary perspectives that form a holistic view of the disaster situation.

Submitted 25 June, 2019; originally announced June 2019.
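The abstract above proposes linking public tweets to newswire articles and then checking timeliness. The abstract does not describe the linking method, so the following is only a minimal sketch under an assumed approach: bag-of-words cosine similarity with a hypothetical threshold of 0.2, plus a helper that measures how many minutes a tweet preceded its linked article. The article texts, tweet, and timestamps are invented toy values.

```python
import math
import re
from collections import Counter
from datetime import datetime
from typing import Dict, Optional


def bag_of_words(text: str) -> Counter:
    """Lower-cased alphanumeric token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def link_tweet(tweet: str, articles: Dict[str, str], threshold: float = 0.2) -> Optional[str]:
    """Return the id of the most similar article, or None if none reaches the threshold."""
    if not articles:
        return None
    tw = bag_of_words(tweet)
    best_id, best_score = max(((aid, cosine(tw, bag_of_words(text)))
                               for aid, text in articles.items()),
                              key=lambda pair: pair[1])
    return best_id if best_score >= threshold else None


def lead_minutes(tweet_time: datetime, article_time: datetime) -> float:
    """Positive when the tweet appeared before its linked article."""
    return (article_time - tweet_time).total_seconds() / 60


articles = {
    "a1": "Magnitude 7.8 earthquake strikes Nepal near Kathmandu",
    "a2": "Relief flights land at Kathmandu airport",
}
tweet = "Huge earthquake in Nepal right now, buildings down in Kathmandu"
print(link_tweet(tweet, articles))                   # a1
print(link_tweet("Praying for everyone", articles))  # None
print(lead_minutes(datetime(2015, 4, 25, 12, 5), datetime(2015, 4, 25, 12, 30)))  # 25.0
```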

arXiv:1902.06384

Topics of Concern: Identifying User Issues in Reviews of IoT Apps and Devices

Authors: Andrew Truelove, Farah Naz Chowdhury, Omprakash Gnawali, Mohammad Amin Alipour

Abstract: Internet of Things (IoT) systems are bundles of networked sensors and actuators that are deployed in an environment and act upon the sensory data that they receive. These systems, especially consumer electronics, have two main cooperating components: a device and a mobile app. The unique combination of hardware and software in IoT systems presents challenges that are lesser known to mainstream software developers. They might require innovative solutions to support the development and integration of such systems. In this paper, we analyze more than 90,000 reviews of ten IoT devices and their corresponding apps and extract the issues that users encountered while using these systems. Our results indicate that issues with connectivity, timing, and updates are particularly prevalent in the reviews. Our results call for a new software-hardware development framework to assist the development of reliable IoT systems.

Submitted 29 March, 2019; v1 submitted 17 February, 2019; originally announced February 2019.

Comments: 1st International Workshop on Software Engineering Research & Practices for the Internet of Things (SERP4IoT 2019)
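The abstract above reports that connectivity, timing, and update issues are especially prevalent in the reviews. It does not describe the extraction method, so as a minimal sketch only: a keyword-based tagger that counts how many reviews mention each category. The keyword stems and sample reviews are hypothetical, and keyword matching is a much cruder stand-in for whatever analysis the authors performed.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Set

# Hypothetical keyword stems for the three issue types highlighted in the abstract.
ISSUE_KEYWORDS: Dict[str, List[str]] = {
    "connectivity": ["connect", "disconnect", "wifi", "bluetooth", "pair", "offline"],
    "timing": ["delay", "lag", "slow", "schedule", "timer"],
    "updates": ["update", "firmware", "upgrade"],
}

# Match each stem at the start of a word, e.g. "disconnect" also matches "disconnecting".
PATTERNS = {issue: re.compile(r"\b(?:" + "|".join(map(re.escape, stems)) + r")")
            for issue, stems in ISSUE_KEYWORDS.items()}


def tag_review(review: str) -> Set[str]:
    """Issue categories whose keyword stems appear in one review."""
    text = review.lower()
    return {issue for issue, pattern in PATTERNS.items() if pattern.search(text)}


def issue_prevalence(reviews: Iterable[str]) -> Counter:
    """Number of reviews mentioning each issue category."""
    counts: Counter = Counter()
    for review in reviews:
        counts.update(tag_review(review))
    return counts


reviews = [
    "Keeps disconnecting from WiFi every night",
    "After the last firmware update the timer is off by an hour",
    "Great lamp, love the colors",
    "App is slow and won't connect",
]
print(dict(sorted(issue_prevalence(reviews).items())))
# {'connectivity': 2, 'timing': 2, 'updates': 1}
```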

Showing 1–11 of 11 results for author: Gnawali, O