-
Geometry Based Machining Feature Retrieval with Inductive Transfer Learning
Authors:
N S Kamal,
Barathi Ganesh HB,
Sajith Variyar VV,
Sowmya V,
Soman KP
Abstract:
Manufacturing industries have widely adopted the reuse of machine parts as a method to reduce costs and as a sustainable manufacturing practice. Identification of reusable features from the design of the parts and finding their similar features from the database is an important part of this process. In this project, with the help of fully convolutional geometric features, we are able to extract and learn the high level semantic features from CAD models with inductive transfer learning. The extracted features are then compared with that of other CAD models from the database using Frobenius norm and identical features are retrieved. Later we passed the extracted features to a deep convolutional neural network with a spatial pyramid pooling layer and the performance of the feature retrieval increased significantly. It was evident from the results that the model could effectively capture the geometrical elements from machining features.
Submitted 15 November, 2021; v1 submitted 26 August, 2021;
originally announced August 2021.
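The retrieval step described in this abstract, comparing extracted features by Frobenius norm, can be sketched as follows. This is only an illustration under stated assumptions: the feature matrices, the names `frobenius_distance` and `retrieve_most_similar`, and the toy database are hypothetical, and the paper's actual features come from a fully convolutional geometric feature extractor that is not reproduced here.

```python
import math


def frobenius_distance(a, b):
    """Frobenius norm of the element-wise difference of two equally shaped matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("feature matrices must have the same shape")
    return math.sqrt(
        sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    )


def retrieve_most_similar(query, database):
    """Return the key of the database entry closest to the query feature matrix."""
    return min(database, key=lambda name: frobenius_distance(query, database[name]))


# Hypothetical 2x2 feature matrices standing in for learned CAD features.
database = {
    "slot": [[1.0, 0.0], [0.0, 1.0]],
    "hole": [[0.0, 2.0], [2.0, 0.0]],
}
query = [[0.9, 0.1], [0.0, 1.0]]
best = retrieve_most_similar(query, database)
print(best)
```

A smaller distance means the two feature matrices are closer element by element, so the nearest database entry is returned as the retrieved feature.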
-
Deep Learning based Frameworks for Handling Imbalance in DGA, Email, and URL Data Analysis
Authors:
Simran K,
Prathiksha Balakrishna,
Vinayakumar Ravi,
Soman KP
Abstract:
Deep learning is a state of the art method for a lot of applications. The main issue is that most of the real-time data is highly imbalanced in nature. In order to avoid bias in training, cost-sensitive approach can be used. In this paper, we propose cost-sensitive deep learning based frameworks and the performance of the frameworks is evaluated on three different Cyber Security use cases which are Domain Generation Algorithm (DGA), Electronic mail (Email), and Uniform Resource Locator (URL). Various experiments were performed using cost-insensitive as well as cost-sensitive methods and parameters for both of these methods are set based on hyperparameter tuning. In all experiments, the cost-sensitive deep learning methods performed better than the cost-insensitive approaches. This is mainly due to the reason that cost-sensitive approach gives importance to the classes which have a very less number of samples during training and this helps to learn all the classes in a more efficient manner.
Submitted 17 October, 2020; v1 submitted 30 March, 2020;
originally announced April 2020.
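One common way to make training cost-sensitive is to weight each class's loss term by the inverse of its frequency. The sketch below illustrates that idea only; the abstract does not state which cost scheme the authors used, so the inverse-frequency weighting and the function names `inverse_frequency_weights` and `weighted_cross_entropy` are assumptions.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count), so rare classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}


def weighted_cross_entropy(probs, labels, weights):
    """Class-weighted cross-entropy, normalised by the total weight of the batch."""
    total = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


# Hypothetical imbalanced batch: nine benign samples (0) and one malicious (1).
labels = [0] * 9 + [1]
weights = inverse_frequency_weights(labels)

# Predicted class probabilities per sample (index 0 = benign, 1 = malicious).
probs = [[0.5, 0.5] for _ in labels]
loss = weighted_cross_entropy(probs, labels, weights)
print(weights, loss)
```

With these weights, a misclassified minority sample contributes as much to the loss as several majority samples, which is the effect the abstract attributes to the cost-sensitive approach.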
-
Deep Learning Approach for Enhanced Cyber Threat Indicators in Twitter Stream
Authors:
Simran K,
Prathiksha Balakrishna,
Vinayakumar R,
Soman KP
Abstract:
In recent days, the amount of Cyber Security text data shared via social media resources, mainly Twitter, has increased. An accurate analysis of this data can help to develop a cyber threat situational awareness framework. This work proposes a deep learning based approach for tweet data analysis. To convert the tweets into numerical representations, various text representations are employed. These features are fed into a deep learning architecture for optimal feature extraction as well as classification. Various hyperparameter tuning approaches are used to identify the optimal text representation method as well as the optimal network parameters and network structures for the deep learning models. For comparative analysis, a classical text representation method with a classical machine learning algorithm is employed. From the detailed analysis of experiments, we found that the deep learning architecture with advanced text representation methods performed better than the classical text representation and classical machine learning algorithms. The primary reason for this is that the advanced text representation methods can learn the sequential properties that exist in the textual data, and deep learning architectures learn the optimal features while decreasing the feature size.
Submitted 30 March, 2020;
originally announced April 2020.
-
Deep Learning Approach for Intelligent Named Entity Recognition of Cyber Security
Authors:
Simran K,
Sriram S,
Vinayakumar R,
Soman KP
Abstract:
In recent years, the amount of Cyber Security data generated in the form of unstructured texts, for example, social media resources, blogs, articles, and so on has exceptionally increased. Named Entity Recognition (NER) is an initial step towards converting this unstructured data into structured data which can be used by a lot of applications. The existing methods on NER for Cyber Security data are based on rules and linguistic characteristics. A Deep Learning (DL) based approach embedded with Conditional Random Fields (CRFs) is proposed in this paper. Several DL architectures are evaluated to find the most optimal architecture. The combination of Bidirectional Gated Recurrent Unit (Bi-GRU), Convolutional Neural Network (CNN), and CRF performed better compared to various other DL frameworks on a publicly available benchmark dataset. This may be due to the reason that the bidirectional structures preserve the features related to the future and previous words in a sequence.
Submitted 30 March, 2020;
originally announced April 2020.
-
Dynamic Mode Decomposition based feature for Image Classification
Authors:
Rahul-Vigneswaran K,
Sachin-Kumar S,
Neethu Mohan,
Soman KP
Abstract:
Irrespective of the fact that Machine learning has produced groundbreaking results, it demands an enormous amount of data in order to perform so. Even though data production has been in its all-time high, almost all the data is unlabelled, hence making them unsuitable for training the algorithms. This paper proposes a novel method of extracting the features using Dynamic Mode Decomposition (DMD). The experiment is performed using data samples from Imagenet. The learning is done using SVM-linear, SVM-RBF, Random Kitchen Sink approach (RKS). The results have shown that DMD features with RKS give competing results.
Submitted 7 October, 2019;
originally announced October 2019.
-
Data-driven Computing in Elasticity via Chebyshev Approximation
Authors:
Rahul-Vigneswaran K,
Neethu Mohan,
Soman KP
Abstract:
This paper proposes a data-driven approach for computing elasticity by means of a non-parametric regression approach rather than an optimization approach. The Chebyshev approximation is utilized for tackling the material data-sets non-linearity of the elasticity. Also, additional efforts have been taken to compare the results with several other state-of-the-art methodologies.
Submitted 23 April, 2019;
originally announced April 2019.
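As a rough illustration of Chebyshev approximation of a non-linear curve, the sketch below fits Chebyshev coefficients from samples at Chebyshev nodes on [-1, 1] and evaluates the series with the Clenshaw recurrence. It is not the paper's method: the cubic stress-strain law `stress` and the function names are hypothetical, and the paper works with material data sets rather than a known function.

```python
import math


def chebyshev_coefficients(f, degree):
    """Coefficients c_j of sum_j c_j * T_j(x) interpolating f at Chebyshev nodes on [-1, 1]."""
    n = degree + 1
    nodes = [math.cos(math.pi * (k + 0.5) / n) for k in range(n)]
    values = [f(x) for x in nodes]
    coeffs = []
    for j in range(n):
        s = sum(values[k] * math.cos(math.pi * j * (k + 0.5) / n) for k in range(n))
        coeffs.append((2.0 / n) * s)
    coeffs[0] /= 2.0  # the constant term carries half the weight of the others
    return coeffs


def chebyshev_eval(coeffs, x):
    """Evaluate the Chebyshev series at x using the Clenshaw recurrence."""
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * x * b1 - b2 + c, b1
    return x * b1 - b2 + coeffs[0]


# Hypothetical non-linear stress-strain law on a strain range scaled to [-1, 1].
def stress(strain):
    return strain + 0.5 * strain ** 3


coeffs = chebyshev_coefficients(stress, degree=3)
print(chebyshev_eval(coeffs, 0.4), stress(0.4))
```

Because the hypothetical law is itself a cubic, a degree-3 series reproduces it up to rounding error; for measured data the degree would have to be chosen against the noise level.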
-
A Compendium on Network and Host based Intrusion Detection Systems
Authors:
Rahul-Vigneswaran K,
Prabaharan Poornachandran,
Soman KP
Abstract:
The techniques of deep learning have become the state of the art methodology for executing complicated tasks in various domains such as computer vision, natural language processing, and several other areas. Due to its rapid development and promising benchmarks in those fields, researchers started experimenting with this technique in other areas, especially in intrusion detection related tasks. Deep learning is a subset and a natural extension of classical Machine learning and an evolved model of neural networks. This paper discusses the methodologies related to leading edge Deep learning and Neural network models applied to Intrusion Detection Systems.
Submitted 6 April, 2019;
originally announced April 2019.
-
RNNSecureNet: Recurrent neural networks for Cyber security use-cases
Authors:
Mohammed Harun Babu R,
Vinayakumar R,
Soman KP
Abstract:
Recurrent neural network (RNN) is an effective neural network for solving very complex supervised and unsupervised tasks. RNNs have brought significant improvement in fields such as natural language processing, speech processing, computer vision and multiple other domains. This paper deals with RNN applications to different use cases such as Incident Detection, Fraud Detection, and Android Malware Classification. The best performing neural network architecture is chosen by conducting different chains of experiments for different network parameters and structures. The network is run up to 1000 epochs with the learning rate set in the range of 0.01 to 0.5. The RNN performed very well when compared to classical machine learning algorithms. This is mainly because RNNs implicitly extract the underlying features and also identify the characteristics of the data. This helps to achieve better accuracy.
Submitted 5 January, 2019;
originally announced January 2019.
-
Emotion Detection using Data Driven Models
Authors:
Naveenkumar K S,
Vinayakumar R,
Soman KP
Abstract:
Text is the major medium used for communication nowadays, and large amounts of text are created every day. In this paper, text data is used for the classification of emotions. Emotions are the expression of a person's feelings and have a high influence on decision-making tasks. Publicly available datasets are collected and combined based on the three emotions considered here: positive, negative and neutral. In this paper, we propose the text representation methods TF-IDF and Keras embedding. The representations are given to classical machine learning algorithms, of which Logistic Regression gives the highest accuracy of about 75.6%, and are then passed to a deep learning algorithm, a CNN, which gives the state-of-the-art accuracy of about 45.25%. For research purposes, the collected datasets are released.
Submitted 10 January, 2019;
originally announced January 2019.
-
An Insight into the Dynamics and State Space Modelling of a 3-D Quadrotor
Authors:
Rahul Vigneswaran K,
Soman KP
Abstract:
Drones have gained popularity in a wide range of fields, ranging from aerial photography and aerial mapping to the investigation of electric power lines. Every drone that we know today runs some kind of control algorithm at the low level in order to manoeuvre itself around. For the quadrotor to either control itself autonomously or to develop a high-level user interface for us to control it, we need to understand the basic mathematics behind how it functions. This paper aims to explain the mathematical modelling of the dynamics of a 3-dimensional quadrotor. Although it may seem like a trivial task, it plays a vital role in how we control the drone. Also, additional effort has been taken to explain the transformations from the drone's frame of reference to the inertial frame of reference.
Submitted 4 January, 2019;
originally announced January 2019.
-
A Deep Learning Approach for Similar Languages, Varieties and Dialects
Authors:
Vidya Prasad K,
Akarsh S,
Vinayakumar R,
Soman KP
Abstract:
Deep learning mechanisms are prevailing approaches in recent days for the various tasks in natural language processing, speech recognition, image processing and many others. To leverage this we use deep learning based mechanism specifically Bidirectional- Long Short-Term Memory (B-LSTM) for the task of dialectic identification in Arabic and German broadcast speech and Long Short-Term Memory (LSTM) for discriminating between similar Languages. Two unique B-LSTM models are created using the Large-vocabulary Continuous Speech Recognition (LVCSR) based lexical features and a fixed length of 400 per utterance bottleneck features generated by i-vector framework. These models were evaluated on the VarDial 2017 datasets for the tasks Arabic, German dialect identification with dialects of Egyptian, Gulf, Levantine, North African, and MSA for Arabic and Basel, Bern, Lucerne, and Zurich for German. Also for the task of Discriminating between Similar Languages like Bosnian, Croatian and Serbian. The B-LSTM model showed accuracy of 0.246 on lexical features and accuracy of 0.577 bottleneck features of i-Vector framework.
Submitted 2 January, 2019;
originally announced January 2019.
-
A short review on Applications of Deep learning for Cyber security
Authors:
Mohammed Harun Babu R,
Vinayakumar R,
Soman KP
Abstract:
Deep learning is an advanced model of traditional machine learning. This has the capability to extract optimal feature representation from raw input samples. This has been applied towards various use cases in cyber security such as intrusion detection, malware classification, android malware detection, spam and phishing detection and binary analysis. This paper outlines the survey of all the works related to deep learning based solutions for various cyber security use cases. Keywords: Deep learning, intrusion detection, malware detection, Android malware detection, spam & phishing detection, traffic analysis, binary analysis.
Submitted 29 January, 2019; v1 submitted 15 December, 2018;
originally announced December 2018.
-
Deep-Net: Deep Neural Network for Cyber Security Use Cases
Authors:
Vinayakumar R,
Barathi Ganesh HB,
Prabaharan Poornachandran,
Anand Kumar M,
Soman KP
Abstract:
Deep neural networks (DNNs) have emerged as a powerful approach this year by solving long-standing Artificial intelligence (AI) supervised and unsupervised tasks in natural language processing, speech processing, computer vision and others. In this paper, we attempt to apply DNNs to three different cyber security use cases: Android malware classification, incident detection and fraud detection. The data set of each use case contains real known benign and malicious activity samples. An efficient network architecture for the DNN is chosen by conducting various trials of experiments for network parameters and network structures. The experiments with the chosen configurations are run up to 1000 epochs with the learning rate set in the range [0.01-0.5]. The DNN performed well in comparison to the classical machine learning algorithms in all of the cyber security use cases. This is because DNNs implicitly extract and build better features and identify the characteristics of the data that lead to better accuracy. The best accuracy obtained by DNN and XGBoost is 0.940 and 0.741 on Android malware classification, 1.00 and 0.997 on incident detection, and 0.972 and 0.916 on fraud detection, respectively.
Submitted 9 December, 2018;
originally announced December 2018.
-
A Brief Survey on Autonomous Vehicle Possible Attacks, Exploits and Vulnerabilities
Authors:
Amara Dinesh Kumar,
Koti Naga Renu Chebrolu,
Vinayakumar R,
Soman KP
Abstract:
Advanced driver assistance systems are advancing at a rapid pace and all major companies have started investing in developing autonomous vehicles. But their security and reliability are still uncertain and debatable. Imagine that a vehicle is compromised by attackers, and what they could then do. An attacker can control the brake, accelerator and even steering, which can lead to catastrophic consequences. This paper gives a brief overview of most of the possible attacks on autonomous vehicle software and hardware and their potential implications.
Submitted 3 October, 2018;
originally announced October 2018.
-
DeepImageSpam: Deep Learning based Image Spam Detection
Authors:
Amara Dinesh Kumar,
Vinayakumar R,
Soman KP
Abstract:
Hackers and spammers are employing innovative and novel techniques to deceive novice and even knowledgeable internet users. Image spam is one such technique, where the spammer changes some portion of the image such that it is indistinguishable from the original image, fooling the users. This paper proposes a deep learning based approach for image spam detection using convolutional neural networks. It uses a dataset with 810 natural images and 928 spam images for classification, achieving an accuracy of 91.7% and outperforming the existing image processing and machine learning techniques.
Submitted 3 October, 2018;
originally announced October 2018.
-
DeepProteomics: Protein family classification using Shallow and Deep Networks
Authors:
Anu Vazhayil,
Vinayakumar R,
Soman KP
Abstract:
The knowledge regarding the function of proteins is necessary as it gives a clear picture of biological processes. Nevertheless, there are many protein sequences found and added to the databases but lacks functional annotation. The laboratory experiments take a considerable amount of time for annotation of the sequences. This arises the need to use computational techniques to classify proteins based on their functions. In our work, we have collected the data from Swiss-Prot containing 40433 proteins which is grouped into 30 families. We pass it to recurrent neural network(RNN), long short term memory(LSTM) and gated recurrent unit(GRU) model and compare it by applying trigram with deep neural network and shallow neural network on the same dataset. Through this approach, we could achieve maximum of around 78% accuracy for the classification of protein families.
Submitted 11 September, 2018;
originally announced September 2018.
-
Deep Health Care Text Classification
Authors:
Vinayakumar R,
Barathi Ganesh HB,
Anand Kumar M,
Soman KP
Abstract:
Health related social media mining is a valuable apparatus for the early recognition of the diverse antagonistic medicinal conditions. Mostly, the existing methods are based on machine learning with knowledge-based learning. This working note presents the Recurrent neural network (RNN) and Long short-term memory (LSTM) based embedding for automatic health text classification in the social media mining. For each task, two systems are built and that classify the tweet at the tweet level. RNN and LSTM are used for extracting features and non-linear activation function at the last layer facilitates to distinguish the tweets of different categories. The experiments are conducted on 2nd Social Media Mining for Health Applications Shared Task at AMIA 2017. The experiment results are considerable; however the proposed method is appropriate for the health text classification. This is primarily due to the reason that, it doesn't rely on any feature engineering mechanisms.
Submitted 23 October, 2017;
originally announced October 2017.
-
Vector Space Model as Cognitive Space for Text Classification
Authors:
Barathi Ganesh HB,
Anand Kumar M,
Soman KP
Abstract:
In this era of digitization, knowing the user's sociolect aspects have become essential features to build the user specific recommendation systems. These sociolect aspects could be found by mining the user's language sharing in the form of text in social media and reviews. This paper describes about the experiment that was performed in PAN Author Profiling 2017 shared task. The objective of the task is to find the sociolect aspects of the users from their tweets. The sociolect aspects considered in this experiment are user's gender and native language information. Here user's tweets written in a different language from their native language are represented as Document - Term Matrix with document frequency as the constraint. Further classification is done using the Support Vector Machine by taking gender and native language as target classes. This experiment attains the average accuracy of 73.42% in gender prediction and 76.26% in the native language identification task.
Submitted 20 August, 2017;
originally announced August 2017.
-
An Algorithm for Alignment-free Sequence Comparison using Logical Match
Authors:
Sanil Shanker KP,
Elizabeth Sherly,
Jim Austin
Abstract:
This paper proposes an algorithm for alignment-free sequence comparison using Logical Match. Here, we compute the score using fuzzy membership values, which are generated automatically from the number of matches and mismatches. We demonstrate the method with both artificial and real data. The results show the uniqueness of the proposed method by analyzing DNA sequences taken from NCBI databank with a novel computational time.
Submitted 6 July, 2014;
originally announced July 2014.
-
Sequential Data Mining using Correlation Matrix Memory
Authors:
Sanil Shanker KP,
Aaron Turner,
Elizabeth Sherly,
Jim Austin
Abstract:
This paper proposes a method for sequential data mining using correlation matrix memory. Here, we use the concept of the Logical Match to mine the indices of the sequential pattern. We demonstrate the uniqueness of the method with both artificial data and real data taken from NCBI databank.
Submitted 6 July, 2014;
originally announced July 2014.