Search | arXiv e-print repository

A Quantum Approach to Synthetic Minority Oversampling Technique (SMOTE)

Authors: Nishikanta Mohanty, Bikash K. Behera, Christopher Ferrie, Pravat Dash

Abstract: The paper proposes the Quantum-SMOTE method, a novel solution that uses quantum computing techniques to address the prevalent problem of class imbalance in machine learning datasets. Quantum-SMOTE, inspired by the Synthetic Minority Oversampling Technique (SMOTE), generates synthetic data points using quantum processes such as swap tests and quantum rotation. The process departs from the conventional SMOTE algorithm's use of K-Nearest Neighbors (KNN) and Euclidean distances, enabling synthetic instances to be generated from minority class data points without relying on neighbor proximity. The algorithm offers greater control over the synthetic data generation process by introducing hyperparameters such as rotation angle, minority percentage, and splitting factor, which allow for customization to specific dataset requirements. Due to the use of a compact swap test, the algorithm can accommodate a large number of features. Furthermore, the approach is tested on a public Telecom Churn dataset and evaluated alongside two prominent classification algorithms, Random Forest and Logistic Regression, to determine its impact with varying proportions of synthetic data.

Submitted 4 July, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: 42 Pages, 23 Figures, 2 Tables
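The swap test named in the abstract estimates the overlap between two quantum states: measuring its ancilla gives 0 with probability P(0) = 1/2 + 1/2 |<a|b>|^2. The Python sketch below simulates a textbook swap test on amplitude-encoded feature vectors. It is not the authors' compact swap test or their rotation-based generation step, and the helper names (kron, normalize, swap_test_p0, estimate_overlap) are hypothetical.

```python
import math
import random


def kron(u, v):
    """Tensor product of two state vectors given as lists of amplitudes."""
    return [x * y for x in u for y in v]


def normalize(v):
    """Amplitude-encode a classical feature vector (assumes a non-zero vector)."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def swap_test_p0(a, b):
    """Probability that the ancilla of a textbook swap test is measured as 0.

    After a Hadamard on the ancilla, a controlled-SWAP of |a> and |b>, and a
    second Hadamard, the ancilla-0 branch is (|a>|b> + |b>|a>) / 2, so P(0)
    is its squared norm, which equals 1/2 + 1/2 * |<a|b>|^2.
    """
    ab = kron(a, b)
    ba = kron(b, a)
    branch0 = [(x + y) / 2 for x, y in zip(ab, ba)]
    return sum(abs(x) ** 2 for x in branch0)


def estimate_overlap(a, b, shots=10_000, seed=0):
    """Estimate |<a|b>|^2 = 2 * P(0) - 1 from simulated measurement shots."""
    rng = random.Random(seed)
    p0 = swap_test_p0(a, b)
    zeros = sum(rng.random() < p0 for _ in range(shots))
    return 2 * zeros / shots - 1
```

For example, swap_test_p0([1, 0], normalize([1, 1])) is approximately 0.75, because the squared overlap of those two states is 1/2. On hardware only the sampled estimate is available, which is why estimate_overlap works from simulated shots rather than the exact probability.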

arXiv:2310.00594 [pdf]

Performance evaluation of Machine learning algorithms for Intrusion Detection System

Authors: Sudhanshu Sekhar Tripathy, Bichitrananda Behera

Abstract: The escalation of safety hazards and the hijacking of digital networks are among the most perilous difficulties that must be addressed today. Numerous safety procedures have been set up to track and recognize any illicit activity on a network's infrastructure. IDSs are the best way to resist and recognize intrusions on internet connections and digital technologies. To classify network traffic as normal or anomalous, Machine Learning (ML) classifiers are increasingly utilized. An IDS with machine learning increases the accuracy with which security attacks are detected. This paper focuses on the analysis of intrusion detection systems (IDSs) using ML techniques. IDSs utilizing ML techniques are efficient and precise at identifying network assaults. In high-dimensional data, however, the efficacy of these systems degrades. Correspondingly, it is essential to apply a feasible feature-removal technique capable of discarding features that have little effect on the classification process. In this paper, we analyze the KDD CUP '99 intrusion detection dataset used for training and validating ML models. Then, we implement ML classifiers such as Logistic Regression, Decision Tree, K-Nearest Neighbour, Naive Bayes, Bernoulli Naive Bayes, Multinomial Naive Bayes, XG-Boost Classifier, Ada-Boost, Random Forest, SVM, Rocchio classifier, Ridge, Passive-Aggressive classifier, ANN, and Perceptron (PPN). The optimal classifiers are determined by comparing the results of Stochastic Gradient Descent and back-propagation neural networks for IDS. Conventional classification metrics, such as accuracy, precision, recall, and the F1-measure, have been used to evaluate the performance of the ML classification algorithms.

Submitted 1 October, 2023; originally announced October 2023.
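The four metrics named at the end of the abstract can all be computed from the counts of a binary confusion matrix. The Python sketch below assumes labels where 1 marks an anomalous (attack) connection and 0 marks normal traffic; the function name classification_metrics is hypothetical and is not taken from the paper.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary labelling.

    `positive` is the label treated as the positive (attack) class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators, e.g. a classifier that never predicts "attack".
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With y_true = [1, 1, 0, 0, 1, 0] and y_pred = [1, 0, 0, 1, 1, 0] there are 2 true positives, 1 false positive, 1 false negative and 2 true negatives, so accuracy is 4/6 while precision, recall and F1 are each 2/3.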

arXiv:2206.00157 [pdf, other]

Design and Simulation of an Autonomous Quantum Flying Robot Vehicle: An IBM Quantum Experience

Authors: Sudev Pradhan, Anshuman Padhi, Bikash Kumar Behera

Abstract: The application of quantum computation and information in robotics has caught the attention of researchers of late. The field of robotics has always focused on minimizing the space occupied by the robot and on making the robot `smarter'. The smartness of a robot is its sensitivity to its surroundings and to user input, and its ability to react to them desirably. Quantum phenomena in robotics ensure that robots occupy less space, and the ability of quantum computation to process huge amounts of information effectively makes the robot smarter. A Braitenberg vehicle is a simple-circuited robot that moves according to the input its sensors receive. Building upon that, we propose a quantum robot vehicle that is `smart' enough to understand complex situations better than a simple Braitenberg vehicle and to navigate itself around the obstacles present. It can detect an obstacle-free path and navigate accordingly. It also takes input from the user when more than one free path is available. When left with no option on the ground, it can airlift itself off the ground. As these vehicles `react' to the surrounding conditions, this idea can be used to build artificial life and genetic algorithms, space- and deep-earth exploration probes, and a handy tool for defense and intelligence services.

Submitted 31 May, 2022; originally announced June 2022.

Comments: 7 pages, 5 figures
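The navigation behaviour described in the abstract (take the single obstacle-free path, ask the user when several are free, airlift when none is) amounts to a small decision rule. The Python sketch below is a purely classical, hypothetical rendering of that rule, assuming three obstacle sensors (left, front, right); it does not model the quantum circuit the authors build on IBM Quantum Experience, and choose_action and DIRECTIONS are illustrative names.

```python
from typing import Optional, Sequence

# Hypothetical sensor layout: one obstacle sensor per direction.
DIRECTIONS = ("left", "front", "right")


def choose_action(blocked: Sequence[bool], user_choice: Optional[str] = None) -> str:
    """Classical sketch of the vehicle's navigation rule.

    blocked[i] is True when the sensor facing DIRECTIONS[i] detects an obstacle.
    """
    free = [d for d, is_blocked in zip(DIRECTIONS, blocked) if not is_blocked]
    if not free:
        # No path on the ground: the vehicle lifts itself off the ground.
        return "airlift"
    if len(free) == 1:
        # Exactly one obstacle-free path: take it without asking.
        return free[0]
    # Several free paths: defer to the user, if they picked a valid one.
    if user_choice in free:
        return user_choice
    return "ask_user"
```

For instance, choose_action([True, False, True]) returns "front", and choose_action([False, False, True]) returns "ask_user" until a valid user_choice such as "left" is supplied.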

arXiv:2009.00098 [pdf, other]

Sorting an Array Using the Topological Sort of a Corresponding Comparison Graph

Authors: Balaram Behera

Abstract: The quest for efficient sorting is ongoing, and we explore a graph-based stable sorting strategy, in particular one employing comparison graphs. We use the topological sort to map the comparison graph to a linear domain, and we can manipulate our graph such that the resulting topological sort is the sorted array. By taking advantage of the many relations between Hamiltonian paths and topological sorts in comparison graphs, we design a Divide-and-Conquer algorithm that runs in the optimal $O(n \log n)$ time. In the process, we construct a new merge process for graphs with invariant properties relevant to our use. Furthermore, this method is more space-efficient than the famous MergeSort, since we only modify our fixed graph.

Submitted 31 August, 2020; originally announced September 2020.

Comments: 18 pages, 0 figures. Keywords: graph algorithms; topological sort; sorting algorithms; comparison graphs
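The core observation, that a topological sort of a comparison graph yields the sorted array, can be checked directly on small inputs. The Python sketch below builds the full comparison graph (an edge i -> j whenever element i must precede element j, with ties broken by index so the result is stable) and topologically sorts it with Kahn's algorithm. Because every pair of elements is compared, the topological order is unique and coincides with the graph's Hamiltonian path. This naive construction takes O(n^2) time and space; it is not the authors' O(n log n) divide-and-conquer algorithm, and the function names are hypothetical.

```python
from collections import deque


def comparison_graph(keys):
    """Edge i -> j whenever element i must precede element j in a stable sort."""
    n = len(keys)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if keys[i] < keys[j] or (keys[i] == keys[j] and i < j):
                adj[i].append(j)
    return adj


def topological_sort(adj):
    """Kahn's algorithm: repeatedly emit a vertex with no remaining in-edges."""
    indegree = [0] * len(adj)
    for u in range(len(adj)):
        for v in adj[u]:
            indegree[v] += 1
    queue = deque(u for u in range(len(adj)) if indegree[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return order


def graph_sort(items, key=lambda x: x):
    """Stable sort via the topological order of the comparison graph."""
    keys = [key(x) for x in items]
    return [items[i] for i in topological_sort(comparison_graph(keys))]
```

Sorting [(2, "a"), (1, "b"), (2, "c"), (1, "d")] by the first field keeps equal keys in their input order, giving [(1, "b"), (1, "d"), (2, "a"), (2, "c")].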

arXiv:2007.09768 [pdf, other]

FPT Algorithms for Finding Near-Cliques in $c$-Closed Graphs

Authors: Balaram Behera, Edin Husić, Shweta Jain, Tim Roughgarden, C. Seshadhri

Abstract: Finding large cliques or cliques missing a few edges is a fundamental algorithmic task in the study of real-world graphs, with applications in community detection, pattern recognition, and clustering. A number of effective backtracking-based heuristics for these problems have emerged from recent empirical work in social network analysis. Given the NP-hardness of variants of clique counting, these results raise a challenge for beyond worst-case analysis of these problems. Inspired by the triadic closure of real-world graphs, Fox et al. (SICOMP 2020) introduced the notion of $c$-closed graphs and proved that maximal clique enumeration is fixed-parameter tractable with respect to $c$. In practice, due to noise in data, one wishes to actually discover "near-cliques", which can be characterized as cliques with a sparse subgraph removed. In this work, we prove that many different kinds of maximal near-cliques can be enumerated in polynomial time (and FPT in $c$) for $c$-closed graphs. We study various established notions of such substructures, including $k$-plexes, complements of bounded-degeneracy and bounded-treewidth graphs. Interestingly, our algorithms follow relatively simple backtracking procedures, analogous to what is done in practice. Our results underscore the significance of the $c$-closed graph class for theoretical understanding of social network analysis.

Submitted 19 November, 2021; v1 submitted 19 July, 2020; originally announced July 2020.

Comments: Accepted to ITCS 2022

MSC Class: 68W01; 68R10; 05C85
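The two notions the abstract relies on are easy to state in code. Following Fox et al., a graph is $c$-closed when every pair of distinct non-adjacent vertices has fewer than $c$ common neighbors, and a vertex set S is a $k$-plex when every vertex of S is adjacent to at least |S| - k vertices of S (so a 1-plex is a clique). The Python sketch below, with hypothetical function names and graphs given as dicts mapping each vertex to its neighbor set, computes the smallest such $c$ and tests the $k$-plex condition; it does not implement the paper's enumeration algorithms.

```python
from itertools import combinations


def min_closure(adj):
    """Smallest c such that the graph is c-closed.

    `adj` maps each vertex to the set of its neighbors (no self-loops).
    A graph is c-closed when every non-adjacent pair has fewer than c
    common neighbors, so the answer is 1 + the largest such count.
    """
    worst = 0
    for u, v in combinations(adj, 2):
        if v not in adj[u]:
            worst = max(worst, len(adj[u] & adj[v]))
    return worst + 1


def is_k_plex(adj, vertices, k):
    """True if every vertex of S has at least |S| - k neighbors inside S."""
    s = set(vertices)
    return all(len(adj[v] & s) >= len(s) - k for v in s)
```

On the 4-cycle, each pair of opposite corners is non-adjacent and shares two neighbors, so the smallest valid $c$ is 3, and the whole cycle is a 2-plex but not a 1-plex (clique).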

arXiv:1705.06338 [pdf, other]

Distributed Vector Representation Of Shopping Items, The Customer And Shopping Cart To Build A Three Fold Recommendation System

Authors: Bibek Behera, Manoj Joshi, Abhilash KK, Mohammad Ansari Ismail

Abstract: The main idea of this paper is to represent shopping items through vectors, because these vectors act as the base for building embeddings for customers and shopping carts. These vectors are also input to mathematical models that either act as a recommendation engine or help in targeting potential customers. We have used exponential family embeddings as the tool to construct two basic vectors: product embeddings and context vectors. Using the basic vectors, we build combined embeddings, trip embeddings, and customer embeddings. Combined embeddings mix linguistic properties of product names with their shopping patterns. The customer embeddings establish an understanding of the buying pattern of customers in a group and help in building customer profiles. For example, a customer profile can represent customers who frequently buy pet food. Identifying such profiles can help us bring out offers and discounts. Similarly, trip embeddings are used to build trip profiles. People tend to buy similar sets of products in a trip, and hence their trip embeddings can be used to predict the next product they would like to buy. This is a novel technique and the first of its kind to make recommendations using product, trip, and customer embeddings.

Submitted 17 May, 2017; originally announced May 2017.

Comments: CICLing 2017
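The last step the abstract describes, predicting the next product from a trip embedding, can be illustrated with toy vectors. The Python sketch below assumes, hypothetically, that a trip embedding is the mean of the product vectors already in the cart and that the recommendation is the unseen product with the highest cosine similarity to it. The paper derives its vectors from exponential family embeddings instead, and the three-dimensional product vectors here are invented for illustration only.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equal-length vectors (a hypothetical trip embedding)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def recommend_next(cart, product_vectors):
    """Return the product not yet in the cart that is closest to the trip embedding."""
    trip = mean_vector([product_vectors[p] for p in cart])
    candidates = [p for p in product_vectors if p not in cart]
    return max(candidates, key=lambda p: cosine(trip, product_vectors[p]))


# Toy, made-up product vectors for illustration only.
PRODUCTS = {
    "dog_food": [1.0, 0.0, 0.1],
    "cat_litter": [0.9, 0.1, 0.0],
    "bread": [0.0, 1.0, 0.0],
    "butter": [0.1, 0.9, 0.1],
}
```

With these toy vectors, recommend_next(["dog_food"], PRODUCTS) picks "cat_litter", the other pet-related item, which mirrors the pet-food profile example in the abstract.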

Showing 1–6 of 6 results for author: Behera, B