
KmerCo: A lightweight K-mer counting technique with a tiny memory footprint

Abstract: K-mer counting is a requisite step in DNA assembly because it speeds up the overall assembly process. The frequency of K-mers is used for estimating the parameters of DNA assembly, error correction, etc. The process also provides a list of distinct K-mers, which assists in searching large databases and reducing the size of de Bruijn graphs. Nonetheless, K-mer counting is a data- and compute-intensive process. Hence, it is crucial to implement a lightweight data structure that occupies little memory yet processes K-mers quickly. We propose a lightweight K-mer counting technique, called KmerCo, that implements a potent counting Bloom Filter variant called countBF. KmerCo has two phases: insertion and classification. The insertion phase inserts all K-mers into countBF and determines the distinct K-mers. The classification phase classifies the distinct K-mers into trustworthy and erroneous K-mers based on a user-provided threshold value. We also propose a novel benchmark performance metric. We used a Hadoop MapReduce program to determine the frequency of K-mers. We have conducted rigorous experiments to demonstrate the superiority of KmerCo over state-of-the-art K-mer counting techniques. The experiments use DNA sequences of four organisms, and the datasets are pruned to generate datasets of four different sizes. KmerCo is compared with Squeakr, BFCounter, and Jellyfish. Compared with these three methods, KmerCo used the least memory, achieved the highest number of insertions per second, and had a positive trustworthy rate.
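The two phases described above can be sketched in a few lines. This is a minimal sketch, not KmerCo itself: an exact `collections.Counter` stands in for countBF, a K-mer is assumed to be trustworthy when its count reaches the user-provided threshold (inclusive), and the function names `extract_kmers`, `insert_phase`, and `classify_phase` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Set, Tuple


def extract_kmers(sequence: str, k: int) -> Iterator[str]:
    """Yield every overlapping substring of length k in a read."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def insert_phase(reads: Iterable[str], k: int) -> Counter:
    """Insertion phase: count all K-mers (an exact Counter stands in for countBF)."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(extract_kmers(read, k))
    return counts


def classify_phase(counts: Counter, threshold: int) -> Tuple[Set[str], Set[str]]:
    """Classification phase: split distinct K-mers by the user-provided threshold."""
    trustworthy = {kmer for kmer, c in counts.items() if c >= threshold}
    erroneous = set(counts) - trustworthy
    return trustworthy, erroneous


counts = insert_phase(["ACGTACGT", "ACGTTCGT"], k=4)
trustworthy, erroneous = classify_phase(counts, threshold=2)
print(sorted(trustworthy))  # ['ACGT']
```

Swapping the `Counter` for a counting Bloom Filter trades exact counts for a much smaller memory footprint, which is the trade-off KmerCo targets.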

Submitted 28 April, 2023; originally announced May 2023.

Comments: Submitted to the conference for possible publication

MSC Class: 68P05 ACM Class: E.1

arXiv:2111.14609 [pdf, ps, other]

An Investigation on Learning, Polluting, and Unlearning the Spam Emails for Lifelong Learning

Authors: Nishchal Parne, Kyathi Puppaala, Nithish Bhupathi, Ripon Patgiri

Abstract: This work studies machine unlearning for security. Several spam email detection methods exist, each employing a different algorithm to detect undesired spam emails. But these models are vulnerable to attacks: many attackers exploit a model by polluting its training data in various ways. To act deftly in such situations, a model needs to unlearn the polluted data readily, without retraining. Retraining is impractical in most cases because a massive amount of data has already been used to train the model, and all of it would have to be trained again just to remove a small amount of polluted data, which is often significantly less than 1%. This problem can be solved by developing unlearning frameworks for all spam detection models. In this research, an unlearning module is integrated into spam detection models based on the Naive Bayes, Decision Tree, and Random Forest algorithms. To assess the benefits of unlearning over retraining, three spam detection models are polluted and exploited from the attackers' position, demonstrating the models' vulnerability. Reductions in accuracy and true positive rate are shown in each case, revealing the effect of pollution on the models. Unlearning modules are then integrated into the models and the polluted data is unlearned; testing the models after unlearning shows that their performance is restored. Unlearning and retraining times are also compared for different pollution data sizes on all models. On analyzing the findings, it can be concluded that unlearning is considerably superior to retraining. Results show that unlearning is fast, easy to implement, easy to use, and effective.
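Naive Bayes is the simplest of the three models to make unlearnable, because its state consists only of counts: removing a polluted sample is the exact inverse of learning it. The sketch below assumes a multinomial Naive Bayes with Laplace smoothing over word lists; the class `UnlearnableNB` and its methods are hypothetical and are not the authors' module, and decision trees and random forests need different techniques not shown here.

```python
import math
from collections import Counter, defaultdict
from typing import List, Optional


class UnlearnableNB:
    """Multinomial Naive Bayes whose state is only counts (hypothetical sketch)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.doc_counts: Counter = Counter()     # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.total_words: Counter = Counter()    # total words per class

    def _update(self, words: List[str], label: str, sign: int) -> None:
        self.doc_counts[label] += sign
        for w in words:
            self.word_counts[label][w] += sign
            self.total_words[label] += sign

    def learn(self, words: List[str], label: str) -> None:
        self._update(words, label, +1)

    def unlearn(self, words: List[str], label: str) -> None:
        """Remove a previously learned (e.g. polluted) sample without retraining."""
        self._update(words, label, -1)

    def predict(self, words: List[str]) -> Optional[str]:
        # Vocabulary = words that still have a positive count in some class.
        vocab = {w for counter in self.word_counts.values()
                 for w, c in counter.items() if c > 0}
        n_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.doc_counts.items():
            if n <= 0:
                continue
            score = math.log(n / n_docs)  # log prior
            denom = self.total_words[label] + self.alpha * len(vocab)
            for w in words:
                score += math.log((self.word_counts[label][w] + self.alpha) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


nb = UnlearnableNB()
nb.learn(["win", "cash", "now"], "spam")
nb.learn(["meeting", "agenda"], "ham")
poison = ["cash"] * 5                # attacker labels spammy text as ham
nb.learn(poison, "ham")
print(nb.predict(["cash"]))          # ham: the pollution flipped the decision
nb.unlearn(poison, "ham")
print(nb.predict(["cash"]))          # spam: restored without retraining
```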

Submitted 24 December, 2021; v1 submitted 26 November, 2021; originally announced November 2021.

Comments: Submitted to Elsevier for possible publication

MSC Class: 68T05; 68T07; 68Q32; 68M25 ACM Class: D.4.6; I.2; I.2.6

arXiv:2108.10733 [pdf, other]

Graph Neural Networks: Methods, Applications, and Opportunities

Authors: Lilapati Waikhom, Ripon Patgiri

Abstract: In the last decade or so, we have witnessed deep learning reinvigorating the machine learning field. It has solved many problems in computer vision, speech recognition, natural language processing, and various other tasks with state-of-the-art performance. In these domains, the data are generally represented in Euclidean space. Various other domains conform to non-Euclidean space, for which a graph is an ideal representation. Graphs are suitable for representing the dependencies and interrelationships between various entities. Traditionally, handcrafted graph features have been incapable of providing the necessary inference for various tasks from this complex data representation. Recently, advances in deep learning have increasingly been applied to graph-based tasks. This article provides a comprehensive survey of graph neural networks (GNNs) in each learning setting: supervised, unsupervised, semi-supervised, and self-supervised learning. A taxonomy of each graph-based learning setting is provided, with logical divisions of the methods falling in that setting. The approaches for each learning task are analyzed from both theoretical and empirical standpoints. Further, we provide general architecture guidelines for building GNNs. Various applications and benchmark datasets are also provided, along with the open challenges still plaguing the general applicability of GNNs.
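As one concrete instance of the architectures such surveys cover, a single graph convolution layer in the style of Kipf and Welling aggregates each node's neighbourhood with a symmetrically normalized adjacency matrix, $H' = \mathrm{ReLU}(\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2} H W)$ with $\hat{A} = A + I$. The plain-Python sketch below is illustrative only; the function name `gcn_layer` is hypothetical, and practical GNN code would use a tensor library and sparse matrices.

```python
import math
from typing import List

Matrix = List[List[float]]


def gcn_layer(adj: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W).
    adj: n x n 0/1 adjacency; features: n x f; weight: f x o."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbour features: norm @ features.
    f = len(features[0])
    agg = [[sum(norm[i][k] * features[k][c] for k in range(n)) for c in range(f)]
           for i in range(n)]
    # Linear transform followed by ReLU: relu(agg @ weight).
    out_dim = len(weight[0])
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(f)))
             for o in range(out_dim)] for i in range(n)]


# Two connected nodes with scalar features 1 and 3, identity weight.
print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))  # [[2.0], [2.0]]
```

Stacking such layers lets information travel one hop further per layer, which is the basic design choice behind message-passing GNNs.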

Submitted 8 September, 2021; v1 submitted 24 August, 2021; originally announced August 2021.

Comments: Submitted to ACM

MSC Class: 68Txx ACM Class: I.2.6; I.2; I.5

arXiv:2107.06835 [pdf, other]

A Review on Edge Analytics: Issues, Challenges, Opportunities, Promises, Future Directions, and Applications

Authors: Sabuzima Nayak, Ripon Patgiri, Lilapati Waikhom, Arif Ahmed

Abstract: Edge technology aims to bring Cloud resources (specifically, compute, storage, and network) into close proximity to the Edge devices, i.e., the smart devices where data are produced and consumed. Embedding computing and applications in Edge devices has led to the emergence of two new concepts in Edge technology, namely, Edge computing and Edge analytics. Edge analytics uses techniques or algorithms to analyze the data generated by the Edge devices. With the emergence of Edge analytics, the Edge devices have become a complete set. Currently, however, Edge analytics cannot fully support the execution of analytic techniques: Edge devices cannot execute advanced and sophisticated analytic algorithms owing to constraints such as limited power supply, small memory size, and limited resources. This article provides a detailed discussion of Edge analytics. It clearly distinguishes the three concepts of Edge technology, namely, Edge devices, Edge computing, and Edge analytics, along with their issues. Furthermore, the article discusses the implementation of Edge analytics to solve problems in various areas such as retail, agriculture, industry, and healthcare. In addition, state-of-the-art Edge analytics research papers are rigorously reviewed to explore the existing issues, emerging challenges, research opportunities and their directions, and applications.

Submitted 1 July, 2021; originally announced July 2021.

Comments: Submitted to Elsevier for possible publication

MSC Class: 68Mxx ACM Class: C.5.5; C.5.1; I.2; H.3; H.2

arXiv:2106.04365 [pdf, ps, other]

RobustBF: A High Accuracy and Memory Efficient 2D Bloom Filter

Authors: Sabuzima Nayak, Ripon Patgiri

Abstract: Bloom Filter is an important probabilistic data structure for reducing the memory consumption of membership filters. It is applied in diverse domains such as Computer Networking, Network Security and Privacy, IoT, Edge Computing, Cloud Computing, Big Data, and Biometrics. However, Bloom Filter suffers from a nonzero false positive probability. To address this issue, we propose a novel robust Bloom Filter, robustBF for short. robustBF is a 2D Bloom Filter capable of filtering millions of data items with high accuracy without compromising performance. Our proposed system is presented in two parts. First, we modify the murmur hash function, test all modified hash functions for improvements, and experimentally select the best modified hash function. Second, we embed the modified hash functions in a 2D Bloom Filter. Our experimental results show that robustBF is better than the standard Bloom Filter and the counting Bloom Filter in every aspect. robustBF exhibits a nearly zero false positive probability with more than $10\times$ and $44\times$ lower memory consumption than the standard Bloom Filter and the counting Bloom Filter, respectively.
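The abstract does not specify robustBF's exact layout or its modified murmur hash, so the sketch below only illustrates the general 2D idea under stated assumptions: a `rows` x `cols` grid of 64-bit words, where each of `k` hashes selects a row, a column, and a bit within that word. `hashlib.blake2b` with a per-hash seed prefix stands in for the modified murmur hash, and the class name `TwoDBloomFilter` is hypothetical.

```python
import hashlib
from typing import Iterator, Tuple


class TwoDBloomFilter:
    """Hypothetical 2D Bloom Filter: a rows x cols grid of 64-bit words."""

    def __init__(self, rows: int, cols: int, k: int = 3):
        self.rows, self.cols, self.k = rows, cols, k
        self.grid = [[0] * cols for _ in range(rows)]  # each cell holds 64 bits

    def _positions(self, key: str) -> Iterator[Tuple[int, int, int]]:
        data = key.encode()
        for seed in range(self.k):
            # blake2b with a seed prefix stands in for the modified murmur hash.
            h = int.from_bytes(
                hashlib.blake2b(seed.to_bytes(4, "little") + data, digest_size=8).digest(),
                "little")
            row = h % self.rows
            col = (h // self.rows) % self.cols
            bit = (h // (self.rows * self.cols)) % 64
            yield row, col, bit

    def insert(self, key: str) -> None:
        for r, c, b in self._positions(key):
            self.grid[r][c] |= 1 << b

    def query(self, key: str) -> bool:
        """False means definitely absent; True means probably present."""
        return all((self.grid[r][c] >> b) & 1 for r, c, b in self._positions(key))


bf = TwoDBloomFilter(rows=31, cols=17, k=3)
for item in ["alice", "bob", "carol"]:
    bf.insert(item)
print(bf.query("alice"))  # True: inserted items are never missed
```

A negative answer is always correct, while a positive answer may be a false positive; the grid layout changes how bits are addressed, not that guarantee.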

Submitted 8 September, 2021; v1 submitted 6 June, 2021; originally announced June 2021.

Comments: Submitted to IEEE conference

MSC Class: 41-XX; 68Mxx; 68Wxx ACM Class: E.1; E.2; H.2; H.3

arXiv:2106.04364 [pdf, other]

countBF: A General-purpose High Accuracy and Space Efficient Counting Bloom Filter

Authors: Sabuzima Nayak, Ripon Patgiri

Abstract: Bloom Filter is a probabilistic data structure for the membership query, and it has been extensively applied in various fields to reduce memory consumption and enhance a system's performance. Bloom Filters are classified into two key categories: counting Bloom Filter (CBF) and non-counting Bloom Filter. CBF has a higher false positive probability than the standard Bloom Filter (SBF) and a higher memory footprint. However, CBF supports deletion without introducing false negatives. SBF is also free of false negatives, but it cannot support delete operations as CBF does. To address these issues, we present a novel counting Bloom Filter based on SBF and 2D Bloom Filter, called countBF. countBF uses a modified murmur hash function to meet its various requirements, and this function is experimentally evaluated. Our experimental results show that countBF uses $1.96\times$ and $7.85\times$ less memory than SBF and CBF, respectively, while preserving a lower false positive probability and execution time than both. The overall accuracy of countBF is $99.999921\%$, which demonstrates the superiority of countBF over SBF and CBF. We also compare countBF with other state-of-the-art counting Bloom Filters.
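For reference, the classic counting Bloom Filter (CBF) that countBF is compared against replaces each bit with a small counter, which is what makes deletion possible. The sketch below shows that baseline, not countBF's own 2D design; Python integers stand in for the small fixed-width counters a real CBF would use (so counter overflow is not modelled), and `blake2b` with seed prefixes is an assumed hash choice.

```python
import hashlib
from typing import Iterator


class CountingBloomFilter:
    """Classic counting Bloom Filter (the CBF baseline), not countBF's 2D design."""

    def __init__(self, size: int, k: int = 3):
        self.size, self.k = size, k
        # Unbounded ints stand in for small fixed-width (commonly 4-bit) counters.
        self.counters = [0] * size

    def _indexes(self, key: str) -> Iterator[int]:
        for seed in range(self.k):
            digest = hashlib.blake2b(seed.to_bytes(4, "little") + key.encode(),
                                     digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.size

    def insert(self, key: str) -> None:
        for i in self._indexes(key):
            self.counters[i] += 1

    def contains(self, key: str) -> bool:
        return all(self.counters[i] > 0 for i in self._indexes(key))

    def count(self, key: str) -> int:
        """Upper bound on how often `key` was inserted (minimum of its counters)."""
        return min(self.counters[i] for i in self._indexes(key))

    def delete(self, key: str) -> None:
        # A key with a zero counter was certainly never inserted; refusing it keeps
        # counters non-negative. Deleting a never-inserted false positive would
        # still corrupt other keys, so callers should delete only inserted keys.
        if not self.contains(key):
            raise KeyError(key)
        for i in self._indexes(key):
            self.counters[i] -= 1


cbf = CountingBloomFilter(size=1024)
cbf.insert("ACGT")
cbf.insert("ACGT")
cbf.insert("TTGA")
print(cbf.count("ACGT") >= 2)  # True: counts are upper bounds
cbf.delete("TTGA")
```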

Submitted 6 June, 2021; originally announced June 2021.

Comments: Submitted to IEEE Conference for possible publication

MSC Class: 41-XX; 68Wxx ACM Class: E.1; E.2; H.2; H.3

arXiv:2103.12544 [pdf, other]

DeepBF: Malicious URL detection using Learned Bloom Filter and Evolutionary Deep Learning

Authors: Ripon Patgiri, Anupam Biswas, Sabuzima Nayak

Abstract: Malicious URL detection is an emerging research area due to the continuous modernization of various systems, for instance, Edge Computing. In this article, we present a novel malicious URL detection technique called deepBF (deep learning and Bloom Filter). deepBF is presented in two parts. First, we propose a learned Bloom Filter using a 2-dimensional Bloom Filter. We experimentally select the best non-cryptographic string hash function. Then, we derive a modified non-cryptographic string hash function for deepBF from the selected hash function by introducing biases into the hashing method, and compare it with other string hash functions. The modified string hash function is compared with variants of diverse non-cryptographic string hash functions. deepBF is also compared with various filters, particularly counting Bloom Filter, Kirsch et al., and Cuckoo Filter, using various use cases. The use cases reveal the weaknesses and strengths of the filters. Second, we propose a malicious URL detection mechanism using deepBF. We apply an evolutionary convolutional neural network to identify malicious URLs. The evolutionary convolutional neural network is trained and tested with malicious URL datasets, and its output is tested in deepBF for accuracy. The article presents the conclusions drawn from our experimental evaluation and results.
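A learned Bloom Filter pairs a classifier with a backup filter: URLs the model scores at or above a threshold are reported as malicious, and a conventional Bloom Filter holds the known malicious URLs that the model scores below it, so no inserted URL is ever missed. The sketch assumes a one-dimensional backup filter rather than deepBF's 2D one, and `toy_score` is a deliberately crude, hypothetical stand-in for the evolutionary CNN.

```python
import hashlib
from typing import Callable, Iterable, Iterator


def toy_score(url: str) -> float:
    """Hypothetical stand-in for the evolutionary CNN: share of crude red flags."""
    red_flags = ("@", "login", "verify", "-secure")
    return sum(flag in url for flag in red_flags) / len(red_flags)


class LearnedBloomFilter:
    """Classifier first; a backup Bloom Filter covers the model's false negatives."""

    def __init__(self, score: Callable[[str], float], threshold: float,
                 malicious: Iterable[str], bits: int = 1024, k: int = 3):
        self.score, self.threshold = score, threshold
        self.bits, self.k = bits, k
        self.array = bytearray(bits)  # one byte per bit, for readability
        for url in malicious:
            if score(url) < threshold:  # the model would miss this URL
                for i in self._indexes(url):
                    self.array[i] = 1

    def _indexes(self, url: str) -> Iterator[int]:
        for seed in range(self.k):
            digest = hashlib.blake2b(seed.to_bytes(4, "little") + url.encode(),
                                     digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.bits

    def contains(self, url: str) -> bool:
        """True means flagged as malicious (possibly a false positive)."""
        if self.score(url) >= self.threshold:
            return True
        return all(self.array[i] for i in self._indexes(url))


known_bad = ["http://paypal-secure.verify.example/login", "http://203.0.113.7/@bank"]
lbf = LearnedBloomFilter(toy_score, threshold=0.5, malicious=known_bad)
print([lbf.contains(u) for u in known_bad])  # [True, True]
```

The backup filter only needs to hold the model's misses, so a better classifier means a smaller backup filter.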

Submitted 26 February, 2022; v1 submitted 18 March, 2021; originally announced March 2021.

Comments: This work has been submitted to Springer for possible publication

MSC Class: 68Txx; 97P80; 92B20; 68Qxx ACM Class: K.6.5; E.3; E.4; D.4.6; G.3; I.5; I.2.6; G.1.6

arXiv:2005.07532 [pdf, other]

doi 10.1007/978-981-15-9735-0_1

6G Communication Technology: A Vision on Intelligent Healthcare

Authors: Sabuzima Nayak, Ripon Patgiri

Abstract: 6G is a promising communication technology that is expected to dominate the entire health market from 2030 onward, and not only the health sector but diverse other sectors as well. 6G is expected to revolutionize many sectors, including healthcare. Healthcare will be fully AI-driven and dependent on 6G communication technology, which will change our perception of lifestyle. Currently, time and space are the key barriers to healthcare, and 6G will be able to overcome these barriers. 6G is also expected to prove a game-changing technology for healthcare. Therefore, in this perspective, we envision a healthcare system for the era of 6G communication technology. Various new methodologies must also be introduced to enhance our lifestyle; this perspective addresses them, including Quality of Life (QoL), Intelligent Wearable Devices (IWD), Intelligent Internet of Medical Things (IIoMT), Hospital-to-Home (H2H) services, and a new business model. In addition, we discuss the role of 6G communication technology in telesurgery and in epidemics and pandemics.

Submitted 16 April, 2020; originally announced May 2020.

Comments: This manuscript is submitted to IEEE for possible publication

MSC Class: 68-02; 68M10; 68Txx ACM Class: C.2; J.3; I.2

arXiv:2005.07531 [pdf, other]

doi 10.1007/978-981-19-0019-8_16

6G Communications: A Vision on the Potential Applications

Authors: Sabuzima Nayak, Ripon Patgiri

Abstract: 6G communication technology is a revolutionary technology that will transform many technologies and applications. Furthermore, it will be truly AI-driven and will support intelligent spaces. Hence, it will enable the Internet of Everything (IoE), which will also impact many technologies and applications. 6G communication technology promises high Quality of Service (QoS) and high Quality of Experience (QoE). With the combination of IoE and 6G communication technology, the number of applications will explode in the coming years, particularly in vehicles, drones, homes, cities, hospitals, and so on, leaving no area untouched. Hence, many existing technologies are expected to depend fully on 6G communication technology to enhance their performance. 6G will prove a game-changing communication technology in many fields and will be capable of influencing many applications. Therefore, we envision the potential applications of 6G communication technology in the near future.

Submitted 23 April, 2020; originally announced May 2020.

Comments: This manuscript is submitted to IEEE for possible publication

Report number: 869 MSC Class: 68-02; 68M10 ACM Class: C.2; I.2

Journal ref: Edge Analytics, Lecture Notes in Electrical Engineering, 2022

arXiv:2005.06965 [pdf, other]

A Review on Impact of Bloom Filter on Named Data Networking: The Future Internet Architecture

Authors: Sabuzima Nayak, Ripon Patgiri, Angana Borah

Abstract: Today is the era of smart devices. Through smart devices, people remain connected with systems across the globe even while mobile. Hence, the current Internet is facing a scalability issue. Therefore, the world is moving beyond the IP-based Internet toward a Future Internet Architecture called Named Data Networking (NDN). Currently, billions of nodes are connected to the Internet, and millions of requests are sent per second. NDN handles such huge numbers by modifying the IP architecture to meet current requirements. NDN is scalable, produces less traffic and congestion, provides high-level security, saves bandwidth, efficiently utilizes multiple network interfaces, and has many more functionalities. Bloom Filter, in turn, is a natural choice to deploy in various modules of NDN to handle the huge number of packets. Bloom Filter is a simple probabilistic data structure for the membership query. This article presents a detailed discussion of the role of Bloom Filter in implementing NDN. The article includes a precise discussion of Bloom Filter, and the main components of the NDN architecture, namely, the packet, content store, forwarding information base, and pending interest table, are also discussed briefly.
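One way Bloom Filters can assist the forwarding information base is Bloom-filter-screened longest prefix matching on hierarchical names: one filter per prefix length, probed from the longest prefix down, so the slower exact table is consulted only when a filter says the prefix may exist. This is an illustrative technique, not necessarily a scheme reviewed in the article; the classes `NameBloomFilter` and `BloomFIB` are hypothetical, and names are assumed to start with `/` and to have no trailing slash.

```python
import hashlib
from typing import Dict, Iterator, Optional


class NameBloomFilter:
    """Minimal Bloom Filter over NDN name prefixes (one per prefix length)."""

    def __init__(self, bits: int = 512, k: int = 3):
        self.bits, self.k, self.array = bits, k, bytearray(bits)

    def _indexes(self, key: str) -> Iterator[int]:
        for seed in range(self.k):
            d = hashlib.blake2b(seed.to_bytes(4, "little") + key.encode(),
                                digest_size=8).digest()
            yield int.from_bytes(d, "little") % self.bits

    def add(self, key: str) -> None:
        for i in self._indexes(key):
            self.array[i] = 1

    def might_contain(self, key: str) -> bool:
        return all(self.array[i] for i in self._indexes(key))


class BloomFIB:
    """Hypothetical FIB: filters screen each prefix length before the exact table."""

    def __init__(self) -> None:
        self.filters: Dict[int, NameBloomFilter] = {}  # component count -> filter
        self.table: Dict[str, int] = {}                # exact prefix -> outgoing face

    def add_route(self, prefix: str, face: int) -> None:
        n = len(prefix.strip("/").split("/"))
        self.filters.setdefault(n, NameBloomFilter()).add(prefix)
        self.table[prefix] = face

    def lookup(self, name: str) -> Optional[int]:
        parts = name.strip("/").split("/")
        for n in range(len(parts), 0, -1):  # longest prefix first
            prefix = "/" + "/".join(parts[:n])
            bf = self.filters.get(n)
            # The exact-table check rules out Bloom Filter false positives.
            if bf is not None and bf.might_contain(prefix) and prefix in self.table:
                return self.table[prefix]
        return None


fib = BloomFIB()
fib.add_route("/edu/ucla", 1)
fib.add_route("/edu/ucla/video", 2)
print(fib.lookup("/edu/ucla/video/seg1"))    # 2: longest matching prefix
print(fib.lookup("/edu/ucla/cs/paper.pdf"))  # 1
```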

Submitted 7 April, 2020; originally announced May 2020.

Comments: Submitted to JNCA journal for possible publication

MSC Class: 41-02; 68-02; 68M10; 68M11; 68M12 ACM Class: C.2; E.1; F.2

arXiv:2005.06964 [pdf, ps, other]

doi 10.4108/eai.13-7-2018.163972

Big Computing: Where are we heading?

Authors: Sabuzima Nayak, Ripon Patgiri, Thoudam Doren Singh

Abstract: This paper presents an overview of the current trends of Big data against the computing scenario from different aspects. Important aspects include Exascale computing, computing power, and the kinds of applications that generate Big data. The paper starts with the current computing hardware constraints against the needs of rising Big data applications. We highlight the issues and challenges of energy requirements, software complexity, hardware failure, fault-tolerant computing, and communication, as the complexity of computation is going to rise in the future. The paper also highlights the future direction of Big computing systems for Bioinformatics, social media, hardware and software requirements, and data-intensive computation, and then looks toward the GPU era.

Submitted 9 April, 2020; originally announced May 2020.

Comments: Published in EAI Endorsed Transactions on Scalable Information Systems

MSC Class: 68-02; 68M14; 68M11; 68M10 ACM Class: C.0; C.1; C.2; C.3; C.5; C.5

Journal ref: EAI Endorsed Transactions on Scalable Information Systems, 2020

arXiv:2005.06963 [pdf, other]

A Survey on Large Scale Metadata Server for Big Data Storage

Authors: Ripon Patgiri, Sabuzima Nayak

Abstract: Big Data is defined as a high volume of varied data with an exponential growth rate. Data are amalgamated to generate revenue, which results in large data silos. Data are the oil of modern IT industries; therefore, the data are growing at an exponential pace. The access mechanism of these data silos is defined by metadata. The metadata are decoupled from the data server for various beneficial reasons, for instance, ease of maintenance, and are stored in a metadata server (MDS). Therefore, the study of MDS is essential in designing a large-scale storage system. MDS architecture must take many parameters into account, and it depends on the storage system's requirements. Thus, MDS is categorized in various ways depending on the underlying architecture and design methodology. The article surveys the various kinds of MDS architectures, designs, and methodologies. It emphasizes clustered MDS (cMDS), and the reports are prepared based on a) Bloom filter-based MDS, b) Client-funded MDS, c) Geo-aware MDS, d) Cache-aware MDS, e) Load-aware MDS, f) Hash-based MDS, and g) Tree-based MDS. Additionally, the article presents the issues and challenges of MDS for mammoth-sized data.
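Hash-based MDS designs map each file path to a metadata server by hashing. The sketch below uses consistent hashing with virtual nodes, one common choice assumed here for illustration rather than a design taken from the survey; the class `ConsistentHashRing` and the node names are hypothetical.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


class ConsistentHashRing:
    """Hypothetical hash-based MDS placement using consistent hashing."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 64):
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (point, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        # Each server gets several virtual points to even out the load.
        for v in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{v}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, path: str) -> str:
        """Return the MDS owning `path`: the first point clockwise from its hash."""
        if not self._ring:
            raise LookupError("no metadata servers")
        i = bisect.bisect_left(self._ring, (self._hash(path),))
        return self._ring[i % len(self._ring)][1]


ring = ConsistentHashRing(["mds1", "mds2", "mds3"])
print(ring.lookup("/home/alice/report.txt"))  # one of mds1, mds2, mds3
```

With consistent hashing, adding a server moves only the paths that now fall on its arcs, instead of reshuffling nearly all metadata as `hash(path) % n` would.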

Submitted 11 April, 2020; originally announced May 2020.

Comments: Submitted to ACM for possible publication

MSC Class: 68-02; 68M14; 68W10; 68W15 ACM Class: D.4; H.3

arXiv:2004.04024 [pdf, ps, other]

doi 10.4108/eai.11-11-2020.166959

6G Communication: Envisioning the Key Issues and Challenges

Authors: Sabuzima Nayak, Ripon Patgiri

Abstract: In 2030, we are going to witness 6G mobile communication technology, which will enable the Internet of Everything. Although 5G has yet to be experienced by people worldwide and B5G has yet to be developed, researchers have already started planning, envisioning, and gathering the requirements of 6G. Moreover, many countries have already initiated research on 6G. 6G promises to connect every smart device to the Internet, from smartphones to intelligent vehicles. 6G will provide sophisticated and high QoS, such as holographic communication, augmented reality/virtual reality, and many more. It will also focus on Quality of Experience (QoE) to provide rich experiences from 6G technology. Notably, it is very important to envision the issues and challenges of 6G technology; otherwise, its promises may not be delivered on time. The requirements of 6G pose new challenges to the research community. To achieve the desired parameters of 6G, researchers are exploring various alternatives. Hence, there are diverse research challenges to envision, from devices to softwarization. Therefore, in this article, we discuss the future issues and challenges to be faced by 6G technology from every aspect, ranging from hardware to the enabling technologies that will be utilized by 6G.

Submitted 7 June, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

MSC Class: 68-02; 68M10 ACM Class: C.2; I.2

Journal ref: EAI Endorsed Transactions on Internet of Things, 2020

arXiv:2002.10254 [pdf, other]

Empirical Study on Airline Delay Analysis and Prediction

Authors: Ripon Patgiri, Sajid Hussain, Aditya Nongmeikapam

Abstract: Big Data analytics is the logical analysis of very large-scale datasets. Data analysis strengthens an organization and improves its decision-making process. In this article, we present Airline Delay Analysis and Prediction, which analyzes airline datasets in combination with a weather dataset. In this research work, we consider various attributes to analyze flight delay, for example, day-wise, airline-wise, cloud cover, temperature, etc. Moreover, we present rigorous experiments on various machine learning models to correctly predict flight delay, namely, logistic regression with L2 regularization, Gaussian Naive Bayes, K-Nearest Neighbors, Decision Tree classifier, and Random Forest. The accuracy of the Random Forest model is 82% with a delay threshold of 15 minutes. The analysis is carried out using data from 1987 to 2008; the training is conducted with data from 2000 to 2007, and the prediction results are validated using 2008 data. Moreover, the Random Forest model achieves a recall of 99%.
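Labelling flights against the 15-minute threshold and scoring a classifier can be sketched as below. The sketch assumes that a flight is delayed when its arrival delay is at least 15 minutes (the abstract does not say whether the boundary is inclusive) and that records carry a `year` field; the function names are hypothetical and no model is trained here.

```python
from typing import Dict, List, Sequence, Tuple

DELAY_THRESHOLD_MIN = 15  # threshold from the abstract; inclusive boundary assumed


def label_delayed(arrival_delays: Sequence[float],
                  threshold: float = DELAY_THRESHOLD_MIN) -> List[int]:
    """1 if a flight is at least `threshold` minutes late, else 0."""
    return [1 if d >= threshold else 0 for d in arrival_delays]


def accuracy_and_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy over all flights; recall over the truly delayed ones."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return correct / len(pairs), (tp / (tp + fn) if tp + fn else 0.0)


def year_split(rows: List[Dict], train_years=range(2000, 2008), test_year: int = 2008):
    """Train on 2000-2007 and validate on 2008, mirroring the abstract's split."""
    train = [r for r in rows if r["year"] in train_years]
    test = [r for r in rows if r["year"] == test_year]
    return train, test


print(label_delayed([0, 14.9, 15, 42]))                  # [0, 0, 1, 1]
print(accuracy_and_recall([1, 1, 0, 0], [1, 0, 0, 1]))  # (0.5, 0.5)
```

Recall counts only how many truly delayed flights are caught, which is why a model can reach a recall of 99% while its overall accuracy is lower.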

Submitted 17 February, 2020; originally announced February 2020.

Comments: Figure 13

MSC Class: 68U35 ACM Class: I.5

arXiv:1903.12525 [pdf, other]

Shed More Light on Bloom Filter's Variants

Authors: Ripon Patgiri, Sabuzima Nayak, Samir Kumar Borgohain

Abstract: Bloom Filter is a probabilistic membership data structure, and it is extensively used for membership queries. Bloom Filter has become the predominant data structure in approximate membership filtering. Bloom Filter greatly reduces query response time. Bloom Filter (BF) is used to detect whether an element belongs to a given set or not. The Bloom Filter returns True Positive (TP), False Positive (FP), or True Negative (TN). The Bloom Filter is widely adopted in numerous areas to enhance the performance of a system. In this paper, we present a) an in-depth insight into the Bloom Filter, and b) the prominent variants of the Bloom Filters.
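Because a Bloom Filter never returns a false negative, its quality is captured by the false positive probability, which for $m$ bits, $n$ inserted items, and $k$ hash functions is commonly approximated as $p \approx (1 - e^{-kn/m})^k$, minimized at $k = (m/n)\ln 2$. The helper functions below (hypothetical names) evaluate this standard approximation and size a filter for a target rate.

```python
import math
from typing import Tuple


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Approximate false positive probability: (1 - e^(-kn/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_parameters(n: int, p: float) -> Tuple[int, int]:
    """Bits m and hash count k that reach false positive rate p for n items."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


m, k = optimal_parameters(1000, 0.01)
print(m, k)                                        # 9586 7
print(round(false_positive_rate(m, 1000, k), 4))  # 0.01
```

For example, 1,000 items at a 1% target need about 9,586 bits and 7 hash functions, roughly 9.6 bits per item.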

Submitted 17 March, 2019; originally announced March 2019.

Comments: 8 pages, 5 Figures, 1 Table, Proceedings of the 2018 International Conference on Information and Knowledge Engineering (IKE'18), pp. 14-21

Journal ref: Proceedings of the 2018 International Conference on Information and Knowledge Engineering (IKE'18), pp. 14-21, 2018

arXiv:1903.07167 [pdf, other]

Machine Learning: A Dark Side of Cancer Computing

Authors: Ripon Patgiri, Sabuzima Nayak, Tanya Akutota, Bishal Paul

Abstract: Cancer analysis and prediction is one of the most important research fields for the well-being of humankind. Cancer data are analyzed and predicted using machine learning algorithms. Most researchers claim prediction accuracies of up to 99%. However, we show that machine learning algorithms can easily predict with an accuracy of 100% on the Wisconsin Diagnostic Breast Cancer dataset. We show that this way of gaining accuracy is an unethical approach, because the algorithms can easily be misled. In this paper, we exploit the weakness of machine learning algorithms. We perform extensive experiments to verify the correctness of our results and to exploit this weakness. The methods are rigorously evaluated to validate our claim. In addition, this paper focuses on the correctness of accuracy. This paper reports three key outcomes of the experiments, namely, the correctness of accuracies, the significance of minimum accuracy, and the correctness of machine learning algorithms.
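One well-known way a reported accuracy of 100% can carry no real meaning is scoring a model on the records it has memorized. The sketch below illustrates this with a 1-nearest-neighbour classifier on pure noise; it is an assumed illustration of the general pitfall, not a reproduction of the paper's experiments or of the Wisconsin Diagnostic Breast Cancer dataset.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[float], int]


def one_nn_predict(train: List[Sample], x: List[float]) -> int:
    """Label of the closest training point by squared Euclidean distance."""
    nearest = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return nearest[1]


def accuracy(train: List[Sample], test: List[Sample]) -> float:
    return sum(one_nn_predict(train, x) == y for x, y in test) / len(test)


rng = random.Random(0)
# Features are noise and labels are coin flips, so nothing is learnable.
data = [([rng.random() for _ in range(5)], rng.randint(0, 1)) for _ in range(200)]
train, held_out = data[:150], data[150:]

print(accuracy(train, train))     # 1.0: every point is its own nearest neighbour
print(accuracy(train, held_out))  # close to chance level
```

Only the held-out score says anything about predictive power, which is why accuracy claims should state how the evaluation data were separated from the training data.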

Submitted 17 March, 2019; originally announced March 2019.

Comments: 7 Pages, 21 Figures, 2 Tables, Proceedings of the 2018 International Conference on Bioinformatics and Computational Biology, pp. 92-98, 2018

Journal ref: Proceedings of the 2018 International Conference on Bioinformatics and Computational Biology, pp. 92-98, 2018

arXiv:1903.06570 [pdf, other]

doi 10.14569/IJACSA.2018.091277

scaleBF: A High Scalable Membership Filter using 3D Bloom Filter

Authors: Ripon Patgiri, Sabuzima Nayak, Samir Kumar Borgohain

Abstract: The Bloom Filter has been extensively deployed in various applications and research domains since its inception. The Bloom Filter can reduce space consumption by an order of magnitude. Thus, the Bloom Filter is used to keep information about very large-scale data. There are numerous variants of Bloom Filters; however, scalability has been a serious dilemma for the Bloom Filter for years. Diverse variants of the Bloom Filter also exist to solve this dilemma, but time complexity and space complexity again become the key issues. In this paper, we present a novel Bloom Filter, called scaleBF, that addresses the scalability issue without compromising performance. scaleBF deploys many 3D Bloom Filters to filter the set of items. We theoretically compare contemporary Bloom Filters for scalability, and scaleBF outperforms them in terms of time complexity.

Submitted 15 March, 2019; originally announced March 2019.

Comments: 6 Pages, 3 Figures, 1 Table

Journal ref: International Journal of Advanced Computer Science and Applications (IJACSA), Volume 9 Issue 12, 2018

arXiv:1903.06565 [pdf, other]

doi 10.14569/IJACSA.2018.091193

Role of Bloom Filter in Big Data Research: A Survey

Authors: Ripon Patgiri, Sabuzima Nayak, Samir Kumar Borgohain

Abstract: Big Data is the most popular emerging trend; it has become a blessing for humankind and a necessity of day-to-day life, for example, Facebook. Every person is involved in producing data, either directly or indirectly. Thus, Big Data is a high volume of data with an exponential growth rate that consists of a variety of data. Big Data touches all fields, including the government sector, IT industry, business, economy, engineering, bioinformatics, and other basic sciences. Thus, Big Data forms a data silo. Most of the data are duplicated and unstructured. To deal with such data silos, the Bloom Filter is a precious resource for filtering out duplicate data. Also, the Bloom Filter is indispensable in a Big Data storage system for optimizing memory consumption. Undoubtedly, the Bloom Filter uses a tiny amount of memory space to filter a very large data size, and it stores information about a large set of data. Although the functionality of the Bloom Filter is limited to membership filtering, it can be adapted to various applications. Besides, the Bloom Filter is deployed in diverse fields and is also used in interdisciplinary research areas, for instance, bioinformatics. In this article, we expose the usefulness of the Bloom Filter in Big Data research.

Submitted 15 March, 2019; originally announced March 2019.

Comments: 7 Pages, 3 Figures, 1 Table

Journal ref: International Journal of Advanced Computer Science and Applications (IJACSA), Volume 9 Issue 11, 2018

arXiv:1810.06689 [pdf, other]

doi 10.4108/eai.19-6-2018.155865

Preventing DDoS using Bloom Filter: A Survey

Authors: Ripon Patgiri, Sabuzima Nayak, Samir Kumar Borgohain

Abstract: Distributed Denial-of-Service (DDoS) is a menace to service providers and a prominent issue in network security. Defeating or defending against DDoS is a prime challenge. A DDoS attack makes a service unavailable for a certain time, which harms service providers and causes a loss of business revenue. Therefore, DDoS is a grand challenge to defeat. There are numerous mechanisms to defend against DDoS; this paper surveys the deployment of the Bloom Filter in defending against a DDoS attack. The Bloom Filter is a probabilistic data structure for membership queries that returns either true or false. The Bloom Filter uses a tiny amount of memory to store information about large data. Therefore, packet information is stored in a Bloom Filter to defend against and defeat DDoS. This paper presents a survey on DDoS defense techniques using the Bloom Filter.

Submitted 15 October, 2018; originally announced October 2018.

Comments: 9 pages, 1 figure. This article is accepted for publication in EAI Endorsed Transactions on Scalable Information Systems

Journal ref: EAI Endorsed Transactions on Scalable Information Systems, 5(19), 2018

arXiv:1808.08474 [pdf, other]

A Taxonomy on Big Data: Survey

Authors: Ripon Patgiri

Abstract: Big Data is the most popular paradigm nowadays, and almost no area is untouched by it, for instance, science, engineering, economics, business, social science, and government. Big Data is used to boost organizational performance using massive datasets. Data are assets of an organization, and these data generate revenue for the organization. Therefore, Big Data is spreading everywhere to enhance organizations' revenue, and many new technologies based on Big Data are emerging. In this paper, we present a taxonomy of Big Data. Besides, we present an in-depth insight into the Big Data paradigm.

Submitted 25 November, 2019; v1 submitted 25 August, 2018; originally announced August 2018.

Comments: 15 pages, 15 figures, 5 tables, and a survey paper

arXiv:1704.02632 [pdf]

MapReduce Scheduler: A 360-degree view

Authors: Rajdeep Das, Rohit Pratap Singh, Ripon Patgiri

Abstract: Undoubtedly, MapReduce is the most powerful programming paradigm in distributed computing. Enhancing MapReduce is essential, as it can make computing faster. Therefore, there are many scheduling algorithms to discuss based on their characteristics. Moreover, there are many shortcomings to discover in this field. In this article, we present the state-of-the-art scheduling algorithms to enhance the understanding of these algorithms. The algorithms are presented systematically so that this article can open many future possibilities in scheduling algorithms. In this paper, we provide an in-depth insight into MapReduce scheduling algorithms. In addition, we discuss various issues of MapReduce schedulers developed for large-scale computing as well as heterogeneous environments.

Submitted 9 April, 2017; originally announced April 2017.

Comments: Journal Article

Journal ref: International Journal of Current Engineering and Scientific Research, volume 3(11), pages 88-100, 2016

Showing 1–21 of 21 results for author: Patgiri, R